Chinese proverb: Make a mistake and learn a lesson
Similar to unsupervised learning, it does not need labeled data. The model learns from interactions (trial and error)and hence takes time to perfect. Initially the algorithm acts randomly, exploring the environment, but it learns with time by being rewarded when it makes the right choices.
It is best suited for use cases that require achieving a goal, for example, autonomous vehicles, headline testing or personalizing a website.