Named Entity Recognition (NER) algorithms serve three information needs:
When did it happen?
Who is it about?
Where did it happen?
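One way to connect entity types to these three questions is a simple lookup table. The sketch below assumes the 18-type OntoNotes label set that Stanza's default English NER model uses (PERSON, NORP, ORG, GPE, LOC, FAC, DATE, TIME, and so on). Which label answers which question is an illustrative choice, and `QUESTION_FOR_TYPE` and `question_for` are hypothetical names.

```python
from typing import Optional

# Illustrative mapping from OntoNotes entity labels (assumed to match
# Stanza's default English NER model) to the three information needs.
QUESTION_FOR_TYPE = {
    # When did it happen?
    "DATE": "when",
    "TIME": "when",
    # Who is it about?
    "PERSON": "who",
    "NORP": "who",
    "ORG": "who",
    # Where did it happen?
    "GPE": "where",
    "LOC": "where",
    "FAC": "where",
}


def question_for(entity_type: str) -> Optional[str]:
    """Return 'when', 'who', or 'where' for a label, or None if it answers none of them."""
    return QUESTION_FOR_TYPE.get(entity_type)


print(question_for("GPE"))      # where
print(question_for("DATE"))     # when
print(question_for("PRODUCT"))  # None
```

Labels such as PRODUCT or LAW map to `None` here because they answer none of the three questions; a different application might route them elsewhere.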
There are many open-source NER libraries, including Stanza from the Stanford NLP Group, spaCy, and Flair. All three perform well on benchmark test data. Of these, Stanza offers fast processing and supports a wide range of entity types.
Places: Stanza recognizes three types of locations: GPE (geopolitical entities such as countries, cities, and states), LOC (non-GPE locations such as mountains and rivers), and FAC (facilities such as buildings, airports, and highways). The places identified in this article include “Kargil”, “Pakistan”, “Jammu”, “India”, “Kashmir”, “Tiger Hill”, and “Hill”.
Legal concepts: “Constitution”, “the Flag Code”, and “Code”.
Products: “Tricolour” and “Tricolor”.
NORP (nationalities, religious groups, or political groups): “Indian”, “Tiranga”, and “Indians”.
Temporal expressions: “75 years of Independence”, “The day”, “22 long years ago”, “52 years”, “75”, “the 75th year of Independence”, “August 13 to 15”, “the early days”, and “75th”.
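Part of the noise in these lists is fragments: “Hill” appears next to “Tiger Hill”, “Code” next to “the Flag Code”, and “75” next to “75 years of Independence”. As a cheap first pass (a heuristic swapped in here for illustration, separate from the DBpedia approach), one can drop any entity whose tokens form a shorter, contiguous run inside another extracted entity, then group what remains by label. The helpers `is_fragment`, `drop_fragments`, and `group_by_label` are hypothetical, and the labels in the example input are assumed rather than taken from Stanza's actual output for this article.

```python
from collections import defaultdict


def is_fragment(short: str, long: str) -> bool:
    """True if `short` is a strictly shorter, contiguous run of whole tokens inside `long`."""
    s, l = short.lower().split(), long.lower().split()
    if len(s) >= len(l):
        return False
    return any(l[i:i + len(s)] == s for i in range(len(l) - len(s) + 1))


def drop_fragments(entities):
    """Remove entities whose text is a fragment of another extracted entity.

    `entities` is a list of (text, label) pairs, e.g. built from a Stanza
    document as [(ent.text, ent.type) for ent in doc.ents].
    """
    texts = {text for text, _ in entities}
    return [(text, label) for text, label in entities
            if not any(is_fragment(text, other) for other in texts)]


def group_by_label(entities):
    """Group entity texts by label, keeping first-seen order and dropping duplicates."""
    groups = defaultdict(list)
    for text, label in entities:
        if text not in groups[label]:
            groups[label].append(text)
    return dict(groups)


# Example input: entity texts from this article; the labels are assumed.
entities = [
    ("Kargil", "GPE"), ("Pakistan", "GPE"),
    ("Tiger Hill", "LOC"), ("Hill", "LOC"),
    ("the Flag Code", "LAW"), ("Code", "LAW"),
]
print(group_by_label(drop_fragments(entities)))
# {'GPE': ['Kargil', 'Pakistan'], 'LOC': ['Tiger Hill'], 'LAW': ['the Flag Code']}
```

The heuristic has clear limits: it does not merge spelling variants such as “Tricolour” and “Tricolor”, and it would wrongly discard a legitimate short entity like “Jammu” if a longer one such as “Jammu and Kashmir” were also extracted.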
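As these lists show, Stanza’s classification is not how a human would classify the same text, and the output is noisy: “Tiranga”, the Hindi name for the Indian flag, is tagged as a nationality or group; “Tricolour” and “Tricolor” appear as separate products; and fragments such as “Hill”, “Code”, and “75” are extracted alongside the longer phrases they belong to. The NER output therefore needs further refinement. To reduce the noise, remove irrelevant entities using DBpedia APIs.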
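The text does not say which DBpedia API is used. As one possibility, the sketch below assumes the public DBpedia Spotlight `annotate` endpoint (`https://api.dbpedia-spotlight.org/en/annotate`), which, when asked for JSON, returns a `Resources` list whose items carry `@surfaceForm` and `@URI` keys. An NER entity is kept only if Spotlight links its text to a DBpedia resource. The helper names and the default confidence of 0.5 are assumptions, not the author's settings.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the public DBpedia Spotlight annotation service.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def annotate(text: str, confidence: float = 0.5) -> dict:
    """Send `text` to DBpedia Spotlight and return the parsed JSON response (network call)."""
    query = urllib.parse.urlencode({"text": text, "confidence": confidence})
    request = urllib.request.Request(
        f"{SPOTLIGHT_URL}?{query}", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def linked_surface_forms(spotlight_json: dict) -> dict:
    """Map each annotated surface form (lower-cased) to its DBpedia resource URI."""
    # .get() guards against responses with no "Resources" key,
    # assumed to happen when nothing in the text could be linked.
    return {
        res["@surfaceForm"].lower(): res["@URI"]
        for res in spotlight_json.get("Resources", [])
    }


def keep_linked(entities, spotlight_json: dict):
    """Keep only (text, label) pairs whose text Spotlight linked to a DBpedia resource."""
    links = linked_surface_forms(spotlight_json)
    return [(text, label) for text, label in entities if text.lower() in links]
```

A call would look like `keep_linked(entities, annotate(article_text))`, where `entities` is a list of (text, label) pairs such as `[(ent.text, ent.type) for ent in doc.ents]` from a Stanza document. Matching on surface forms is deliberately simple: Spotlight may link a slightly different span than Stanza extracted (for example “Flag Code” rather than “the Flag Code”), so a more careful version might compare character offsets instead.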