Named Entity Recognition - Ritvvij Parrikh

Named Entity Recognition

  • Named Entity Recognition (NER) algorithms serve three information needs:
    • When did it happen?
    • Who is it about?
    • Where did it happen?
  • There are many open-source NER libraries, including Stanza from the Stanford NLP Group, spaCy, and flair. All three are fairly accurate on test data. Of these, Stanza offers fast processing speed and supports several types of entities.
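
As a rough illustration of the point above, the sketch below runs Stanza's English NER pipeline and sorts the results into the three information needs. `stanza.Pipeline`, `doc.ents`, `ent.text`, and `ent.type` are Stanza's own API; the `NEED_BY_LABEL` mapping and the helper names `extract_entities` and `group_by_need` are assumptions made for this example, not something Stanza provides.

```python
# Assumed mapping from Stanza's entity labels to the three information needs.
# Labels not listed here (PRODUCT, LAW, ...) are ignored by group_by_need.
NEED_BY_LABEL = {
    "PERSON": "who", "ORG": "who", "NORP": "who",
    "GPE": "where", "LOC": "where", "FAC": "where",
    "DATE": "when", "TIME": "when",
}


def extract_entities(article_text):
    """Return (text, label) pairs found by Stanza's English NER model."""
    # Imported lazily so group_by_need can be used without Stanza installed.
    # On older Stanza versions, run stanza.download("en") once beforehand.
    import stanza

    nlp = stanza.Pipeline(lang="en", processors="tokenize,ner")
    doc = nlp(article_text)
    return [(ent.text, ent.type) for ent in doc.ents]


def group_by_need(entities):
    """Bucket (text, label) pairs into who / where / when, dropping duplicates."""
    grouped = {"who": [], "where": [], "when": []}
    for text, label in entities:
        need = NEED_BY_LABEL.get(label)
        if need and text not in grouped[need]:
            grouped[need].append(text)
    return grouped
```

Calling `group_by_need(extract_entities(article_text))` on an article would then give one list per question. Treating organizations and NORP groups as answers to "who" is a judgment call and may not suit every newsroom.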

  • Here’s sample output from Stanza for an article headlined “Sorry, I won’t fly the Tiranga on Aug 15. Here’s why.” Stanza identified:
    • People: “Narendra Modi”, “Har Ghar Tiranga”, “Jawaharlal Nehru”, “Tiranga”, “Nehru”, “Tricolour”, “Swiggy”, “Zomato”, “Azadi”, “Modi”, “the Ashoka Chakra”
    • Places: Stanza provides three types of locations: GPE (countries, cities, and states), non-GPE locations (mountains, rivers), and facilities (buildings, airports, highways). The ones identified for this article include “Kargil”, “Pakistan”, “Jammu”, “India”, “Kashmir”, “Tiger Hill”, “Hill”.
    • Organizations: “RSS”, “PM Modi”, “Congress”, “Corporate Social Responsibility”, “Parliament”, “Resident Welfare Associations”, “Mint”, “CSR”
    • Legal concepts: “Constitution”, “the Flag Code”, “Code”
    • Products: “Tricolour”, “Tricolor”
    • NORP, i.e., Nationalities or religious or political groups: “Indian”, “Tiranga”, “Indians”
    • Temporal data: “75 years of Independence”, “The day”, “22 long years ago”, “52 years”, “75”, “the 75th year of Independence”, “August 13 to 15”, “the early days”, “75th”
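  • As you can see, Stanza’s classification differs from how a human would classify these entities, and there is significant noise: for example, “Tiranga” and “Swiggy” are tagged as people, “PM Modi” as an organization, and “Hill” as a place. Hence, the NER output requires further refinement. To reduce noise, remove irrelevant entities using DBpedia APIs.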
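
One possible way to do that filtering, sketched under assumptions: keep an entity only if DBpedia's public SPARQL endpoint has a resource with exactly that English label and a matching ontology class. The endpoint URL and the classes `dbo:Person`, `dbo:Organisation`, and `dbo:Place` are real DBpedia names; the `EXPECTED_CLASS` mapping and the helpers `build_ask_query`, `dbpedia_confirms`, and `filter_entities` are hypothetical names invented for this example, and the original does not say which DBpedia API was used.

```python
import json
import urllib.parse
import urllib.request

SPARQL_ENDPOINT = "https://dbpedia.org/sparql"

# Assumed mapping from Stanza labels to the DBpedia class we expect to find.
EXPECTED_CLASS = {
    "PERSON": "dbo:Person",
    "ORG": "dbo:Organisation",
    "GPE": "dbo:Place",
    "LOC": "dbo:Place",
    "FAC": "dbo:Place",
}


def build_ask_query(label, dbo_class):
    """Build a SPARQL ASK query: is there a resource with this label and class?"""
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> "
        "PREFIX dbo: <http://dbpedia.org/ontology/> "
        f'ASK {{ ?s rdfs:label "{escaped}"@en ; a {dbo_class} }}'
    )


def dbpedia_confirms(label, dbo_class, timeout=10):
    """Send the ASK query to DBpedia and return its boolean answer."""
    params = urllib.parse.urlencode({
        "query": build_ask_query(label, dbo_class),
        "format": "application/sparql-results+json",
    })
    with urllib.request.urlopen(f"{SPARQL_ENDPOINT}?{params}", timeout=timeout) as resp:
        return bool(json.load(resp).get("boolean"))


def filter_entities(entities, confirm=dbpedia_confirms):
    """Drop (text, label) pairs that DBpedia does not confirm.

    Labels without an expected class (dates, products, ...) pass through unchanged.
    """
    kept = []
    for text, label in entities:
        dbo_class = EXPECTED_CLASS.get(label)
        if dbo_class is None or confirm(text, dbo_class):
            kept.append((text, label))
    return kept
```

Exact label matching is deliberately strict: it would likely drop short forms such as “Modi” along with genuine noise such as “Tiranga” tagged as a person, so a production version might resolve aliases first. The `confirm` parameter exists so that the network call can be swapped for a cache or a stub.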