This is a simple Named Entity Recognition (NER) program written in Java. It parses a given sentence and produces every possible combination of the entities found in it.
The input can be a sentence of any length that contains named entities, for example: Remember the Titans was a movie directed by Boaz Yakin.
The output of the NER would be all the possible combinations of the entities in the sentence:
- {Remember the Titans,Movie} was {a movie,Movie} directed by {Boaz Yakin,director}
- {Remember the Titans,Movie} was a movie directed by Boaz Yakin
- {Remember the Titans,Movie} was {a movie,Movie} directed by Boaz Yakin
- {Remember the Titans,Movie} was a movie directed by {Boaz Yakin,director}
- Remember the Titans was {a movie,Movie} directed by Boaz Yakin
- Remember the Titans was {a movie,Movie} directed by {Boaz Yakin,director}
- Remember the Titans was a movie directed by {Boaz Yakin,director}
- Remember {the Titans,Movie,Sports Team} was {a movie,Movie} directed by {Boaz Yakin,director}
- Remember {the Titans,Movie,Sports Team} was a movie directed by Boaz Yakin
- Remember {the Titans,Movie,Sports Team} was {a movie,Movie} directed by Boaz Yakin
- Remember {the Titans,Movie,Sports Team} was a movie directed by {Boaz Yakin,director}
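One way to enumerate such combinations is to treat each candidate entity as a span of tokens and emit every non-empty selection of spans that do not overlap. The sketch below is illustrative only and is not taken from the project: the class `EntityCombinations`, the `Span` type, and the hard-coded token indices are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EntityCombinations {

    /** Hypothetical candidate entity covering tokens [start, end) with a display label. */
    static final class Span {
        final int start;
        final int end;
        final String label;

        Span(int start, int end, String label) {
            this.start = start;
            this.end = end;
            this.label = label;
        }
    }

    /** Returns one annotated sentence per non-empty set of non-overlapping spans. */
    static List<String> enumerate(String[] tokens, List<Span> spans) {
        List<String> results = new ArrayList<>();
        // Each bit of the mask decides whether the span at that index is selected.
        for (int mask = 1; mask < (1 << spans.size()); mask++) {
            List<Span> chosen = new ArrayList<>();
            for (int i = 0; i < spans.size(); i++) {
                if ((mask & (1 << i)) != 0) {
                    chosen.add(spans.get(i));
                }
            }
            if (!hasOverlap(chosen)) {
                results.add(render(tokens, chosen));
            }
        }
        return results;
    }

    /** True if any two selected spans share at least one token. */
    static boolean hasOverlap(List<Span> chosen) {
        for (int i = 0; i < chosen.size(); i++) {
            for (int j = i + 1; j < chosen.size(); j++) {
                Span a = chosen.get(i);
                Span b = chosen.get(j);
                if (a.start < b.end && b.start < a.end) {
                    return true;
                }
            }
        }
        return false;
    }

    /** Rebuilds the sentence, wrapping each selected span as {text,label}. */
    static String render(String[] tokens, List<Span> chosen) {
        List<String> parts = new ArrayList<>();
        int i = 0;
        while (i < tokens.length) {
            Span match = null;
            for (Span s : chosen) {
                if (s.start == i) {
                    match = s;
                }
            }
            if (match == null) {
                parts.add(tokens[i]);
                i++;
            } else {
                String text = String.join(" ", Arrays.copyOfRange(tokens, match.start, match.end));
                parts.add("{" + text + "," + match.label + "}");
                i = match.end;
            }
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        String[] tokens = "Remember the Titans was a movie directed by Boaz Yakin".split(" ");
        List<Span> spans = new ArrayList<>();
        spans.add(new Span(0, 3, "Movie"));             // Remember the Titans
        spans.add(new Span(1, 3, "Movie,Sports Team")); // the Titans
        spans.add(new Span(4, 6, "Movie"));             // a movie
        spans.add(new Span(8, 10, "director"));         // Boaz Yakin
        for (String line : enumerate(tokens, spans)) {
            System.out.println(line);
        }
    }
}
```

Because "Remember the Titans" and "the Titans" share tokens, they never appear in the same annotated sentence, which is why the example sentence yields 11 combinations rather than 15.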
When the project was kicked off, one of the approaches considered was to keep a lookup table of all the known connector words, such as articles and conjunctions, and to remove them from the word list obtained by splitting the sentence on spaces. This would leave only the named entities in the sentence. Each identified entity is then looked up in a second table that associates it with its entity type.
The entity lookup table here would contain the following data:
- Remember the Titans=>Movie
- a movie=>Movie
- Boaz Yakin=>director
- the Titans=>Movie
- the Titans=>Sports Team
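A minimal sketch of this first approach might look like the following. It is not the project's code: the class `ConnectorFilterNer` and its method names are hypothetical, and the connector set ("was", "directed", "by") is an assumption chosen so that the article-bearing entries in the table, such as "a movie", stay intact. The entity table maps each phrase to a list of types because "the Titans" has two.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class ConnectorFilterNer {

    // Assumed connector words; the project's real connector table is not shown.
    static final Set<String> CONNECTORS = new HashSet<>(Arrays.asList("was", "directed", "by"));

    // Entity table keyed by lower-cased phrase; one phrase may have several types.
    static final Map<String, List<String>> ENTITY_TYPES = new HashMap<>();

    static {
        addEntity("Remember the Titans", "Movie");
        addEntity("a movie", "Movie");
        addEntity("Boaz Yakin", "director");
        addEntity("the Titans", "Movie");
        addEntity("the Titans", "Sports Team");
    }

    static void addEntity(String phrase, String type) {
        ENTITY_TYPES.computeIfAbsent(phrase.toLowerCase(Locale.ROOT), k -> new ArrayList<>()).add(type);
    }

    /** Splits on whitespace and groups the words that sit between connector words. */
    static List<String> chunks(String sentence) {
        List<String> chunks = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            if (CONNECTORS.contains(word.toLowerCase(Locale.ROOT))) {
                if (!current.isEmpty()) {
                    chunks.add(String.join(" ", current));
                    current.clear();
                }
            } else {
                current.add(word);
            }
        }
        if (!current.isEmpty()) {
            chunks.add(String.join(" ", current));
        }
        return chunks;
    }

    /** Looks up every contiguous sub-phrase of every chunk in the entity table. */
    static Map<String, List<String>> recognize(String sentence) {
        Map<String, List<String>> found = new LinkedHashMap<>();
        for (String chunk : chunks(sentence)) {
            String[] words = chunk.split(" ");
            for (int start = 0; start < words.length; start++) {
                for (int end = words.length; end > start; end--) {
                    String phrase = String.join(" ", Arrays.copyOfRange(words, start, end));
                    List<String> types = ENTITY_TYPES.get(phrase.toLowerCase(Locale.ROOT));
                    if (types != null) {
                        found.put(phrase, types);
                    }
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        String sentence = "Remember the Titans was a movie directed by Boaz Yakin";
        recognize(sentence).forEach((phrase, types) -> System.out.println(phrase + "=>" + types));
    }
}
```

Searching sub-phrases inside each chunk is what lets the sketch find "the Titans" within "Remember the Titans". The sketch does not strip punctuation, so a trailing period would need to be removed before the lookup.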
An alternative that was put forward was to build a crude sentence tree in which the connector words from the lookup table are parent nodes, and to look up the leaf nodes, which might contain the entities, in the entity table. This is the logic currently used to implement the NER here.
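The tree idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual implementation: the class `SentenceTree`, the `Node` layout, the choice to split at the first connector found, and the connector set itself are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class SentenceTree {

    // Assumed connector words; the project's real connector table is not shown.
    static final Set<String> CONNECTORS = new HashSet<>(Arrays.asList("was", "directed", "by"));

    /** A connector node has up to two children; a leaf holds a candidate phrase. */
    static final class Node {
        final String text;
        final boolean connector;
        final Node left;
        final Node right;

        Node(String text, boolean connector, Node left, Node right) {
            this.text = text;
            this.connector = connector;
            this.left = left;
            this.right = right;
        }
    }

    /** Splits at the first connector: words before it form the left leaf, the rest recurse. */
    static Node build(List<String> words) {
        if (words.isEmpty()) {
            return null;
        }
        for (int i = 0; i < words.size(); i++) {
            if (CONNECTORS.contains(words.get(i).toLowerCase(Locale.ROOT))) {
                Node left = build(words.subList(0, i));
                Node right = build(words.subList(i + 1, words.size()));
                return new Node(words.get(i), true, left, right);
            }
        }
        return new Node(String.join(" ", words), false, null, null);
    }

    /** Collects leaf phrases from left to right. */
    static void collectLeaves(Node node, List<String> out) {
        if (node == null) {
            return;
        }
        if (!node.connector) {
            out.add(node.text);
            return;
        }
        collectLeaves(node.left, out);
        collectLeaves(node.right, out);
    }

    /** Looks up each leaf phrase in the entity table (exact match only). */
    static Map<String, List<String>> lookupLeaves(Node root, Map<String, List<String>> entityTable) {
        List<String> leaves = new ArrayList<>();
        collectLeaves(root, leaves);
        Map<String, List<String>> found = new LinkedHashMap<>();
        for (String leaf : leaves) {
            List<String> types = entityTable.get(leaf);
            if (types != null) {
                found.put(leaf, types);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = new LinkedHashMap<>();
        table.put("Remember the Titans", Arrays.asList("Movie"));
        table.put("a movie", Arrays.asList("Movie"));
        table.put("Boaz Yakin", Arrays.asList("director"));
        table.put("the Titans", Arrays.asList("Movie", "Sports Team"));

        Node root = build(Arrays.asList("Remember the Titans was a movie directed by Boaz Yakin".split(" ")));
        System.out.println(lookupLeaves(root, table));
    }
}
```

With these assumptions the root is "was", and the leaves are "Remember the Titans", "a movie", and "Boaz Yakin". An exact-match lookup on leaves does not find "the Titans"; matching phrases nested inside a leaf would need an extra sub-phrase search.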