ir-course-uoi

This project for the Information Retrieval course at cse.uoi.gr implements a search engine for Wikipedia articles using Apache Lucene.

In ir-course-uoi-data, you can find the implementation of a custom crawler and an HTML preprocessor that extracts text from the scraped HTML pages.

The search engine supports several features, including:

Keyword/Phrase/Wildcard/Boolean Queries
Spelling correction
Searching in specific article sections (i.e. Title, Content, Multimedia, Quotes, References)
Displaying a short description/summary for each result, including highlighting of search terms
Sorting based on Relevance, Publication Date or Modification Date (Ascending/Descending)
Narrowing of results based on time of last update or time of publication (Anytime, day, week, 1/3/6/12 months)
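
The time-based narrowing listed above can be illustrated with a minimal sketch. It assumes document dates are available as epoch milliseconds; the names `TimeWindow`, `lowerBoundMillis` and `accepts` are hypothetical and are not taken from this project's source. If the dates were indexed in Lucene as `LongPoint` fields, the computed bound could serve as the lower value of a `LongPoint.newRangeQuery` call, but that is an assumption about the index layout.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

/** Hypothetical time windows mirroring the options listed above. */
enum TimeWindow {
    ANYTIME(0, 0),
    DAY(1, 0),
    WEEK(7, 0),
    MONTH_1(0, 1),
    MONTHS_3(0, 3),
    MONTHS_6(0, 6),
    MONTHS_12(0, 12);

    private final int days;
    private final int months;

    TimeWindow(int days, int months) {
        this.days = days;
        this.months = months;
    }

    /** Inclusive lower bound in epoch milliseconds; Long.MIN_VALUE means no lower bound. */
    long lowerBoundMillis(Instant now) {
        if (this == ANYTIME) {
            return Long.MIN_VALUE;
        }
        // Calendar-aware subtraction in UTC, so "1 month" follows month lengths.
        ZonedDateTime cutoff = now.atZone(ZoneOffset.UTC).minusDays(days).minusMonths(months);
        return cutoff.toInstant().toEpochMilli();
    }

    /** True if a document timestamp lies between the lower bound and now (both inclusive). */
    boolean accepts(long docMillis, Instant now) {
        return docMillis >= lowerBoundMillis(now) && docMillis <= now.toEpochMilli();
    }
}

final class TimeWindowDemo {
    public static void main(String[] args) {
        Instant now = Instant.parse("2020-06-15T12:00:00Z");
        long tenDaysAgo = Instant.parse("2020-06-05T12:00:00Z").toEpochMilli();
        System.out.println(TimeWindow.WEEK.accepts(tenDaysAgo, now));    // false
        System.out.println(TimeWindow.MONTH_1.accepts(tenDaysAgo, now)); // true
    }
}
```

Passing `now` explicitly, rather than reading the system clock inside the enum, keeps the bounds reproducible when testing.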

Screenshots are available in the screenshots directory.

For the license statements of third-party software, please refer to the lucene-8.5.1 and javafx-sdk-11.0.2 directories.
