IR Project

Information Retrieval from Semi-Structured Data

Brief

In this project, I built an information retrieval (IR) system for semi-structured data, in this case CSV (comma-separated values) data. The dataset contains 8807 records of shows on Netflix, including each show's name, director, cast, release date, duration, genre and plot.

The aim is to build a search engine for Netflix shows, based on topics learnt throughout the information retrieval course and some concepts from natural language processing.

This project is also an example of domain-specific information retrieval: while creating the IR system, I took into account the kind of data present in the CSV file and the kinds of queries a user may make.

Usage

First, clone the repo, create a virtual environment, and use pip to install all the dependencies listed in requirements.txt.

For development mode, run

./dev

For production mode, run

./prod

These commands will do the following:

Clean the raw CSV data
Train the Doc2Vec model
Build the indexes
Run the Flask server in dev/prod mode

The web application will be served at http://localhost:5000
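
To give a rough idea of what the cleaning and index-building steps involve, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the sample data, the column names (title, director, description) and the helper functions clean_rows, build_inverted_index and search are all assumptions made for illustration, and the Doc2Vec and Flask steps are left out.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical sample standing in for the raw Netflix CSV; the column names
# are assumptions, not taken from the project's data file.
RAW_CSV = """title,director,description
Stranger Things ,The Duffer Brothers,"A young boy vanishes in a small town."
Dark,,A missing child sets four families on a hunt for answers.
"""

TOKEN_RE = re.compile(r"[a-z0-9]+")


def clean_rows(csv_text):
    """Parse the CSV, strip stray whitespace, and turn missing values into ''."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {key: (value or "").strip() for key, value in row.items()}
        for row in reader
    ]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower())


def build_inverted_index(rows, fields=("title", "director", "description")):
    """Map each token to the set of row ids whose chosen fields contain it."""
    index = defaultdict(set)
    for doc_id, row in enumerate(rows):
        for field in fields:
            for token in tokenize(row[field]):
                index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of rows containing every query token (boolean AND)."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []


rows = clean_rows(RAW_CSV)
index = build_inverted_index(rows)
```

With this sample, search(index, "missing child") returns the id of the second row only. A boolean AND over an inverted index is just one simple retrieval model, chosen here for brevity; the indexes built by ./dev and ./prod may be organised differently.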
