IR Project

Information Retrieval from Semi-Structured Data

Brief

In this project, I built an IR system for semi-structured data, in this case CSV (comma-separated values) data. The dataset contains 8807 records of shows on Netflix, each including the show's name, director, cast, release date, duration, genre and plot.

The aim is to build a search engine for Netflix shows, based on topics learnt in the information retrieval course and some concepts from natural language processing.
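As a rough illustration of one classic technique from an IR course, here is a minimal TF-IDF ranking sketch using only the Python standard library. It is not this project's implementation (the pipeline trains a Doc2Vec model); the `TfidfIndex` class, the `tokenize` helper and the toy plot lines are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/digits as tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfIndex:
    """Tiny TF-IDF index with cosine-similarity ranking (illustrative only)."""

    def __init__(self, docs):
        self.docs = docs
        tokenized = [tokenize(d) for d in docs]
        n = len(docs)
        df = Counter()
        for toks in tokenized:
            df.update(set(toks))  # document frequency: count each term once per doc
        # Smoothed IDF so a term found in every document still gets weight 1.
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._weigh(Counter(toks)) for toks in tokenized]

    def _weigh(self, tf):
        # tf * idf, then L2-normalise so a dot product equals cosine similarity.
        # Terms unseen at index time get weight 0.
        vec = {t: c * self.idf.get(t, 0.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        return {t: w / norm for t, w in vec.items()} if norm else {}

    def search(self, query, k=3):
        q = self._weigh(Counter(tokenize(query)))
        scored = []
        for i, vec in enumerate(self.vectors):
            score = sum(w * vec.get(t, 0.0) for t, w in q.items())
            if score > 0:
                scored.append((score, i))
        scored.sort(reverse=True)
        return [self.docs[i] for _, i in scored[:k]]


# Toy, made-up plot descriptions.
docs = [
    "A detective hunts a serial killer in London",
    "A cooking competition between amateur bakers",
    "A young detective solves crimes at her boarding school",
]
index = TfidfIndex(docs)
results = index.search("detective")
print(results)  # the two detective plots; the baking show is not returned
```

Normalising every vector once at index time keeps each query to a sparse dot product per document.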

This project is also an example of domain-specific information retrieval: while building the IR system, I took into account the kind of data present in the CSV file and the kinds of queries a user is likely to make.
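One way retrieval can be made domain-specific, sketched here under assumptions, is to index each CSV column separately and weight matches by field, so that a query hitting a show's title or director counts for more than a passing mention in the plot. The column names (`title`, `director`, `cast`, `description`), the weights, the `field:term` query syntax and the sample rows are all hypothetical choices, not taken from this project.

```python
import csv
import io
import re
from collections import defaultdict

# Hypothetical field weights: a title hit matters more than a plot hit.
FIELD_WEIGHTS = {"title": 3.0, "director": 2.0, "cast": 2.0, "description": 1.0}


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def build_fielded_index(rows):
    # index[field][term] -> set of row ids whose `field` contains `term`
    index = {field: defaultdict(set) for field in FIELD_WEIGHTS}
    for row_id, row in enumerate(rows):
        for field in FIELD_WEIGHTS:
            for term in tokenize(row.get(field) or ""):
                index[field][term].add(row_id)
    return index


def search(index, query):
    """Rank row ids for a query such as 'director:rivera city'.

    A 'field:term' token searches only that field; a bare term searches
    every field. Each field that matches adds its weight to the row score.
    """
    scores = defaultdict(float)
    for token in query.lower().split():
        field, _, raw_term = token.rpartition(":")
        fields = [field] if field in FIELD_WEIGHTS else list(FIELD_WEIGHTS)
        for term in tokenize(raw_term):
            for f in fields:
                for row_id in index[f].get(term, ()):
                    scores[row_id] += FIELD_WEIGHTS[f]
    # Highest score first; ties broken by row id for stable output.
    return sorted(scores, key=lambda r: (-scores[r], r))


# Made-up sample rows standing in for the real CSV.
SAMPLE = """title,director,cast,description
Night Shift,A. Rivera,Sam Lee,A nurse works the night shift in a city hospital
City Lights,B. Chen,A. Rivera,A pianist falls for a stranger in the city
Harbour,C. Okafor,Dana Cruz,A fisherman struggles after a storm
"""
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
index = build_fielded_index(rows)
print(search(index, "director:rivera"))  # [0]
print(search(index, "rivera"))           # [0, 1]
print(search(index, "city"))             # [1, 0]: title match outweighs plot match
```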

Usage

First clone the repo, then create a virtual environment and use pip to install the dependencies listed in requirements.txt.

For development mode, run

./dev

For production mode, run

./prod

Both commands do the following:

  1. Clean the raw CSV data
  2. Train the Doc2Vec model
  3. Build the indexes
  4. Run the Flask server in dev/prod mode
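The first step, cleaning the raw CSV data, could look roughly like the standard-library sketch below. The cleaning rules (trim whitespace, collapse repeated spaces, turn missing values into empty strings, drop exact duplicate rows) and the `clean_rows`/`write_rows` helpers are assumptions for illustration; the project's actual cleaning logic is not described here.

```python
import csv
import io


def clean_rows(raw_csv):
    """Parse raw CSV text into a list of cleaned dict rows (hypothetical rules)."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    seen = set()
    cleaned = []
    for row in reader:
        # Skip the None key DictReader uses for surplus fields; a missing
        # value (None) becomes "", and internal whitespace runs collapse.
        row = {
            key.strip(): " ".join((value or "").split())
            for key, value in row.items()
            if key is not None
        }
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:  # exact duplicate of an earlier row
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def write_rows(rows, out):
    # Write cleaned rows back out as CSV, using the first row's keys as header.
    if not rows:
        return
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)


RAW = (
    "title, director ,description\n"
    '  Night Shift ,A. Rivera,"A nurse   works nights"\n'
    '  Night Shift ,A. Rivera,"A nurse   works nights"\n'
    "Harbour,,A fisherman after a storm\n"
)
cleaned = clean_rows(RAW)
out = io.StringIO()
write_rows(cleaned, out)
print(out.getvalue())
```

Keeping cleaning separate from indexing means the later steps (training and index building) can read one consistent, already-normalised file.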

The web application will be served at http://localhost:5000