siddharthborderwala/netflix-search-engine

IR Project

Information Retrieval from Semi-Structured Data

Brief

In this project, I built an IR system for semi-structured data, in this case CSV (comma-separated values) data. The data-set contains 8807 records of shows on Netflix, each including the show name, director's name, cast, date of release, length/duration, genre and plot.
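As a rough illustration of what "semi-structured" means here, the sketch below parses a tiny in-memory CSV sample with Python's standard csv module. The column names and the sample row are assumptions made up for this example; the headers in the project's actual data-set may differ.

```python
import csv
import io

# Hypothetical sample; the real data-set's column names may differ.
SAMPLE_CSV = """title,director,cast,release_date,duration,genre,plot
Example Show,Jane Doe,"Actor One, Actor Two",2020-01-01,2 Seasons,Drama,"A made-up plot, for illustration."
"""

def load_records(text):
    """Parse CSV text into a list of dicts, one per show."""
    return list(csv.DictReader(io.StringIO(text)))

records = load_records(SAMPLE_CSV)
print(records[0]["title"])  # Example Show
print(records[0]["cast"])   # Actor One, Actor Two
```

Each record mixes short structured fields (director, duration) with free text (plot), which is why a search engine over this data can treat fields differently.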

The aim is to build a search engine for Netflix shows, based on topics learnt throughout the information retrieval course and some concepts from natural language processing.

This project is also an example of domain-specific information retrieval: while creating the IR system, I took into account the kind of data present in the CSV file and the kinds of queries a user may make.

Usage

First clone the repo, then create a virtual environment and use pip to install all the dependencies listed in requirements.txt.

For development mode, run

./dev

For production mode, run

./prod

These commands will do the following:

  1. Clean the raw CSV data
  2. Train the Doc2Vec model
  3. Build the indexes
  4. Run the Flask server in dev/prod mode
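To give a feel for the cleaning and indexing steps above, here is a minimal sketch using only the standard library. It is not the project's implementation: the function names (clean, tokenize, build_index, search) and the sample documents are hypothetical, the index is a plain inverted index with boolean AND matching, and the Doc2Vec training step is left out because it needs a third-party library.

```python
import re
from collections import defaultdict

def clean(text):
    """Collapse runs of whitespace, trim the ends, and lowercase."""
    return re.sub(r"\s+", " ", text).strip().lower()

def tokenize(text):
    """Split cleaned text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", clean(text))

def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query token (boolean AND)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Made-up plot snippets standing in for rows of the cleaned CSV.
docs = {
    1: "  A Heist   Thriller ",
    2: "Romantic comedy in Paris",
    3: "Thriller set in Paris",
}
index = build_index(docs)
print(sorted(search(index, "thriller paris")))  # [3]
```

Building the index once at startup keeps each query to a few set intersections, which is presumably why the commands run this step before starting the server.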

The web application will be served at http://localhost:5000
