IR Project

Information Retrieval from Semi-Structured Data

Brief

In this project, I built an IR system for semi-structured data, in this case CSV (comma-separated values) data. The dataset contains 8807 records of shows on Netflix, including the show name, director's name, cast, release date, length/duration, genre and plot.
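Because each CSV row mixes several fields, a common first step is to read the rows into dictionaries and join the searchable fields into one text document per show. The sketch below does this with Python's standard `csv` module. The column names (`title`, `director`, `cast`, `listed_in`, `description`) and the helper names `load_records` and `record_to_document` are assumptions for illustration only; the project's actual headers and code may differ.

```python
import csv
import io

# Hypothetical column names, modelled on a typical Netflix titles CSV;
# the project's actual headers may differ.
TEXT_FIELDS = ["title", "director", "cast", "listed_in", "description"]


def load_records(csv_file):
    """Read CSV rows into dicts, replacing missing values with empty strings."""
    reader = csv.DictReader(csv_file)
    records = []
    for row in reader:
        # Skip the None key DictReader uses for surplus, unnamed columns.
        records.append(
            {key: (value or "").strip() for key, value in row.items() if key is not None}
        )
    return records


def record_to_document(record):
    """Concatenate the non-empty searchable fields of one show into a single string."""
    return " ".join(record[field] for field in TEXT_FIELDS if record.get(field))


# Usage with a tiny in-memory sample (made-up data, not from the real dataset).
sample = io.StringIO(
    "show_id,title,director,cast,release_year,duration,listed_in,description\n"
    's1,Example Show,Jane Doe,"Actor A, Actor B",2020,90 min,Dramas,A quiet drama.\n'
)
records = load_records(sample)
print(record_to_document(records[0]))
```

The quoted `cast` value shows why a real CSV parser is preferable to splitting on commas: the comma inside the quotes stays part of a single field.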

The aim is to build a search engine for Netflix shows, based on topics learnt throughout the information retrieval course and some concepts from natural language processing.

This project is also an example of domain-specific information retrieval: while building the IR system, I took into account the kind of data in the CSV file and the kinds of queries a user is likely to make.
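One way to make retrieval aware of the data's structure is to weight matches by the field they occur in, so that a query term found in a show's title counts for more than the same term in its plot. The sketch below illustrates that idea. Field weighting, the weight values in `FIELD_WEIGHTS`, and the function names are assumptions for illustration, not a description of how this project actually ranks results.

```python
import re
from collections import Counter

# Hypothetical per-field weights: a title match is treated as stronger
# evidence of relevance than a match in the plot description.
FIELD_WEIGHTS = {
    "title": 3.0,
    "cast": 2.0,
    "director": 2.0,
    "listed_in": 1.5,
    "description": 1.0,
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def field_weighted_score(query, record):
    """Sum, over fields, the field weight times the count of query-term occurrences."""
    query_terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        counts = Counter(tokenize(record.get(field, "")))
        score += weight * sum(counts[term] for term in query_terms)
    return score


def rank(query, records):
    """Return titles of matching records, highest field-weighted score first."""
    scored = [(field_weighted_score(query, r), r) for r in records]
    scored = [(s, r) for s, r in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [r["title"] for _, r in scored]


# Usage with made-up records.
records = [
    {"title": "Quiet Town", "description": "A drama set in a town obsessed with space."},
    {"title": "Space Race", "description": "A documentary about rockets."},
]
print(rank("space", records))
```

Here "Space Race" outranks "Quiet Town" because its match is in the title (weight 3.0) rather than the description (weight 1.0).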

Usage

First, clone the repo, create a virtual environment, and use pip to install the dependencies listed in requirements.txt.

For development mode, run

./dev

For production mode, run

./prod

These commands do the following:

Clean the raw CSV data
Train the Doc2Vec model
Build indexes
Run the Flask server in dev/prod mode
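The cleaning and indexing steps can be sketched as follows: text is lowercased, tokenized and stripped of stop words, then an inverted index maps each term to the documents that contain it, which lets a multi-word query be answered by intersecting posting lists. This is a minimal standard-library sketch; the stop-word list and the names `clean`, `build_inverted_index` and `search_all` are hypothetical, and the project's own cleaning rules and index format may differ. The Doc2Vec training step is not reproduced here.

```python
import re
from collections import defaultdict

# A small, hypothetical stop-word list; a real system would likely use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "with"}


def clean(text):
    """Lowercase, strip punctuation, tokenize and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for term in clean(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, query):
    """Return ids of documents containing every cleaned query term (Boolean AND)."""
    terms = clean(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Usage with made-up plot descriptions.
docs = [
    "A detective hunts a killer in London.",
    "A comedy about a family in London.",
    "The detective returns.",
]
index = build_inverted_index(docs)
print(index["london"])
print(search_all(index, "detective London"))
```

Storing posting lists sorted by document id keeps the index compact and makes it straightforward to swap the set intersection for a linear merge of sorted lists on larger collections.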

The web application will be served at http://localhost:5000

siddharthborderwala/netflix-search-engine

Folders and files

Latest commit

History

Repository files navigation

IR Project

Brief

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages