TF-IDF Search Engine

Description:

This code implements a term frequency-inverse document frequency (TF-IDF) search engine that runs queries against a corpus of documents. Documents are ranked by their normalized TF-IDF scores against the query, and the most relevant document is returned. The weighting scheme is ltc.lnc (SMART notation, document scheme first, query scheme second):

Document: logarithmic tf, logarithmic idf, cosine normalization
Query: logarithmic tf, no idf, cosine normalization
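
The weighting above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the repository's actual code: the function names (log_tf, document_vectors, query_vector, best_match) are hypothetical, the logarithm base is assumed to be 10, and documents are assumed to be already tokenized.

```python
import math
from collections import Counter


def log_tf(tokens):
    """Logarithmic term frequency: 1 + log10(count) for each term present."""
    return {t: 1 + math.log10(c) for t, c in Counter(tokens).items()}


def cosine_normalize(weights):
    """Scale a weight vector to unit length; an all-zero vector becomes empty."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def document_vectors(docs):
    """ltc: log tf times idf, cosine-normalized. docs maps name -> token list."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    idf = {t: math.log10(n / d) for t, d in df.items()}  # base 10 is an assumption
    return {
        name: cosine_normalize({t: w * idf[t] for t, w in log_tf(tokens).items()})
        for name, tokens in docs.items()
    }


def query_vector(tokens):
    """lnc: log tf, no idf, cosine-normalized."""
    return cosine_normalize(log_tf(tokens))


def best_match(query_tokens, doc_vecs):
    """Return (name, score) of the document with the highest cosine similarity."""
    q = query_vector(query_tokens)
    scores = {
        name: sum(w * vec.get(t, 0.0) for t, w in q.items())
        for name, vec in doc_vecs.items()
    }
    name = max(scores, key=scores.get)
    return name, scores[name]
```

Because both vectors are unit length, the dot product in best_match is the cosine similarity, so scores fall between 0 and 1.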

Features:

Normalized TF-IDF scores for queries and documents
Returns the name and score of the document most relevant to the query
Tokenization, stop word removal, stemming
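
The preprocessing steps in the last feature can be sketched as follows. This is an illustration only: the README does not say which tokenizer, stop word list, or stemmer the project uses, so the tiny STOP_WORDS set and the naive_stem suffix stripper below are hypothetical stand-ins for a real stop word list and a real stemmer (for example, a Porter stemmer).

```python
import re

# Tiny illustrative stop word list; a real list would be much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "for", "on"}


def naive_stem(token):
    """Toy suffix stripper standing in for a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, tokenize on letter runs, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]
```

The same preprocessing should be applied to both documents and queries so that their terms match after stemming.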

Usage:

Clone this repo locally
Install and update relevant libraries
Identify the corpus directory and set 'corpusroot' to its path
Use the provided functions to perform document searches based on a query
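
As a rough sketch of the 'corpusroot' step, the snippet below reads every text file in a directory into a dictionary keyed by file name. The load_corpus helper is hypothetical and not taken from search_functions.py; the .txt extension and UTF-8 encoding are assumptions, and the path shown is the example corpus folder from the repository.

```python
from pathlib import Path

# Assumed location of the corpus; change this to your own directory.
corpusroot = "./ex_corpus_inaugural_addresses"


def load_corpus(root):
    """Read every .txt file under root into a dict of file name -> raw text."""
    docs = {}
    for path in sorted(Path(root).glob("*.txt")):
        # errors="ignore" skips undecodable bytes instead of raising.
        docs[path.name] = path.read_text(encoding="utf-8", errors="ignore")
    return docs
```

Calling load_corpus(corpusroot) would then give the raw documents to tokenize, weight, and search.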
