Skip to content

Experimenting with text pre-processing, indexing, and boolean search techniques

Notifications You must be signed in to change notification settings

johnsonj561/IR-Boolean-Search

Repository files navigation

Boolean Search Application

A simple boolean search application that indexes documents as an inverted index with term frequencies

  • documents are read from collection directory
  • minimal text pre-processing is applied to each document (lowercase, remove whitespace, remove punctuation)
  • documents are tokenized, constructing a term list for each document (term, frequency)
  • inverted index is created such that inverted index includes list of all terms found in collection
    • each term points to a postings list, a list of documents that contain the term and the frequency of the term in given document
  • enters prompt loop, requesting search term from user and providing results until terminates

About

Experimenting with text pre-processing, indexing, and boolean search techniques

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published