Assignment 3 in TDT4117 Information Retrieval at NTNU.
This repository presents tools for indexing and querying paragraphs from the book 'An Inquiry into the Nature and Causes of the Wealth of Nations' by Adam Smith using the Gensim library.
- Document Partitioning: Splits the book into individual paragraphs, each saved as a separate file.
- Text Preprocessing: Uses NLTK for tokenization, stemming, and other preprocessing steps.
- Indexing and Querying: Indexes the paragraphs and queries them to retrieve the most relevant content using an LSI model built on top of TF-IDF weights.
- Visualization: Plots a frequency distribution graph of the top 15 words after preprocessing.