Quizzes Semantic Search

This app uses a local Milvus Lite vector store and a SentenceTransformers embedding model to perform semantic search on a corpus of quiz documents with a known/expected format.
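To illustrate what semantic search means here, the sketch below ranks a few documents against a query by vector similarity. It is a toy illustration, not the app's code: the vectors are hand-made stand-ins for the embeddings a SentenceTransformers model would produce, the document IDs are hypothetical, and cosine similarity is assumed as the metric. In the app itself, the vector store handles this lookup.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional "embeddings"; real embeddings have
# hundreds of dimensions and come from the embedding model.
doc_vecs = {
    "quiz_python_loops": [0.9, 0.1, 0.0],
    "quiz_sql_joins": [0.1, 0.9, 0.1],
    "quiz_python_functions": [0.8, 0.2, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded query

results = rank_documents(query_vec, doc_vecs)
for doc_id, score in results:
    print(f"{doc_id}: {score:.3f}")
```

The two Python-related documents rank above the SQL one because their vectors point in nearly the same direction as the query vector.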

Setting Up to Run the App

Create and activate a virtual environment:

python -m venv venv

source venv/bin/activate

Install the dependencies from requirements.txt into the virtual environment:

pip install -r requirements.txt

Running the App

With the virtual environment activated, from the project root, run the following command:

python src/main.py

Follow the prompts. You can either use an existing vector store or create a new one. If you create a new store, see data/samples/input_format.json for the expected structure of the input documents.

The results of your query are written to the location you specify. See data/samples/output_format.json for the structure of the output.

Note: If you create a new store in Step 2, expect encoding and insertion to take about 1 second per 1,500-1,750 tokens on a modern CPU, or roughly 20 minutes for the 2M-token corpus used in testing the app. A GPU or a different embedding model may improve performance, though the current model performed as well as several others in testing. Once the store is created, querying is quite fast.
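The estimate in the note can be reproduced with simple arithmetic. The sketch below assumes the throughput range quoted above holds on your hardware; the function name is hypothetical and is not part of the app.

```python
def estimate_encoding_minutes(total_tokens, tokens_per_second):
    """Estimate encoding and insertion time in minutes for a corpus."""
    return total_tokens / tokens_per_second / 60


corpus_tokens = 2_000_000  # corpus size used in testing, per the note

# Throughput range from the note: 1,500-1,750 tokens per second.
slow = estimate_encoding_minutes(corpus_tokens, 1_500)
fast = estimate_encoding_minutes(corpus_tokens, 1_750)

print(f"Estimated encoding time: {fast:.0f}-{slow:.0f} minutes")
# prints "Estimated encoding time: 19-22 minutes"
```

Substitute your own corpus token count to get a rough figure before creating a new store.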
