This project is Document Search using Vector Embeddings and Vector Data Base using LLM.
Framework Used 1.) LangChain ==> To Connect to LLM 2.) Stream-lit ==> For developing UI
Models Used
OpenAI HuggingFace
Embeddings Model
**text-embedding-ada-002
all-MiniLM-L6-v2**
OpenAI
**gpt-turbo-3.5**
Vector DataBase
Chroma
This project mainly deals with 4 stages 1.) Stage-1 ==> Validation of API Key ==> Here we are providing leverage to the user to either use OpenSource Models from Hugging Face or OpenAI models using OpenAI API Key 2.) Stage-2 ==> Based on the User Selection of Model, He needs to select the documents of any type for which the embeddings need to be performed and the embeddings will be stored in the vector database 3.) Stage-3 ==> Once the vectors have been stored, the user needs to query the keywords on which the similar documents will be fetched from the database 4.) Stage-4 ==> We also provide, the feasibility to the user, to test the knowledge by using RAG(pattern) (Retrieval Augmented Generation)
Stage-1
In the Left side Pane, we are providing the user either to select the OpenAI or Hugging Face. Based on the selection, the text box below and the button will vary, either to provide an OpenAI API key or the HuggingFace API key If an OpenAI key is provided, then the OpenAI key will be validated, if the API key is valid then the Right Pane Controls will be enabled If the Hugging Face key is provided, then the Hugging Face will be validated, if the Hugging Face key is valid then the Right Pane Controls will be enabled The below screenshot is for reference
Stage-2 In the Right side Pane, we allow the user to browse the files of different file types like (pptx,pdf,csv,txt) Once the files are selected, the respective file names will be populated below the control, Once the documents have been uploaded, then we need to embed the documents and store the embeddings in the database
We allow the user to embed the documents. Once the documents have been embedded the below message will be shown
Stage-3 In the Retrieve Document Similarity Page, the below UI will be shown
Here we are providing the user to enter the query so that based on the query, the Top 3 similar documents will be fetched. Here we are using Cosine Similarity
Once the user enters the query, the below UI will be for the reference, with the retrieved documents
Stage-4 In the Query RAG Page, here we are providing the user with the required details he needs Based on this project we can retrieve the documents that are similar based on the user-entered index and also the able to answer the questions user asks