A PDF-based chatbot leveraging OpenVINO and RAG techniques for efficient question-answering, developed as part of the Intel Unnati Industrial Training 2024.
This project demonstrates how to create a chatbot that can answer questions related to a given PDF document using the Retrieval Augmented Generation (RAG) technique. The chatbot is implemented in Google Colab and uses various libraries including OpenVINO for efficient inference.
The project consists of a single notebook that performs the following tasks:
- Reads and processes a PDF file
- Generates a vector store from the PDF content
- Uses a Language Model (LLM) to answer questions based on the vector store
| Component | Description |
|---|---|
| PDF Processing | Extracts text from a PDF file using PyPDF2 |
| Vector Store Generation | Creates a FAISS index from text chunks using sentence-transformers |
| LLM Integration | Uses the TinyLlama model for generating responses |
| OpenVINO Optimization | Leverages Intel's OpenVINO toolkit for optimized model inference |
| User Interface | Implements a Gradio interface for easy interaction |
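The interface code itself isn't reproduced in this README. As an illustration of the User Interface component in the table above, a Gradio front end could be wired up roughly like this; the `chatbot` function here is a placeholder for the notebook's actual retrieval-and-generation logic:

```python
import gradio as gr

def chatbot(question):
    """Placeholder for the notebook's chatbot function (retrieval + generation)."""
    return f"Answer to: {question}"

# A single-textbox interface: type a question, read the generated answer.
demo = gr.Interface(fn=chatbot, inputs="text", outputs="text",
                    title="PDF Q&A Chatbot")
demo.launch()
```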
- Open the notebook in Google Colab.
- Run all cells in order.
- Upload a PDF file when prompted.
- Ask questions about the PDF content using the Gradio interface.
Before running the notebook, you need to install the required packages and import the necessary libraries. Run the following commands in a code cell:
```python
# Install required packages
!pip install -q numpy PyPDF2 sentence-transformers faiss-cpu transformers nltk gradio
!pip install -q optimum[openvino] openvino-nightly
# Import required libraries
import numpy as np
import PyPDF2
from sentence_transformers import SentenceTransformer
import faiss
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM
import gc
import torch
import nltk
import gradio as gr
import tempfile
import os
# Download NLTK data
nltk.download('punkt', quiet=True)
```
- PDF Processing: The notebook reads the uploaded PDF and extracts its text content.
- Semantic Chunking: The extracted text is divided into semantic chunks for better context preservation.
- Vector Store Creation: Chunks are embedded using a sentence-transformer model and stored in a FAISS index for quick retrieval.
- Question Answering: When a user asks a question, the system (see the sketch after this list):
  - finds the most relevant chunks using semantic similarity,
  - constructs a prompt with the question and the relevant context, and
  - generates an answer using the TinyLlama model.
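The notebook's helper functions aren't reproduced in this README, so the following is a minimal, self-contained sketch of the same pipeline. The embedding model (`all-MiniLM-L6-v2`), the sentence-window chunker, and the prompt format are illustrative assumptions; the notebook's actual `create_semantic_chunks` and `chatbot` functions may differ in detail:

```python
import PyPDF2
import faiss
import nltk
from sentence_transformers import SentenceTransformer
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

nltk.download("punkt", quiet=True)

def extract_text(pdf_path):
    """Read every page of the PDF and concatenate the extracted text."""
    reader = PyPDF2.PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

def chunk_sentences(text, sentences_per_chunk=5, overlap=1):
    """Group consecutive sentences into overlapping chunks (simplified chunker)."""
    sentences = nltk.sent_tokenize(text)
    step = max(1, sentences_per_chunk - overlap)
    return [" ".join(sentences[i:i + sentences_per_chunk])
            for i in range(0, len(sentences), step)]

# Build the vector store: embed every chunk and index it for L2 search.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
chunks = chunk_sentences(extract_text("document.pdf"))
embeddings = embedder.encode(chunks).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])
index.add(embeddings)

# Load TinyLlama, exporting it to OpenVINO IR for optimized inference.
model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = OVModelForCausalLM.from_pretrained(model_id, export=True)

def answer(question, k=3):
    """Retrieve the k most similar chunks and generate an answer from them."""
    query = embedder.encode([question]).astype("float32")
    _, ids = index.search(query, min(k, len(chunks)))
    context = "\n".join(chunks[i] for i in ids[0])
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

Wiring `answer` into a Gradio interface, as in the earlier sketch, yields the complete chatbot.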
The notebook uses default configurations, but you can modify the following (see the example after this list):
- Chunk size and overlap in the `create_semantic_chunks` function
- Number of relevant chunks retrieved (`k`) in the `chatbot` function
- Models used for embeddings and language generation
This project leverages Intel's OpenVINO toolkit for optimized inference:
- The TinyLlama model is loaded and exported using `OVModelForCausalLM` from the `optimum.intel` package (see the sketch after this list).
- This allows for hardware-specific optimizations, potentially improving inference speed and efficiency, especially on Intel hardware.
- OpenVINO optimizations are particularly beneficial for larger models or high-volume query processing.
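Passing `export=True` converts the PyTorch checkpoint to OpenVINO IR on every run. To avoid repeating the conversion, the exported model can be saved once and reloaded directly; a minimal sketch using the `optimum.intel` API (the `tinyllama_ov` directory name is arbitrary):

```python
from transformers import AutoTokenizer
from optimum.intel import OVModelForCausalLM

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# First run: convert the PyTorch checkpoint to OpenVINO IR and save it.
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
model.save_pretrained("tinyllama_ov")
AutoTokenizer.from_pretrained(model_id).save_pretrained("tinyllama_ov")

# Later runs: load the converted IR directly, skipping the export step.
model = OVModelForCausalLM.from_pretrained("tinyllama_ov")
tokenizer = AutoTokenizer.from_pretrained("tinyllama_ov")
```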
- Google Colab environment (or a local setup with similar specifications)
- Internet connection for downloading models and libraries
- For optimal performance with OpenVINO, an Intel CPU is recommended
- The use of OpenVINO optimizations may significantly improve performance, especially on Intel hardware
- Performance benefits may be more noticeable with larger models or when processing many queries
- The TinyLlama model is relatively small, which allows for quick responses but may limit the complexity of answers
- If you encounter CUDA out-of-memory errors, try restarting the runtime or switching to a CPU-only runtime; explicitly freeing memory can also help (see the snippet after this list).
- Ensure all required libraries are correctly installed. Check the error message for missing packages.
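When memory errors persist, it can also help to release the old model before reloading. A small snippet using the `gc` and `torch` modules the notebook already imports (`model` here stands for whatever large object you want to drop):

```python
import gc
import torch

# Drop references to large objects, then force a garbage-collection pass.
del model
gc.collect()

# If a GPU runtime is attached, also release cached CUDA memory.
if torch.cuda.is_available():
    torch.cuda.empty_cache()
```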
- Performance depends on the quality and length of the uploaded PDF.
- Uses a small language model (TinyLlama) which may limit response quality for complex queries.
- Support for multiple PDF uploads
- Integration with more powerful language models
- Implementation of conversation history and context awareness
- Fine-tuning options for specific domains
This project was developed by a team of 5 members as part of the Intel Unnati Industrial Training 2024:
- Swetakshi Nanda: Project lead and architecture design
- Pratyush Pahari: LLM integration and OpenVINO optimization
- Arpan Bag: PDF processing and embedding generation
- Akashdeep Mitra: User interface development and integration
- Tulika Chakraborty: Project documentation
We would like to thank our mentor Abhishek Nandy and the Intel Unnati program for their guidance and support throughout this project.
Contributions to improve the chatbot are welcome. Feel free to fork the repository and submit a pull request, or open an issue to discuss your changes.
This project uses several open-source libraries and models:
- PyPDF2 for PDF text extraction
- sentence-transformers and FAISS for embedding and similarity search
- Hugging Face Transformers and Optimum Intel (OpenVINO) for model loading and optimized inference
- NLTK for sentence tokenization
- Gradio for the web interface
- TinyLlama as the underlying language model