This repository contains a Streamlit application that lets you chat with your PDF documents. You can ask questions about a document's contents in natural language, and the application answers based on that content, using a language model to generate the responses.
The application follows these steps to provide responses to your questions:
- Text Extraction: Extracts text from the uploaded PDF documents.
- Text Chunking: The extracted text is divided into manageable chunks.
- Embedding: The app uses an embedding model to generate vector representations (embeddings) of the text chunks.
- Semantic Matching: When the user asks a question, the app compares it with the text chunks and identifies the semantically similar chunks.
- Response: The similar chunks are passed to the language model, which generates a response based on the content of the PDFs.
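The chunking step above can be sketched in plain Python. This is a minimal illustration with hypothetical chunk-size and overlap values, not the app's exact splitter or settings:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks.

    Overlap means the end of one chunk is repeated at the start of the
    next, so sentences straddling a boundary are not lost.
    """
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks


# Toy usage: 1000 characters, 500-char chunks, 50-char overlap -> 3 chunks.
chunks = chunk_text("a" * 1000, chunk_size=500, overlap=50)
```

Each chunk is then embedded independently, which is why keeping chunks to a manageable size matters: embedding models have input-length limits, and smaller chunks give more precise semantic matches.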
The language model used in this project is Google's `flan-t5-xxl`.
- langchain
- PyPDF2
- python-dotenv
- streamlit
- faiss-cpu
- altair
- tiktoken
- huggingface-hub
- InstructorEmbedding
- sentence-transformers
- torch
- torchvision
- setuptools
```shell
pip install langchain PyPDF2 python-dotenv streamlit faiss-cpu altair tiktoken huggingface-hub InstructorEmbedding sentence-transformers torch torchvision setuptools
```
Obtain an API token from HuggingFace and add it to the `.env` file in your directory:

```
HUGGINGFACEHUB_API_TOKEN = your_api_key
```
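The `python-dotenv` package (listed in the dependencies) loads this file into the process environment at startup, typically via `load_dotenv()`. Conceptually it does something like the following stdlib-only sketch; the real library handles quoting and other edge cases:

```python
import os


def load_env_file(path: str = ".env") -> None:
    """Minimal sketch of what python-dotenv's load_dotenv() does:
    parse KEY=VALUE lines and put them into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blanks, comments, and malformed lines.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # setdefault: real environment variables take precedence.
            os.environ.setdefault(key.strip(), value.strip())


if os.path.exists(".env"):
    load_env_file()
token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")
```

The app can then read the token from the environment wherever it authenticates against the HuggingFace Hub.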
- Install the required dependencies.
- Start the Streamlit application with `streamlit run app.py`.
- Upload your PDF documents using the file uploader.
- Click the Process button to process the uploaded documents.
- Ask questions through the chat.
This project uses the Instructor embedding model to generate text embeddings, and it runs locally on your machine, so processing time will vary with your hardware.
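The semantic matching described above comes down to comparing the question's embedding with each chunk's embedding; a FAISS index does this efficiently at scale, often via cosine similarity or an equivalent distance. A toy sketch with hand-made vectors, purely illustrative (real Instructor embeddings have hundreds of dimensions):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the
    vector magnitudes. 1.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings for a question and two stored chunks.
question = [0.9, 0.1, 0.0]
chunk_embeddings = {
    "chunk about invoices": [0.8, 0.2, 0.1],
    "chunk about weather": [0.0, 0.1, 0.9],
}

# Pick the chunk whose embedding is closest in direction to the question.
best = max(
    chunk_embeddings,
    key=lambda name: cosine_similarity(question, chunk_embeddings[name]),
)
```

The chunks with the highest scores are the ones passed to the language model as context for the answer.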