Implement a RAG chatbot with GCP and Streamlit

EthanFord888/RAG-with-GCP

 
 

GCP-based Retrieval-Augmented Generation (RAG) System

Welcome to the GCP-based Retrieval-Augmented Generation (RAG) System repository. This project uses Google Cloud Platform (GCP) to build a scalable RAG system over large document collections. Source files arrive in various formats and conditions, so they are preprocessed into a consistent format before being ingested into a GCP Datastore. The system uses the Gemini API to search the data and generate summaries, and it exposes a user interface built with Streamlit.

Overview

This project involves several key steps:

  1. Data Preprocessing: Convert and format data files from various formats (doc, pdf) to a consistent format.
  2. Local Database Creation: Build a local version of the company's database.
  3. Data Ingestion: Sequentially process the files and make necessary format changes.
  4. Cloud Storage: Store the processed data in GCP Cloud Buckets.
  5. Datastore Creation: Use the GCP Console to create a scalable Datastore, which serves as the vector database.
  6. API Integration: Utilize the Gemini API for data search and summary generation.
  7. User Interface: Implement a Streamlit-based UI for interaction.
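
The preprocessing step above can be sketched as follows. This is a minimal illustration, not the code in this repository: the function names `collect_documents` and `normalized_name`, the `.txt` target format, and the naming rule are all assumptions. Only the input formats (doc, pdf) come from the list above.

```python
from pathlib import Path

# Input formats named in the overview. Extend this set if your data differs.
SUPPORTED_SUFFIXES = {".doc", ".pdf"}


def collect_documents(root: Path) -> list[Path]:
    """Return every supported file under root, sorted for reproducible runs."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SUPPORTED_SUFFIXES
    )


def normalized_name(path: Path) -> str:
    """Build a consistent output name (hypothetical rule).

    Lowercases the stem, joins whitespace-separated words with underscores,
    and uses a .txt suffix as an assumed common target format.
    """
    stem = "_".join(path.stem.lower().split())
    return f"{stem}.txt"
```

Calling `collect_documents(Path("raw_data"))` on a hypothetical `raw_data` folder returns the files to convert, and `normalized_name` gives each one a predictable output name before it is uploaded.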

Installation

  1. Clone the repository:

    git clone https://github.com/EthanFord888/RAG-with-GCP.git
    cd RAG-with-GCP
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  3. Install the dependencies:

    pip install -r requirements.txt
  4. Install Google Cloud SDK:

    Follow Google's official installation instructions for the Google Cloud SDK, then authenticate it against your GCP account.
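
To confirm that the command-line tools from the steps above are on your PATH, a small check like the following may help. It is a sketch that is not part of this repository, and `missing_prerequisites` is a hypothetical helper name. It only checks that the `gcloud` and `streamlit` executables can be found, not that `gcloud` is authenticated.

```python
import shutil


def missing_prerequisites(tools=("gcloud", "streamlit")) -> list[str]:
    """Return the names of command-line tools that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from `missing_prerequisites()` means both tools were found. Any names it returns still need to be installed or added to PATH.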

Usage

  1. Preprocess and Ingest Data:

    • Modify and run the scripts in the Doc_ingestion folder to preprocess and format your data.
  2. Upload Data to GCP:

    • Store the processed data in GCP Cloud Buckets.
  3. Create Datastore:

    • Use the GCP Console to create a Datastore for your project. Remember to replace ProjectID, Location, and Datastore with your project-specific details.
  4. Run the Application:

    • Use Streamlit to launch the UI and interact with your data:

    streamlit run main.py
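
The upload step above (storing processed data in a Cloud Storage bucket) could look roughly like this. It is a sketch under assumptions: it assumes the `google-cloud-storage` package is installed and Application Default Credentials are configured, and the names `blob_name_for`, `upload_directory`, and the `processed` prefix are hypothetical rather than taken from this repository.

```python
from pathlib import Path


def blob_name_for(local_path: Path, root: Path, prefix: str = "processed") -> str:
    """Map a local file to an object name mirroring its path relative to root."""
    relative = local_path.relative_to(root).as_posix()
    return f"{prefix}/{relative}" if prefix else relative


def upload_directory(root: Path, bucket_name: str, prefix: str = "processed") -> list[str]:
    """Upload every file under root to the bucket and return the object names."""
    # Imported here so blob_name_for works without the client library installed.
    from google.cloud import storage

    client = storage.Client()  # picks up Application Default Credentials
    bucket = client.bucket(bucket_name)
    uploaded = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        name = blob_name_for(path, root, prefix)
        bucket.blob(name).upload_from_filename(str(path))
        uploaded.append(name)
    return uploaded
```

For example, `upload_directory(Path("processed_docs"), "my-rag-bucket")` would upload a hypothetical local `processed_docs` folder into a bucket you have already created, keeping the folder layout under the `processed/` prefix.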

Notes

  • Replace ProjectID, Location, and Datastore with your specific project details when setting up the GCP components.
  • Ensure all dependencies are installed using the requirements.txt file.
  • The Google Cloud SDK must be installed and authenticated for the application to interact with GCP.
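
One way to avoid leaving the ProjectID, Location, and Datastore placeholders in code is to read them from environment variables and fail early if any is unset. The sketch below is an assumption about how this could be done, not how this repository does it. The variable names `RAG_PROJECT_ID`, `RAG_LOCATION`, and `RAG_DATASTORE` and the `GcpSettings` class are hypothetical.

```python
import os
from dataclasses import dataclass

# Placeholder strings from this README that must be replaced with real values.
PLACEHOLDERS = {"ProjectID", "Location", "Datastore"}

# Hypothetical environment variable names, one per setting.
ENV_VARS = (
    ("project_id", "RAG_PROJECT_ID"),
    ("location", "RAG_LOCATION"),
    ("datastore", "RAG_DATASTORE"),
)


@dataclass(frozen=True)
class GcpSettings:
    project_id: str
    location: str
    datastore: str


def load_settings(env=None) -> GcpSettings:
    """Read the three settings, rejecting missing or placeholder values."""
    env = os.environ if env is None else env
    values = {}
    for field, var in ENV_VARS:
        value = env.get(var, "").strip()
        if not value or value in PLACEHOLDERS:
            raise ValueError(f"Set {var} to your project-specific value")
        values[field] = value
    return GcpSettings(**values)
```

Calling `load_settings()` at application startup raises a `ValueError` naming the first variable that is unset or still a placeholder, which surfaces the problem before any GCP call is made.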

For any further questions or issues, feel free to open an issue on this repository.


Languages

  • Python 98.2%
  • Dockerfile 1.8%