GCP-based Retrieval-Augmented Generation (RAG) System

Welcome to the GCP-based Retrieval-Augmented Generation (RAG) System repository. This project uses Google Cloud Platform (GCP) to build a scalable RAG system for large document collections. Source data arrives in various formats and conditions, so it is preprocessed before being ingested into a GCP Datastore. The system uses the Gemini API to search the data and generate summaries, and exposes a user interface built with Streamlit.

Overview

This project involves several key steps:

Data Preprocessing: Convert and format data files from various formats (doc, pdf) to a consistent format.
Local Database Creation: Build a local version of the company's database.
Data Ingestion: Sequentially process the files and make necessary format changes.
Cloud Storage: Store the processed data in GCP Cloud Buckets.
Datastore Creation: Use the GCP Console to create a scalable Datastore that serves as the vector database.
API Integration: Utilize the Gemini API for data search and summary generation.
User Interface: Implement a Streamlit-based UI for interaction.
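
The preprocessing and ingestion steps above can be sketched as a small planning function. This is only an illustration, not the code in the Doc_ingestion folder: the function name `plan_conversions`, the choice of plain text as the "consistent format", and the target naming scheme are all assumptions.

```python
from pathlib import Path
from typing import List, Tuple

# Only the formats named in the overview are picked up.
SOURCE_SUFFIXES = {".doc", ".pdf"}
# Assumption: plain text is the common target format; the README does not say.
TARGET_SUFFIX = ".txt"


def plan_conversions(input_dir: Path, output_dir: Path) -> List[Tuple[Path, Path]]:
    """Return (source, target) pairs in a stable order for sequential processing."""
    plan = []
    for source in sorted(input_dir.rglob("*")):
        if source.is_file() and source.suffix.lower() in SOURCE_SUFFIXES:
            relative = source.relative_to(input_dir)
            # Keep the original suffix in the name so a.doc and a.pdf do not collide.
            target = output_dir / relative.parent / (relative.name + TARGET_SUFFIX)
            plan.append((source, target))
    return plan
```

Each pair can then be handed, one at a time, to whatever converter the project actually uses; the sketch stops short of the conversion itself because the README does not name one.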

Installation

Clone the repository:
```
git clone https://github.com/your-username/your-repository.git
cd your-repository
```
Create a virtual environment:
```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the dependencies:
```
pip install -r requirements.txt
```
Install the Google Cloud SDK:

Follow Google's official installation instructions for the Google Cloud SDK, then authenticate so the project can reach your GCP resources.
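
Once the SDK is installed, a short preflight check can catch a missing `gcloud` binary or missing credentials before the app starts. The sketch below is not part of this repository; the function names are hypothetical, and it assumes the project authenticates through Application Default Credentials, which the README does not confirm.

```python
import os
import shutil
import sys
from pathlib import Path
from typing import Callable, List, Mapping, Optional

ADC_FILENAME = "application_default_credentials.json"


def default_adc_path(env: Mapping[str, str]) -> Path:
    """Where `gcloud auth application-default login` stores credentials by default."""
    if sys.platform == "win32" and env.get("APPDATA"):
        return Path(env["APPDATA"]) / "gcloud" / ADC_FILENAME
    return Path.home() / ".config" / "gcloud" / ADC_FILENAME


def find_problems(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Optional[Mapping[str, str]] = None,
    adc_file: Optional[Path] = None,
) -> List[str]:
    """Return human-readable setup problems; an empty list means the checks passed."""
    env = os.environ if env is None else env
    adc_file = default_adc_path(env) if adc_file is None else adc_file
    problems = []
    if which("gcloud") is None:
        problems.append("gcloud not found on PATH; install the Google Cloud SDK")
    # A service-account key file can stand in for user credentials.
    key_file = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    has_key = bool(key_file) and Path(key_file).is_file()
    if not (has_key or adc_file.is_file()):
        problems.append(
            "no credentials found; run `gcloud auth application-default login`"
        )
    return problems


if __name__ == "__main__":
    for problem in find_problems():
        print("setup problem:", problem)
```

The check only looks for files and binaries; it does not verify that the credentials are still valid or that they grant access to your project.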

Usage

Preprocess and Ingest Data:
- Modify and run the scripts in the Doc_ingestion folder to preprocess and format your data.
Upload Data to GCP:
- Store the processed data in GCP Cloud Buckets.
Create Datastore:
- Use the GCP Console to create a Datastore for your project. Remember to replace ProjectID, Location, and Datastore with your project-specific details.
Run the Application:
- Use Streamlit to launch the UI and interact with your data.
```
streamlit run app.py
```
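
For the upload step above, processed files can be copied to a bucket with the `google-cloud-storage` client library instead of the Console. This is a minimal sketch under assumptions: the helper names, the `processed` prefix, and the directory layout are hypothetical, and the repository may upload its data differently.

```python
from pathlib import Path
from typing import Dict


def blob_names(local_dir: Path, prefix: str = "processed") -> Dict[Path, str]:
    """Map each local file to an object name with forward slashes."""
    names = {}
    for path in sorted(local_dir.rglob("*")):
        if path.is_file():
            names[path] = f"{prefix}/{path.relative_to(local_dir).as_posix()}"
    return names


def upload_directory(local_dir: Path, bucket_name: str, prefix: str = "processed") -> int:
    """Upload every file under local_dir and return the number of uploads."""
    # Imported lazily so blob_names() works without the client library installed.
    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # picks up Application Default Credentials
    bucket = client.bucket(bucket_name)
    count = 0
    for path, name in blob_names(local_dir, prefix).items():
        bucket.blob(name).upload_from_filename(str(path))
        count += 1
    return count
```

Calling `upload_directory(Path("output"), "your-bucket-name")` would upload the contents of a local `output` folder; both names are placeholders to replace with your own.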

Notes

Replace ProjectID, Location, and Datastore with your specific project details when setting up the GCP components.
Ensure all dependencies are installed using the requirements.txt file.
Google Cloud SDK must be installed and authenticated for proper GCP interaction.
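
One way to make sure the ProjectID, Location, and Datastore placeholders mentioned above are actually replaced is to load them from the environment and fail early when any is missing. The sketch below is an assumption about how such a check could look; the environment variable names and the `GcpSettings` class are hypothetical and do not come from this repository.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Literal placeholder values from the README that must not reach GCP calls.
PLACEHOLDERS = {"ProjectID", "Location", "Datastore"}


@dataclass(frozen=True)
class GcpSettings:
    project_id: str
    location: str
    datastore: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> GcpSettings:
    """Read settings from the environment, rejecting missing or placeholder values."""
    env = os.environ if env is None else env
    # Hypothetical variable names, chosen for this example only.
    keys = ("RAG_PROJECT_ID", "RAG_LOCATION", "RAG_DATASTORE")
    values = [env.get(key, "").strip() for key in keys]
    bad = [key for key, value in zip(keys, values) if not value or value in PLACEHOLDERS]
    if bad:
        raise ValueError("set real values for: " + ", ".join(bad))
    return GcpSettings(*values)
```

Failing at startup with the names of the unset variables is usually easier to debug than a permission or not-found error returned later by a GCP call.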

For any further questions or issues, feel free to open an issue on this repository.
