This workshop takes unprepared data and makes it usable with a Retrieval-Augmented Generation (RAG) pattern for a chatbot.
In this workshop, we'll be using Aiven for OpenSearch and LangChain to:
- Chunk transcription data and generate embeddings
- Configure our OpenSearch index for k-Nearest Neighbors (k-NN) and perform a similarity search
- Connect our search responses to a Large Language Model (LLM) to generate informed answers using LangChain
- Compare the performance of multiple LLMs
Our instructions and notebooks are in the workshop folder.
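As a rough orientation before opening the notebooks, the chunking and k-NN steps listed above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the workshop's implementation: the function names (`chunk_text`, `knn_index_body`, `knn_query_body`), the field names `text` and `embedding`, and the chunk sizes are all hypothetical, and the notebooks use LangChain and may structure these steps differently. The vector dimension must match whichever embedding model you choose.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


def knn_index_body(dimension: int, vector_field: str = "embedding") -> Dict:
    """Build an index definition that enables k-NN and declares a vector field.

    The field names are assumptions made for this sketch.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                vector_field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }


def knn_query_body(query_vector: List[float], k: int = 3,
                   vector_field: str = "embedding") -> Dict:
    """Build a k-NN search body that returns the k nearest stored vectors."""
    return {
        "size": k,
        "query": {"knn": {vector_field: {"vector": query_vector, "k": k}}},
    }


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

With the `opensearch-py` client, dictionaries like these could be passed as the `body` argument to `client.indices.create(...)` and `client.search(...)`. Overlapping chunks are used here so that a sentence split at a chunk boundary still appears whole in at least one chunk.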
The text and materials for this workshop are licensed under the Apache License, Version 2.0. The full license text is available in the LICENSE file.
Please note that the project explicitly does not require a CLA (Contributor License Agreement) from its contributors.
The Conduit Podcast Transcripts by Jay Miller and Kathy Campbell, with original downloads from Whisper work done by Pilix, are licensed under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0).
Bug reports and patches are very welcome. Please post them as GitHub issues and pull requests at https://github.com/Aiven-Labs/preparing-data-for-opensearch-and-rag
To report any possible vulnerabilities or other serious issues, please see our security policy.
Report Code of Conduct issues according to our policy.