Search and information retrieval is a challenging problem. With the proliferation of vector search tools on the market, the focus has shifted heavily toward SEO and marketing wins rather than fundamental quality.
The Retrieval Optimizer from Redis focuses on measuring and improving retrieval quality. This framework helps determine optimal embedding models, retrieval strategies, and index configurations for your specific data and use case.
Make sure you have the following tools available:

- git
- Python with Poetry
- Docker (with Docker Compose)
- A running Redis instance
Clone the repository:

```bash
git clone https://github.com/redis-applied-ai/retrieval-optimizer.git
cd retrieval-optimizer
```
The retrieval optimizer requires two sets of data to run an optimization study.
1. **Chunk data**: the core knowledge base of data to be embedded in Redis. Think of these records as your "chunks".
Expected Format:

```json
[
  {
    "text": "example content",
    "item_id": "abc:123"
  }
]
```
2. **Labeled data**: ground truth data for generating the metrics that will be compared between samples.
Expected Format:

```json
[
  {
    "query": "How long have sea turtles existed on Earth?",
    "relevant_item_ids": ["abc:1", "def:54", "hij:42"]
  }
]
```
Under the hood, the `item_id` is used to test whether a vector query found the desired results (chunks), so this identifier must be unique to the text provided as input.
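As a quick sanity check before running a study, you can verify that the identifiers are unique; this snippet is illustrative and the file name is hypothetical:

```python
import json

# Hypothetical corpus file in the expected format shown above.
with open("corpus_chunks.json") as f:
    corpus = json.load(f)

ids = [record["item_id"] for record in corpus]
assert len(ids) == len(set(ids)), "every item_id must be unique to its chunk"
```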
**Important:** The next section covers how to create this input data. If you already have it available, you can skip ahead.
Follow along with examples/getting_started/populate_index.ipynb to see an end-to-end example of data prep for retrieval optimization.
This guide will walk you through:
- chunking source data
- exporting that data to a format for use with the optimizer (these first two steps are sketched after this list)
- creating vector representations of the data
- loading them into a vector index
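For illustration, here is a minimal sketch of the first two steps, chunking a source document and exporting it in the expected corpus format; the file names and chunk size are hypothetical, and the notebook above shows the full workflow including embedding and index loading:

```python
import json

# Hypothetical source document; substitute your own data.
with open("source_document.txt") as f:
    raw_text = f.read()

# Naive fixed-size chunking; real pipelines often split on sentences or tokens.
CHUNK_SIZE = 500
chunks = [raw_text[i : i + CHUNK_SIZE] for i in range(0, len(raw_text), CHUNK_SIZE)]

# Export in the corpus format the optimizer expects: one record per chunk,
# each with an item_id unique to its text.
corpus = [{"text": chunk, "item_id": f"doc:{i}"} for i, chunk in enumerate(chunks)]

with open("corpus_chunks.json", "w") as f:
    json.dump(corpus, f, indent=2)
```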
Sometimes you have a predefined dataset of queries and expected matches, but this is not always the case, so we built a simple web GUI to help. Assuming you have created data and populated an initial vector index with it, you can run the labeling app for a more convenient labeling experience.
- First set up a fresh environment file:

```bash
cp label_app/.env.template label_app/.env
```
- Update the `.env` file (below is an example):

```bash
REDIS_URL=<Redis connection url>
LABELED_DATA_PATH=<file location for exported output>
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
SCHEMA_PATH=schema/index_schema.yaml
# Fields to return from the index; see label_app/main.py for the implementation
ID_FIELD_NAME=<unique id of a chunk or any item stored in the vector index>
CHUNK_FIELD_NAME=<text content>
```
- Environment variable options:

| Variable | Example Value | Description | Required |
|---|---|---|---|
| REDIS_URL | redis://localhost:6379 | Redis connection URL | Yes |
| LABELED_DATA_PATH | label_app/data/labeled.json | File path where labeled data will be exported | Yes |
| EMBEDDING_MODEL | sentence-transformers/all-MiniLM-L6-v2 | Name of the embedding model to use | Yes |
| SCHEMA_PATH | schema/index_schema.yaml | Path to the index schema configuration | Yes |
| ID_FIELD_NAME | item_id | Field name containing the unique identifier in the index | Yes |
| CHUNK_FIELD_NAME | text | Field name containing the text content in the index | Yes |
- Run the data labeling app:

```bash
docker compose up
```
This serves the data labeling app at localhost:8000/label. You can also interact with the Swagger docs at localhost:8000/docs.
The data labeling app connects to the index specified in the file referenced by the SCHEMA_PATH environment variable (by default, label_app/schema/index_schema.yaml). If it connects properly, you will see the name of the index and the number of documents it has indexed.
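For reference, here is a minimal sketch of what such a schema might look like, assuming the RedisVL YAML schema format; the index name, prefix, and vector dimensions are illustrative and should match your own data and embedding model:

```yaml
version: "0.1.0"

index:
  name: optimize        # shown by the labeling app once it connects
  prefix: chunk

fields:
  - name: item_id       # matches ID_FIELD_NAME in .env
    type: tag
  - name: text          # matches CHUNK_FIELD_NAME in .env
    type: text
  - name: vector
    type: vector
    attrs:
      dims: 384         # must match the embedding model (all-MiniLM-L6-v2)
      distance_metric: cosine
      algorithm: hnsw
      datatype: float32
```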
From here you can start making queries against your index, label the relevant chunks, and export them to a JSON file for use in the optimization. This is also a good way to test what's happening with your vector retrieval.
With your data prepared, you can now run optimization studies. A study is defined by a config that specifies the parameters and ranges to test and compare against your data.
Check out the following step-by-step notebooks for running the optimization process:
- Getting started: examples/getting_started/retrieval_optimizer.ipynb
- Adding a custom retriever: examples/getting_started/custom_retriever_optimizer.ipynb
The study config looks like this (see ex_study_config.yaml for an example):

```yaml
# path to data files for easy read
raw_data_path: "label_app/data/2008-mazda3-chunks.json"
input_data_type: "json"
labeled_data_path: "label_app/data/mazda_labeled_items.json"

# metrics to be used in objective function
metric_weights:
  f1_at_k: 1
  embedding_latency: 1
  total_indexing_time: 1

# constraints for the optimization
n_trials: 10
n_jobs: 1
ret_k: [1, 10] # range of values to be sampled during the study
ef_runtime: [10, 50]
ef_construction: [100, 300]
m: [8, 64]

# embedding models to be used
embedding_models:
  - provider: "hf"
    model: "sentence-transformers/all-MiniLM-L6-v2"
    dim: 384
  - provider: "hf"
    model: "intfloat/e5-large-v2"
    dim: 1024
```
| Variable | Example Value | Description | Required |
|---|---|---|---|
| raw_data_path | label_app/data/2008-mazda3-chunks.json | Path to raw data file | ✅ |
| labeled_data_path | label_app/data/mazda-labeled-rewritten.json | Path to labeled data file | ✅ |
| algorithms | flat, hnsw | Indexing algorithms to be tested in optimization | ✅ |
| vector_data_types | float32, float16 | Data types to be tested for vectors | ✅ |
| n_trials | 15 | Number of optimization trials | ✅ |
| n_jobs | 1 | Number of parallel jobs | ✅ |
| ret_k | [1, 10] | Range of values to be tested for k in retrieval | ✅ |
| embedding_models | provider: hf, model: sentence-transformers/all-MiniLM-L6-v2, dim: 384 | List of embedding models and their dimensions | ✅ |
| metric_weights | f1_at_k: 1, embedding_latency: 1, total_indexing_time: 1 | Weight for each metric used in the objective function | Defaults to example |
| input_data_type | json | Type of input data | Defaults to example |
| redis_url | redis://localhost:6379 | Connection string for the Redis instance | Defaults to example |
| ef_runtime | [10, 20, 30, 50] | Max top candidates during search for HNSW | Defaults to example |
| ef_construction | [100, 150, 200, 250, 300] | Max number of connected neighbors to consider during graph building for HNSW | Defaults to example |
| m | [8, 16, 64] | Max number of outgoing edges per node per layer for HNSW | Defaults to example |
Finally, install the dependencies and run the study:

```bash
poetry install
poetry run study --config optimize/ex_study_config.yaml
```
This framework uses Optuna to implement Bayesian Optimization, a common pattern for tuning hyperparameters. Bayesian Optimization works by building a probabilistic model (typically a Gaussian Process) of the objective function and iteratively selecting the most promising configurations to evaluate. Unlike grid or random search, it balances exploration (trying new regions of the parameter space) and exploitation (focusing on promising areas), finding good hyperparameters with fewer evaluations. This is particularly useful for expensive-to-evaluate functions, such as training machine learning models. By guiding the search with prior knowledge and updating beliefs based on observed performance, Bayesian Optimization can significantly improve both accuracy and efficiency in hyperparameter tuning.
In our case, we want to maximize the precision and recall of our vector search system while balancing performance tradeoffs such as embedding and indexing latency. Bayesian optimization gives us an automated way of testing all the knobs at our disposal to see which ones best optimize retrieval.
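As a rough illustration (not the framework's actual code), the optimization loop resembles the following Optuna sketch; `evaluate_trial_config` and its random metrics are hypothetical stand-ins for building an index and running the labeled queries against it:

```python
import random

import optuna


def evaluate_trial_config(model, ret_k, ef_runtime, ef_construction, m):
    # Stub: the real framework builds an index with this configuration and
    # runs the labeled queries against it to measure these metrics.
    return {
        "f1_at_k": random.random(),
        "embedding_latency": random.random(),
        "total_indexing_time": random.random(),
    }


def objective(trial: optuna.Trial) -> float:
    # Sample one configuration from the ranges defined in the study config.
    model = trial.suggest_categorical(
        "model",
        ["sentence-transformers/all-MiniLM-L6-v2", "intfloat/e5-large-v2"],
    )
    ret_k = trial.suggest_int("ret_k", 1, 10)
    ef_runtime = trial.suggest_int("ef_runtime", 10, 50)
    ef_construction = trial.suggest_int("ef_construction", 100, 300)
    m = trial.suggest_int("m", 8, 64)

    metrics = evaluate_trial_config(model, ret_k, ef_runtime, ef_construction, m)

    # Weighted objective: reward retrieval quality, penalize latency costs
    # (the weights mirror metric_weights in the study config).
    return (
        1.0 * metrics["f1_at_k"]
        - 1.0 * metrics["embedding_latency"]
        - 1.0 * metrics["total_indexing_time"]
    )


study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=10, n_jobs=1)
print(study.best_params)
```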