Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

docs: astra component update #6720

Merged
merged 15 commits into from
Feb 24, 2025
Merged
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
51 changes: 33 additions & 18 deletions docs/docs/Components/components-vector-stores.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,31 +37,46 @@ For more information, see the [DataStax documentation](https://docs.datastax.com

| Name | Display Name | Info |
|------|--------------|------|
| collection_name | Collection Name | The name of the collection within Astra DB where the vectors will be stored (required) |
| token | Astra DB Application Token | Authentication token for accessing Astra DB (required) |
| api_endpoint | API Endpoint | API endpoint URL for the Astra DB service (required) |
| search_input | Search Input | Query string for similarity search |
| ingest_data | Ingest Data | Data to be ingested into the vector store |
| namespace | Namespace | Optional namespace within Astra DB to use for the collection |
| embedding_choice | Embedding Model or Astra Vectorize | Determines whether to use an Embedding Model or Astra Vectorize for the collection |
| embedding | Embedding Model | Allows an embedding model configuration (when using Embedding Model) |
| provider | Vectorize Provider | Provider for Astra Vectorize (when using Astra Vectorize) |
| metric | Metric | Optional distance metric for vector comparisons |
| batch_size | Batch Size | Optional number of data to process in a single batch |
| setup_mode | Setup Mode | Configuration mode for setting up the vector store (options: "Sync", "Async", "Off", default: "Sync") |
| pre_delete_collection | Pre Delete Collection | Boolean flag to determine whether to delete the collection before creating a new one |
| number_of_results | Number of Results | Number of results to return in similarity search (default: 4) |
| search_type | Search Type | Search type to use (options: "Similarity", "Similarity with score threshold", "MMR (Max Marginal Relevance)") |
| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results |
| search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query |
| token | Astra DB Application Token | The authentication token for accessing Astra DB (required). |
| environment | Environment | The environment for the Astra DB API Endpoint. For example, `dev` or `prod`. |
| database_name | Database | The database name for the Astra DB instance (required). |
| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. This supersedes the database selection. |
| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored (required). |
| keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. |
| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra vectorize. |
| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra vectorize collections. |
| number_of_results | Number of Search Results | Number of search results to return (default: `4`). |
| search_type | Search Type | The search type to use . The options are `Similarity`, `Similarity with score threshold`, and `MMR (Max Marginal Relevance)`. |
| search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the `Similarity with score threshold` option. |
| advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
| autodetect_collection | Autodetect Collection | A boolean flag to determine whether to autodetect the collection. |
| content_field | Content Field | A field to use as the text content field for the vector store. |
| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded. |
| ignore_invalid_documents | Ignore Invalid Documents | A boolean flag to determine whether to ignore invalid documents at runtime. |
| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | An optional dictionary of additional parameters for the AstraDBVectorStore. |

### Outputs

| Name | Display Name | Info |
|------|--------------|------|
| vector_store | Vector Store | Astra DB vector store instance configured with the specified parameters. |
| search_results | Search Results | The results of the similarity search as a list of `Data` objects. |
| search_results | Search Results | The results of the similarity search as a list of [Data](/concepts-objects#data-object) objects. |

### Generate embeddings

The **Astra DB Vector Store** component offers two methods for generating embeddings.

1. **Embedding Model**: Use your own embedding model by connecting an [Embeddings](/components-embedding-models) component in Langflow.

2. **Astra Vectorize**: Use Astra DB's built-in embedding generation service. When creating a new collection, choose the embeddings provider and models, including NVIDIA's `NV-Embed-QA` model hosted by Datastax.

:::important
The embedding model selection is made when creating a new collection and cannot be changed later.
:::

For an example of using the **Astra DB Vector Store** component with an embedding model, see the [Vector Store RAG starter project](/starter-projects-vector-store-rag).

For more information, see the [Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).

## AstraDB Graph vector store

Expand Down
27 changes: 21 additions & 6 deletions docs/docs/Get-Started/get-started-quickstart.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,8 +11,8 @@ Get to know Langflow by building an OpenAI-powered chatbot application. After yo

* [An OpenAI API key](https://platform.openai.com/)
* [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with:
* An AstraDB application token
* [A collection in Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection)
* An Astra DB application token scoped to read and write to the database
* A collection created in [Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) or a new collection created in the **Astra DB** component

## Open Langflow and start a new project

Expand Down Expand Up @@ -130,14 +130,29 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen

![](/img/quickstart-add-document-ingestion.png)

8. Configure the **Astra DB** component.
2. Configure the **Astra DB** component.
1. In the **Astra DB Application Token** field, add your **Astra DB** application token.
The component connects to your database and populates the menus with existing databases and collections.
2. Select your **Database**.
If you don't have a collection, select **New database**.
Complete the **Name**, **Cloud provider**, and **Region** fields, and then click **Create**. **Database creation takes a few minutes**.
3. Select your **Collection**. Collections are created in your [Astra DB deployment](https://astra.datastax.com) for storing vector data.
If you don't have a collection, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
4. Select **Embedding Model** to bring your own embeddings model, which is the connected **OpenAI Embeddings** component.
The **Dimensions** value must match the dimensions of your collection. This value can be found in your **Collection** in your [Astra DB deployment](https://astra.datastax.com).
:::info
If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries.
:::

3. If you don't have a collection, create a new one within the component.
1. Select **New collection**.
2. Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**.

Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow.

* To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
* To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
* The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).

For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).


If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials.

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ This opens a starter flow with the necessary components to run an agentic applic

## Simple Agent flow

<img src="/img/starter-flow-simple-agent.png" alt="Starter flow simple agent" width="75%"/>
![Simple agent starter flow](/img/starter-flow-simple-agent.png)

The **Simple Agent** flow consists of these components:

Expand Down
25 changes: 20 additions & 5 deletions docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
Original file line number Diff line number Diff line change
Expand Up @@ -21,8 +21,8 @@ We've chosen [Astra DB](https://astra.datastax.com/#?utm_source=langflow-p

* [An OpenAI API key](https://platform.openai.com/)
* [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with:
* An Astra DB application token
* [A collection in Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection)
* An Astra DB application token scoped to read and write to the database
* A collection created in [Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) or a new collection created in the **Astra DB** component


## Open Langflow and start a new project
Expand Down Expand Up @@ -60,10 +60,25 @@ The **Retriever Flow** (top of the screen) embeds the user's queries into vecto
1. In the **Astra DB Application Token** field, add your **Astra DB** application token.
The component connects to your database and populates the menus with existing databases and collections.
2. Select your **Database**.
If you don't have a collection, select **New database**.
Complete the **Name**, **Cloud provider**, and **Region** fields, and then click **Create**. **Database creation takes a few minutes**.
3. Select your **Collection**. Collections are created in your [Astra DB deployment](https://astra.datastax.com) for storing vector data.
If you don't have a collection, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection).
4. Select **Embedding Model** to bring your own embeddings model, which is the connected **OpenAI Embeddings** component.
The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
:::info
If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries.
:::

3. If you don't have a collection, create a new one within the component.
1. Select **New collection**.
2. Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**.

Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow.

* To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
* To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
* The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).

For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).


If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials.

Expand Down
Loading