From 1f5ebf6c3078f0218963ff0f422cd88b3ae6aacd Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Wed, 19 Feb 2025 14:07:43 -0500 Subject: [PATCH 01/10] starter-project-update --- .../starter-projects-vector-store-rag.md | 20 ++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md index 9bdecfc6e484..d7738d166813 100644 --- a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md +++ b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md @@ -21,8 +21,8 @@ We've chosen [Astra DB](https://astra.datastax.com/signup?utm_source=langflow-p * [An OpenAI API key](https://platform.openai.com/) * [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with: - * An Astra DB application token - * [A collection in Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) + * An Astra DB application token scoped to read and write to the database + * A collection created in [Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) or a new collection created in the **Astra DB** component ## Open Langflow and start a new project @@ -60,10 +60,20 @@ The **Retriever Flow** (top of the screen) embeds the user's queries into vecto 1. In the **Astra DB Application Token** field, add your **Astra DB** application token. The component connects to your database and populates the menus with existing databases and collections. 2. Select your **Database**. + If you don't have a collection, select **New database**. + Complete the **Name**, **Cloud provider**, and **Region** fields, and then click **Create**. **Database creation takes a few minutes**. 3. Select your **Collection**. 
Collections are created in your [Astra DB deployment](https://astra.datastax.com) for storing vector data. - If you don't have a collection, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection). - 4. Select **Embedding Model** to bring your own embeddings model, which is the connected **OpenAI Embeddings** component. - The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). + :::info + If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries. + ::: + If you don't have a collection, select **New collection**. + Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. + * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. + * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. + * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). 
+ + For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). + If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials. From 013c260d855ca5561d063de5639a1a3b5ad425cc Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Wed, 19 Feb 2025 14:25:02 -0500 Subject: [PATCH 02/10] update-component-add-vectorize --- .../Components/components-vector-stores.md | 49 ++++++++++++------- 1 file changed, 31 insertions(+), 18 deletions(-) diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md index c989ddaeb94d..7f3695a6f34d 100644 --- a/docs/docs/Components/components-vector-stores.md +++ b/docs/docs/Components/components-vector-stores.md @@ -37,31 +37,44 @@ For more information, see the [DataStax documentation](https://docs.datastax.com | Name | Display Name | Info | |------|--------------|------| -| collection_name | Collection Name | The name of the collection within Astra DB where the vectors will be stored (required) | -| token | Astra DB Application Token | Authentication token for accessing Astra DB (required) | -| api_endpoint | API Endpoint | API endpoint URL for the Astra DB service (required) | -| search_input | Search Input | Query string for similarity search | -| ingest_data | Ingest Data | Data to be ingested into the vector store | -| namespace | Namespace | Optional namespace within Astra DB to use for the collection | -| embedding_choice | Embedding Model or Astra Vectorize | Determines whether to use an Embedding Model or Astra Vectorize for the collection | -| embedding | Embedding Model | Allows an embedding model configuration (when using Embedding Model) | -| provider | Vectorize Provider | Provider for Astra Vectorize (when using Astra Vectorize) | -| metric | Metric 
| Optional distance metric for vector comparisons | -| batch_size | Batch Size | Optional number of data to process in a single batch | -| setup_mode | Setup Mode | Configuration mode for setting up the vector store (options: "Sync", "Async", "Off", default: "Sync") | -| pre_delete_collection | Pre Delete Collection | Boolean flag to determine whether to delete the collection before creating a new one | -| number_of_results | Number of Results | Number of results to return in similarity search (default: 4) | -| search_type | Search Type | Search type to use (options: "Similarity", "Similarity with score threshold", "MMR (Max Marginal Relevance)") | -| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results | -| search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query | +| token | Astra DB Application Token | Authentication token for accessing Astra DB (required). | +| environment | Environment | The environment for the Astra DB API Endpoint. For example, `dev` or `prod`. | +| database_name | Database | The Database name for the Astra DB instance (required). | +| api_endpoint | Astra DB API Endpoint | The API Endpoint for the Astra DB instance. Supercedes database selection. | +| collection_name | Collection | The name of the collection within Astra DB where the vectors will be stored (required). | +| keyspace | Keyspace | Optional keyspace within Astra DB to use for the collection. | +| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra Vectorize. | +| embedding_model | Embedding Model | Specify the Embedding Model. Not required for Astra Vectorize collections. | +| number_of_results | Number of Search Results | Number of search results to return (default: 4). | +| search_type | Search Type | Search type to use (options: `Similarity`, `Similarity with score threshold`, `MMR (Max Marginal Relevance)`). 
| +| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results (when using 'Similarity with score threshold'). | +| advanced_search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query. | +| autodetect_collection | Autodetect Collection | Boolean flag to determine whether to autodetect the collection. | +| content_field | Content Field | Field to use as the text content field for the vector store. | +| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value will be deleted before new data is loaded. | +| ignore_invalid_documents | Ignore Invalid Documents | Boolean flag to determine whether to ignore invalid documents at runtime. | +| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | Optional dictionary of additional parameters for the AstraDBVectorStore. | ### Outputs | Name | Display Name | Info | |------|--------------|------| | vector_store | Vector Store | Astra DB vector store instance configured with the specified parameters. | -| search_results | Search Results | The results of the similarity search as a list of `Data` objects. | +| search_results | Search Results | The results of the similarity search as a list of [Data](/concepts-objects#data-object) objects. | + +### Generate embeddings + +The **Astra DB Vector Store** component offers two methods for generating embeddings. + +1. **Embedding Model**: Use your own embedding model by connecting an [Embeddings](/components-embedding-models) component in Langflow. + +2. **Astra Vectorize**: Use Astra DB's built-in embedding generation service. When creating a new collection, choose the embeddings provider and models, including NVIDIA's `NV-Embed-QA` model hosted by Datastax. + +The embedding model selection is made when creating a new collection and cannot be changed later. 
+ +For an example of using the **Astra DB Vector Store** component with an embedding model, see the [Vector Store RAG starter project](/starter-projects-vector-store-rag). +For more information, see the [Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). ## AstraDB Graph vector store From ad187f340f370be0b86450a8b5274ce5635fa225 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Wed, 19 Feb 2025 14:30:42 -0500 Subject: [PATCH 03/10] update-quickstart --- .../Get-Started/get-started-quickstart.md | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/docs/docs/Get-Started/get-started-quickstart.md b/docs/docs/Get-Started/get-started-quickstart.md index 83602540b3ef..d99176fb6c40 100644 --- a/docs/docs/Get-Started/get-started-quickstart.md +++ b/docs/docs/Get-Started/get-started-quickstart.md @@ -11,8 +11,8 @@ Get to know Langflow by building an OpenAI-powered chatbot application. After yo * [An OpenAI API key](https://platform.openai.com/) * [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with: - * An AstraDB application token - * [A collection in Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) + * An Astra DB application token scoped to read and write to the database + * A collection created in [Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) or a new collection created in the **Astra DB** component ## Open Langflow and start a new project @@ -134,10 +134,19 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen 1. In the **Astra DB Application Token** field, add your **Astra DB** application token. The component connects to your database and populates the menus with existing databases and collections. 2. 
Select your **Database**. + If you don't have a collection, select **New database**. + Complete the **Name**, **Cloud provider**, and **Region** fields, and then click **Create**. **Database creation takes a few minutes**. 3. Select your **Collection**. Collections are created in your [Astra DB deployment](https://astra.datastax.com) for storing vector data. - If you don't have a collection, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection). - 4. Select **Embedding Model** to bring your own embeddings model, which is the connected **OpenAI Embeddings** component. - The **Dimensions** value must match the dimensions of your collection. This value can be found in your **Collection** in your [Astra DB deployment](https://astra.datastax.com). + :::info + If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries. + ::: + If you don't have a collection, select **New collection**. + Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. + * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. + * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. + * The **Dimensions** value must match the dimensions of your collection. 
You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). + + For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials. From 49611e60d2c982aee56a819423f886520922c5fc Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Wed, 19 Feb 2025 14:36:45 -0500 Subject: [PATCH 04/10] style-cleanup --- docs/docs/Components/components-vector-stores.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md index 7f3695a6f34d..6d7464967cc6 100644 --- a/docs/docs/Components/components-vector-stores.md +++ b/docs/docs/Components/components-vector-stores.md @@ -39,19 +39,19 @@ For more information, see the [DataStax documentation](https://docs.datastax.com |------|--------------|------| | token | Astra DB Application Token | Authentication token for accessing Astra DB (required). | | environment | Environment | The environment for the Astra DB API Endpoint. For example, `dev` or `prod`. | -| database_name | Database | The Database name for the Astra DB instance (required). | -| api_endpoint | Astra DB API Endpoint | The API Endpoint for the Astra DB instance. Supercedes database selection. | -| collection_name | Collection | The name of the collection within Astra DB where the vectors will be stored (required). | +| database_name | Database | The database name for the Astra DB instance (required). | +| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. Supercedes database selection. 
| +| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored (required). | | keyspace | Keyspace | Optional keyspace within Astra DB to use for the collection. | | embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra Vectorize. | -| embedding_model | Embedding Model | Specify the Embedding Model. Not required for Astra Vectorize collections. | -| number_of_results | Number of Search Results | Number of search results to return (default: 4). | +| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra Vectorize collections. | +| number_of_results | Number of Search Results | Number of search results to return (default: `4`). | | search_type | Search Type | Search type to use (options: `Similarity`, `Similarity with score threshold`, `MMR (Max Marginal Relevance)`). | -| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results (when using 'Similarity with score threshold'). | +| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results (when using `Similarity with score threshold`). | | advanced_search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query. | | autodetect_collection | Autodetect Collection | Boolean flag to determine whether to autodetect the collection. | | content_field | Content Field | Field to use as the text content field for the vector store. | -| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value will be deleted before new data is loaded. | +| deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded. 
| | ignore_invalid_documents | Ignore Invalid Documents | Boolean flag to determine whether to ignore invalid documents at runtime. | | astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | Optional dictionary of additional parameters for the AstraDBVectorStore. | From 0dc410115d8d0248f9e14e32332c2f7c7813960d Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Fri, 21 Feb 2025 10:36:43 -0500 Subject: [PATCH 05/10] Apply suggestions from code review Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com> --- .../Components/components-vector-stores.md | 24 +++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md index 6d7464967cc6..bcbd8149644f 100644 --- a/docs/docs/Components/components-vector-stores.md +++ b/docs/docs/Components/components-vector-stores.md @@ -37,23 +37,23 @@ For more information, see the [DataStax documentation](https://docs.datastax.com | Name | Display Name | Info | |------|--------------|------| -| token | Astra DB Application Token | Authentication token for accessing Astra DB (required). | +| token | Astra DB Application Token | The authentication token for accessing Astra DB (required). | | environment | Environment | The environment for the Astra DB API Endpoint. For example, `dev` or `prod`. | | database_name | Database | The database name for the Astra DB instance (required). | -| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. Supercedes database selection. | +| api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. This supersedes the database selection. | | collection_name | Collection | The name of the collection within Astra DB where the vectors are stored (required). | -| keyspace | Keyspace | Optional keyspace within Astra DB to use for the collection. 
| -| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra Vectorize. | -| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra Vectorize collections. | +| keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. | +| embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra vectorize. | +| embedding_model | Embedding Model | Specify the embedding model. Not required for Astra vectorize collections. | | number_of_results | Number of Search Results | Number of search results to return (default: `4`). | -| search_type | Search Type | Search type to use (options: `Similarity`, `Similarity with score threshold`, `MMR (Max Marginal Relevance)`). | -| search_score_threshold | Search Score Threshold | Minimum similarity score threshold for search results (when using `Similarity with score threshold`). | -| advanced_search_filter | Search Metadata Filter | Optional dictionary of filters to apply to the search query. | -| autodetect_collection | Autodetect Collection | Boolean flag to determine whether to autodetect the collection. | -| content_field | Content Field | Field to use as the text content field for the vector store. | +| search_type | Search Type | The search type to use . The options are `Similarity`, `Similarity with score threshold`, and `MMR (Max Marginal Relevance)`. | +| search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the `Similarity with score threshold` option. | +| advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. | +| autodetect_collection | Autodetect Collection | A boolean flag to determine whether to autodetect the collection. | +| content_field | Content Field | A field to use as the text content field for the vector store. 
| | deletion_field | Deletion Based On Field | When provided, documents in the target collection with metadata field values matching the input metadata field value are deleted before new data is loaded. | -| ignore_invalid_documents | Ignore Invalid Documents | Boolean flag to determine whether to ignore invalid documents at runtime. | -| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | Optional dictionary of additional parameters for the AstraDBVectorStore. | +| ignore_invalid_documents | Ignore Invalid Documents | A boolean flag to determine whether to ignore invalid documents at runtime. | +| astradb_vectorstore_kwargs | AstraDBVectorStore Parameters | An optional dictionary of additional parameters for the AstraDBVectorStore. | ### Outputs From 4322965547c5c40259ec591539191d75ebcc31d0 Mon Sep 17 00:00:00 2001 From: Mendon Kissling <59585235+mendonk@users.noreply.github.com> Date: Fri, 21 Feb 2025 11:17:17 -0500 Subject: [PATCH 06/10] split-large-steps-add-admonition --- .../Components/components-vector-stores.md | 2 ++ .../Get-Started/get-started-quickstart.md | 20 ++++++++++++------- .../starter-projects-simple-agent.md | 2 +- .../starter-projects-vector-store-rag.md | 17 ++++++++++------ 4 files changed, 27 insertions(+), 14 deletions(-) diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md index bcbd8149644f..5be6b023e89f 100644 --- a/docs/docs/Components/components-vector-stores.md +++ b/docs/docs/Components/components-vector-stores.md @@ -70,7 +70,9 @@ The **Astra DB Vector Store** component offers two methods for generating embedd 2. **Astra Vectorize**: Use Astra DB's built-in embedding generation service. When creating a new collection, choose the embeddings provider and models, including NVIDIA's `NV-Embed-QA` model hosted by Datastax. +:::important The embedding model selection is made when creating a new collection and cannot be changed later. 
+::: For an example of using the **Astra DB Vector Store** component with an embedding model, see the [Vector Store RAG starter project](/starter-projects-vector-store-rag). diff --git a/docs/docs/Get-Started/get-started-quickstart.md b/docs/docs/Get-Started/get-started-quickstart.md index d99176fb6c40..fb19899a5b98 100644 --- a/docs/docs/Get-Started/get-started-quickstart.md +++ b/docs/docs/Get-Started/get-started-quickstart.md @@ -130,7 +130,7 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen ![](/img/quickstart-add-document-ingestion.png) -8. Configure the **Astra DB** component. +2. Configure the **Astra DB** component. 1. In the **Astra DB Application Token** field, add your **Astra DB** application token. The component connects to your database and populates the menus with existing databases and collections. 2. Select your **Database**. @@ -140,13 +140,19 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen :::info If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries. ::: - If you don't have a collection, select **New collection**. - Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. - * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. - * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. 
In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. - * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). - For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). +3. If you don't have a collection, create a new one within the component. + 1. Select **New collection**. + 2. Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. + + Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow. + + * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. + * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. + * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). + + For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). + If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials. 
diff --git a/docs/docs/Starter-Projects/starter-projects-simple-agent.md b/docs/docs/Starter-Projects/starter-projects-simple-agent.md index 2f580711c3fa..e78650742130 100644 --- a/docs/docs/Starter-Projects/starter-projects-simple-agent.md +++ b/docs/docs/Starter-Projects/starter-projects-simple-agent.md @@ -22,7 +22,7 @@ This opens a starter flow with the necessary components to run an agentic applic ## Simple Agent flow -Starter flow simple agent +![Simple agent starter flow](/img/starter-flow-simple-agent.png) The **Simple Agent** flow consists of these components: diff --git a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md index d7738d166813..2573dea9efca 100644 --- a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md +++ b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md @@ -66,13 +66,18 @@ The **Retriever Flow** (top of the screen) embeds the user's queries into vecto :::info If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries. ::: - If you don't have a collection, select **New collection**. - Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. - * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. - * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. 
In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. - * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). - For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). +3. If you don't have a collection, create a new one within the component. + 1. Select **New collection**. + 2. Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**. + + Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow. + + * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu. + * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component. + * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com). + + For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html). If you used Langflow's **Global Variables** feature, the RAG application flow components are already configured with the necessary credentials. 
From 0adc1ddeafcf17ab8de5e371758911c0baff5748 Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Fri, 21 Feb 2025 11:20:48 -0500
Subject: [PATCH 07/10] dimensions-not-required-for-astra-vectorize

---
 docs/docs/Get-Started/get-started-quickstart.md | 2 +-
 docs/docs/Starter-Projects/starter-projects-vector-store-rag.md | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/docs/Get-Started/get-started-quickstart.md b/docs/docs/Get-Started/get-started-quickstart.md
index fb19899a5b98..c7abf3de270e 100644
--- a/docs/docs/Get-Started/get-started-quickstart.md
+++ b/docs/docs/Get-Started/get-started-quickstart.md
@@ -149,7 +149,7 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen
 
     * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
     * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
-    * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
+    * The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
 
     For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).
 
diff --git a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
index 2573dea9efca..d658bc846ca2 100644
--- a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
+++ b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
@@ -75,7 +75,7 @@ The **Retriever Flow** (top of the screen) embeds the user's queries into vecto
 
     * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
     * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
-    * The **Dimensions** value must match the dimensions of your collection. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
+    * The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
 
     For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).
 
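Reviewer note: the rule this patch adds (the **Dimensions** field is only needed when you bring your own embeddings) could be expressed roughly as the validation below. It is a sketch, not Langflow's implementation: `NewCollectionForm` and `validate_new_collection` are hypothetical names, and the field names simply mirror the labels in the docs text.

```python
from dataclasses import dataclass
from typing import List, Optional

BRING_YOUR_OWN = "Bring your own"  # option label taken from the docs text


@dataclass
class NewCollectionForm:
    """Hypothetical model of the New collection dialog fields."""
    name: str
    embedding_generation_method: str
    embedding_model: str
    dimensions: Optional[int] = None


def validate_new_collection(form: NewCollectionForm) -> List[str]:
    """Return a list of problems; an empty list means the form looks complete."""
    errors: List[str] = []
    if not form.name:
        errors.append("Name is required.")
    if form.embedding_generation_method == BRING_YOUR_OWN:
        # Bring-your-own embeddings: both fields use the same option,
        # and the dimensions must be given explicitly.
        if form.embedding_model != BRING_YOUR_OWN:
            errors.append("Select 'Bring your own' for both embedding fields.")
        if form.dimensions is None or form.dimensions <= 0:
            errors.append("Dimensions is required when you bring your own embeddings.")
    # For a vectorize provider, dimensions may be left empty.
    return errors
```

For example, a form that uses a vectorize provider with the `NV-Embed-QA` model and no dimensions returns no errors, while a **Bring your own** form without dimensions returns one.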
From 2a2ca9838ef7faa72be3bc97dc0e62472651bc18 Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Fri, 21 Feb 2025 12:02:38 -0500
Subject: [PATCH 08/10] Apply suggestions from code review

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
---
 docs/docs/Components/components-vector-stores.md | 4 ++--
 docs/docs/Get-Started/get-started-quickstart.md | 2 +-
 .../Starter-Projects/starter-projects-vector-store-rag.md | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md
index 5be6b023e89f..90bf95e08fa1 100644
--- a/docs/docs/Components/components-vector-stores.md
+++ b/docs/docs/Components/components-vector-stores.md
@@ -45,8 +45,8 @@ For more information, see the [DataStax documentation](https://docs.datastax.com
 | keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. |
 | embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra vectorize. |
 | embedding_model | Embedding Model | Specify the embedding model. Not required for Astra vectorize collections. |
-| number_of_results | Number of Search Results | Number of search results to return (default: `4`). |
-| search_type | Search Type | The search type to use . The options are `Similarity`, `Similarity with score threshold`, and `MMR (Max Marginal Relevance)`. |
+| number_of_results | Number of Search Results | The number of search results to return (default: `4`). |
+| search_type | Search Type | The search type to use. The options are `Similarity`, `Similarity with score threshold`, and `MMR (Max Marginal Relevance)`. |
 | search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the `Similarity with score threshold` option. |
 | advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
 | autodetect_collection | Autodetect Collection | A boolean flag to determine whether to autodetect the collection. |
diff --git a/docs/docs/Get-Started/get-started-quickstart.md b/docs/docs/Get-Started/get-started-quickstart.md
index c7abf3de270e..c2fc396995fc 100644
--- a/docs/docs/Get-Started/get-started-quickstart.md
+++ b/docs/docs/Get-Started/get-started-quickstart.md
@@ -148,7 +148,7 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen
     Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow.
 
     * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
-    * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
+    * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the option for the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
     * The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
 
     For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).
diff --git a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
index d658bc846ca2..2d6c320561ef 100644
--- a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
+++ b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
@@ -74,7 +74,7 @@ The **Retriever Flow** (top of the screen) embeds the user's queries into vecto
     Your choice for the **Embedding generation method** and **Embedding model** depends on whether you want to use embeddings generated by a provider through Astra's vectorize service, or generated by a component in Langflow.
 
     * To use embeddings generated by a provider through Astra's vectorize service, select the model from the **Embedding generation method** dropdown menu, and then select the model from the **Embedding model** dropdown menu.
-    * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
+    * To use embeddings generated by a component in Langflow, select **Bring your own** for both the **Embedding generation method** and **Embedding model** fields. In this starter project, the option for the embeddings method and model is the **OpenAI Embeddings** component connected to the **Astra DB** component.
     * The **Dimensions** value must match the dimensions of your collection. This field is **not required** if you use embeddings generated through Astra's vectorize service. You can find this value in the **Collection** in your [Astra DB deployment](https://astra.datastax.com).
 
     For more information, see the [DataStax Astra DB Serverless documentation](https://docs.datastax.com/en/astra-db-serverless/databases/embedding-generation.html).

From e4bc1d9fdb996c4caae7f0a57f6665c58bbb5d6a Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Fri, 21 Feb 2025 12:13:43 -0500
Subject: [PATCH 09/10] fix-numbering

---
 .../Components/components-vector-stores.md | 8 +++----
 .../Get-Started/get-started-quickstart.md | 22 +++++++++----------
 2 files changed, 15 insertions(+), 15 deletions(-)

diff --git a/docs/docs/Components/components-vector-stores.md b/docs/docs/Components/components-vector-stores.md
index 5be6b023e89f..df4a9fcd2893 100644
--- a/docs/docs/Components/components-vector-stores.md
+++ b/docs/docs/Components/components-vector-stores.md
@@ -37,15 +37,15 @@ For more information, see the [DataStax documentation](https://docs.datastax.com
 
 | Name | Display Name | Info |
 |------|--------------|------|
-| token | Astra DB Application Token | The authentication token for accessing Astra DB (required). |
+| token | Astra DB Application Token | The authentication token for accessing Astra DB. |
 | environment | Environment | The environment for the Astra DB API Endpoint. For example, `dev` or `prod`. |
-| database_name | Database | The database name for the Astra DB instance (required). |
+| database_name | Database | The database name for the Astra DB instance. |
 | api_endpoint | Astra DB API Endpoint | The API endpoint for the Astra DB instance. This supersedes the database selection. |
-| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored (required). |
+| collection_name | Collection | The name of the collection within Astra DB where the vectors are stored. |
 | keyspace | Keyspace | An optional keyspace within Astra DB to use for the collection. |
 | embedding_choice | Embedding Model or Astra Vectorize | Choose an embedding model or use Astra vectorize. |
 | embedding_model | Embedding Model | Specify the embedding model. Not required for Astra vectorize collections. |
-| number_of_results | Number of Search Results | Number of search results to return (default: `4`). |
+| number_of_results | Number of Search Results | The number of search results to return. Default: `4`. |
 | search_type | Search Type | The search type to use . The options are `Similarity`, `Similarity with score threshold`, and `MMR (Max Marginal Relevance)`. |
 | search_score_threshold | Search Score Threshold | The minimum similarity score threshold for search results when using the `Similarity with score threshold` option. |
 | advanced_search_filter | Search Metadata Filter | An optional dictionary of filters to apply to the search query. |
diff --git a/docs/docs/Get-Started/get-started-quickstart.md b/docs/docs/Get-Started/get-started-quickstart.md
index c7abf3de270e..1d2f3482023e 100644
--- a/docs/docs/Get-Started/get-started-quickstart.md
+++ b/docs/docs/Get-Started/get-started-quickstart.md
@@ -31,7 +31,7 @@ Continue to [Run the basic prompting flow](#run-basic-prompting-flow).
 
 The Basic Prompting flow will look like this when it's completed:
 
-![](/img/starter-flow-basic-prompting.png)
+![Completed basic prompting flow](/img/starter-flow-basic-prompting.png)
 
 To build the **Basic Prompting** flow, follow these steps:
 
@@ -46,7 +46,7 @@ The [OpenAI](components-models#openai) model component sends the user input and
 
 You should now have a flow that looks like this:
 
-![](/img/quickstart-basic-prompt-no-connections.png)
+![Basic prompting flow with no connections](/img/quickstart-basic-prompt-no-connections.png)
 
 With no connections between them, the components won't interact with each other. You want data to flow from **Chat Input** to **Chat Output** through the connections between the components.
 
@@ -111,7 +111,7 @@ If you don't want to create a blank flow, click **New Flow**, and then select **
 
 Adding vector RAG to the basic prompting flow will look like this when completed:
 
-![](/img/quickstart-add-document-ingestion.png)
+![Add document ingestion to the basic prompting flow](/img/quickstart-add-document-ingestion.png)
 
 To build the flow, follow these steps:
 
@@ -120,17 +120,17 @@ To build the flow, follow these steps:
 The [Astra DB vector store](/components-vector-stores#astra-db-vector-store) component connects to your **Astra DB** database.
 3. Click **Data**, select the **File** component, and then drag it to the canvas.
 The [File](/components-data#file) component loads files from your local machine.
-3. Click **Processing**, select the **Split Text** component, and then drag it to the canvas.
+4. Click **Processing**, select the **Split Text** component, and then drag it to the canvas.
 The [Split Text](/components-processing#split-text) component splits the loaded text into smaller chunks.
-4. Click **Processing**, select the **Parse Data** component, and then drag it to the canvas.
+5. Click **Processing**, select the **Parse Data** component, and then drag it to the canvas.
 The [Data to Message](/components-processing#data-to-message) component converts the data from the **Astra DB** component into plain text.
-5. Click **Embeddings**, select the **OpenAI Embeddings** component, and then drag it to the canvas.
+6. Click **Embeddings**, select the **OpenAI Embeddings** component, and then drag it to the canvas.
 The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) component generates embeddings for the user's input, which are compared to the vector data in the database.
-6. Connect the new components into the existing flow, so your flow looks like this:
+7. Connect the new components into the existing flow, so your flow looks like this:
 
-![](/img/quickstart-add-document-ingestion.png)
+![Add document ingestion to the basic prompting flow](/img/quickstart-add-document-ingestion.png)
 
-2. Configure the **Astra DB** component.
+8. Configure the **Astra DB** component.
     1. In the **Astra DB Application Token** field, add your **Astra DB** application token.
     The component connects to your database and populates the menus with existing databases and collections.
     2. Select your **Database**.
@@ -138,10 +138,10 @@ The [OpenAI Embeddings](/components-embedding-models#openai-embeddings) componen
     Complete the **Name**, **Cloud provider**, and **Region** fields, and then click **Create**. **Database creation takes a few minutes**.
     3. Select your **Collection**. Collections are created in your [Astra DB deployment](https://astra.datastax.com) for storing vector data.
     :::info
-    If you select a collection embedded with Nvidia through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the Nvidia `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries.
+    If you select a collection embedded with NVIDIA through Astra's vectorize service, the **Embedding Model** port is removed, because you have already generated embeddings for this collection with the NVIDIA `NV-Embed-QA` model. The component fetches the data from the collection, and uses the same embeddings for queries.
     :::
 
-3. If you don't have a collection, create a new one within the component.
+9. If you don't have a collection, create a new one within the component.
     1. Select **New collection**.
     2. Complete the **Name**, **Embedding generation method**, **Embedding model**, and **Dimensions** fields, and then click **Create**.
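Reviewer note: the parameter table touched in `components-vector-stores.md` can be read as a small configuration record. The sketch below only mirrors the field names and the documented default of `4` from that table. `AstraDBInputs` is a hypothetical class, not the Langflow component, and the defaults for `search_type` and `search_score_threshold` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Option labels copied from the search_type row of the table.
SEARCH_TYPES = (
    "Similarity",
    "Similarity with score threshold",
    "MMR (Max Marginal Relevance)",
)


@dataclass
class AstraDBInputs:
    """Hypothetical record of a few Astra DB component inputs."""
    token: str
    database_name: str
    collection_name: str
    number_of_results: int = 4          # default documented in the table
    search_type: str = "Similarity"     # assumed default, not stated in the table
    search_score_threshold: Optional[float] = None  # assumed default

    def __post_init__(self) -> None:
        if self.search_type not in SEARCH_TYPES:
            raise ValueError(f"Unknown search type: {self.search_type!r}")
        if self.number_of_results < 1:
            raise ValueError("number_of_results must be at least 1.")
```

Constructing `AstraDBInputs(token="...", database_name="...", collection_name="...")` with placeholder strings keeps the documented default of four results, and an unrecognized `search_type` raises `ValueError`.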
From e306cecb5c724256dacbd75ccb9f565b058d573d Mon Sep 17 00:00:00 2001
From: Mendon Kissling <59585235+mendonk@users.noreply.github.com>
Date: Mon, 24 Feb 2025 11:30:42 -0500
Subject: [PATCH 10/10] Apply suggestions from code review

Co-authored-by: KimberlyFields <46325568+KimberlyFields@users.noreply.github.com>
---
 docs/docs/Starter-Projects/starter-projects-vector-store-rag.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
index 2d6c320561ef..a833bed06b62 100644
--- a/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
+++ b/docs/docs/Starter-Projects/starter-projects-vector-store-rag.md
@@ -20,7 +20,7 @@ We've chosen [Astra DB](https://astra.datastax.com/signup?utm_source=langflow-p
 ## Prerequisites
 
 * [An OpenAI API key](https://platform.openai.com/)
-* [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with:
+* [An Astra DB vector database](https://docs.datastax.com/en/astra-db-serverless/get-started/quickstart.html) with the following:
     * An Astra DB application token scoped to read and write to the database
     * A collection created in [Astra](https://docs.datastax.com/en/astra-db-serverless/databases/manage-collections.html#create-collection) or a new collection created in the **Astra DB** component
 