[Bug]: Milvus can't load / create BM25 collection (Node.js) #40581

Open
1 task done
luisdanielbarros opened this issue Mar 11, 2025 · 6 comments
Assignees
Labels
component/rootcoord kind/bug Issues or changes related a bug triage/accepted Indicates an issue or PR is ready to be actively worked on.

Comments

@luisdanielbarros

luisdanielbarros commented Mar 11, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.4
- Deployment mode(standalone or cluster): Standalone
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2): @zilliz/milvus2-sdk-node ^2.5.5
- OS(Ubuntu or CentOS): Windows 10
- CPU/Memory: Intel(R) Core(TM) i5-14600KF / 32GB
- GPU: NVIDIA GeForce RTX 4070 SUPER
- Others:

Current Behavior

When following the documentation at https://milvus.io/docs/full-text-search.md

I'm unable to create a collection using the following code

  const collection = await client.createCollection({
    collection_name: job.data.collectionName,
    schema: getMilvusSchema(dim),
    metric_type: "COSINE",
    index_params: milvusBM25IndexParams,
    functions: milvusBM25Functions,
  });


import { DataType, FunctionType } from "@zilliz/milvus2-sdk-node";

// define schema
export const milvus_bot_collectionName = `bot`;
export const getMilvusSchema = (dim: number) => [
  {
    name: `id`,
    data_type: DataType.VarChar,
    max_length: 128,
    is_primary_key: true,
    autoID: false,
  },
  {
    name: `embeddings`,
    data_type: DataType.FloatVector,
    dim: 768,
  },
  {
    name: `sparseEmbeddings`,
    data_type: DataType.SparseFloatVector,
  },
  {
    name: `ownerType`,
    data_type: DataType.VarChar,
    max_length: 128,
  },
  {
    name: `ownerId`,
    data_type: DataType.VarChar,
    max_length: 128,
  },
  {
    name: `documentId`,
    data_type: DataType.VarChar,
    max_length: 65535,
  },
  {
    name: `documentType`,
    data_type: DataType.VarChar,
    max_length: 65535,
  },
  {
    name: `document`,
    data_type: DataType.VarChar,
    max_length: 65535,
    enable_analyzer: true,
    enable_match: true,
  },
  {
    name: `metadata`,
    data_type: DataType.VarChar,
    max_length: 65535,
  },
];

export const milvusBM25Functions = [
  {
    name: "text_bm25_emb",
    description: "bm25 function",
    type: FunctionType.BM25,
    input_field_names: ["document"],
    output_field_names: ["sparseEmbeddings"],
    params: {},
  },
];

export const milvusBM25IndexParams = [
  {
    field_name: "sparseEmbeddings",
    metric_type: "BM25",
    index_type: "AUTOINDEX",
  },
];

The ExpressJS server running it simply stops and does not execute any line after the call. Milvus logs some errors, but nothing I could find a solution for by searching.

It's my first time creating a bug report outside a company's private GitHub repos; I hope everything is okay.

Expected Behavior

The expected behavior would be having the ExpressJS server progress past this line of code, creating a queryable collection according to the documentation link provided.

Steps To Reproduce

1. Install the environment as specified (I'm using Docker for Milvus)
2. Try running the following code, even in an empty ExpressJS server

Milvus Log

2025-03-11 15:29:58 milvus-standalone | [2025/03/11 15:29:58.909 +00:00] [INFO] [datacoord/segment_manager.go:573] ["try remove empty sealed segment after channel cp updated failed to get channel"] [channel=by-dev-rootcoord-dml_8_456573303766329480v0]
2025-03-11 15:30:00 milvus-standalone | [2025/03/11 15:30:00.126 +00:00] [INFO] [dist/dist_handler.go:114] ["pull and handle distribution done"] [respSize=14] [pullDur=574.377µs] [handleDur=6.941µs]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/root_coord.go:1267] ["failed to describe collection"] [traceID=70d4a431bd3a3eef27e9e614250d56e7] [collectionName=Knowledgebase_67d050b64614a04387c670b0] [dbName=default] [id=0] [ts=18446744073709551615] [allowUnavailable=false] [error="collection not found[collection=456573303766334478]"]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [INFO] [proxy/impl.go:678] ["DropCollection received"] [traceID=0fb3ba4b7132c0b989adf2d5a4abc233] [role=proxy] [db=default] [collection=Knowledgebase_67d050b64614a04387c670b0]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [INFO] [rootcoord/root_coord.go:1091] ["received request to drop collection"] [traceID=0fb3ba4b7132c0b989adf2d5a4abc233] [role=rootcoord] [dbName=default] [name=Knowledgebase_67d050b64614a04387c670b0]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [WARN] [rootcoord/drop_collection_task.go:63] ["drop non-existent collection"] [traceID=0fb3ba4b7132c0b989adf2d5a4abc233] [collection=Knowledgebase_67d050b64614a04387c670b0] [database=default]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.272 +00:00] [INFO] [rootcoord/root_coord.go:1124] ["done to drop collection"] [traceID=0fb3ba4b7132c0b989adf2d5a4abc233] [role=rootcoord] [name=Knowledgebase_67d050b64614a04387c670b0] [ts=456578041705136134]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.273 +00:00] [INFO] [proxy/impl.go:704] ["DropCollection done"] [traceID=0fb3ba4b7132c0b989adf2d5a4abc233] [role=proxy] [db=default] [collection=Knowledgebase_67d050b64614a04387c670b0] [BeginTs=456578041705136133] [EndTs=456578041705136133]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.837 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.837 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:07 milvus-standalone | [2025/03/11 15:30:07.837 +00:00] [WARN] [rootcoord/root_coord.go:1267] ["failed to describe collection"] [traceID=d87d9f40e469ef1647d589a9d9ab554e] [collectionName=Knowledgebase_67d050b64614a04387c670b0] [dbName=default] [id=0] [ts=18446744073709551615] [allowUnavailable=false] [error="collection not found[collection=456573303766334478]"]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.605 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.605 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.605 +00:00] [WARN] [rootcoord/root_coord.go:1267] ["failed to describe collection"] [traceID=85962da783159d4db1d229b2682ef332] [collectionName=Knowledgebase_67d050b64614a04387c670b0] [dbName=default] [id=0] [ts=18446744073709551615] [allowUnavailable=false] [error="collection not found[collection=456573303766334478]"]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.606 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.606 +00:00] [WARN] [rootcoord/meta_table.go:584] ["collection not available"] [collectionID=456573303766334478] [state=CollectionDropping]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.606 +00:00] [WARN] [rootcoord/root_coord.go:1267] ["failed to describe collection"] [traceID=004f90beed4fef28a3c3cc741b2fe76c] [collectionName=Knowledgebase_67d050b64614a04387c670b0] [dbName=default] [id=0] [ts=18446744073709551615] [allowUnavailable=false] [error="collection not found[collection=456573303766334478]"]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.606 +00:00] [WARN] [proxy/task_scheduler.go:480] ["Failed to execute task: "] [traceID=004f90beed4fef28a3c3cc741b2fe76c] [error="collection not found[collection=456573303766334478]"]
2025-03-11 15:30:08 milvus-standalone | [2025/03/11 15:30:08.606 +00:00] [WARN] [proxy/impl.go:864] ["LoadCollection failed to WaitToFinish"] [traceID=004f90beed4fef28a3c3cc741b2fe76c] [role=proxy] [db=default] [collection=Knowledgebase_67d050b64614a04387c670b0] [refreshMode=false] [error="collection not found[collection=456573303766334478]"] [BeginTS=456578042059030530] [EndTS=456578042059030530]
2025-03-11 15:30:11 milvus-standalone | [2025/03/11 15:30:11.258 +00:00] [DEBUG] [metrics/thread.go:53] ["thread watcher observe thread num"] [threadNum=137]

Anything else?

No response

@luisdanielbarros luisdanielbarros added kind/bug Issues or changes related a bug needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 11, 2025

dosubot bot commented Mar 11, 2025

Hey there, @luisdanielbarros! I'm here to help you with any bugs, questions, or contributions you might have. Let's tackle this together!

I found a similar closed issue related to collection errors in Milvus: Bug: After Milvus creates multiple collections and inserts data, the search report errors "collectionNotExists, reason: can't find collection". This issue was closed without comments or references, so it might not provide a direct solution to your problem [1].

Regarding your issue, the error messages "collection not available" and "collection not found" suggest that the collection might not be created successfully or is in a dropping state. Here are some steps you can take to troubleshoot and resolve the issue:

  1. Verify Collection Creation: Ensure that the collection is created successfully before attempting to load or query it. Double-check the parameters used in the createCollection function, especially the metric_type, which should be BM25 for full-text search, not COSINE.

  2. Check Schema and Index Parameters: Make sure the schema and index parameters align with the requirements for BM25 metric type collections. The schema should include fields like id, text, and sparse with appropriate data types [2].

  3. Load the Collection: If the collection is created but not loaded, you need to load it into memory before performing operations. This can be done using the loadCollection method.

  4. Review Milvus Logs: Examine the Milvus logs for any additional error messages or warnings that might provide more context on why the collection is not available.

  5. Check Server Status: Ensure that the Milvus server is running without errors and that all necessary services are operational.

By following these steps, you should be able to identify and resolve the issue with creating and loading the BM25 collection in Milvus using the Node.js SDK.

To continue talking to Dosu, mention @dosu.


@luisdanielbarros
Author

@dosu

  1. I have checked the collection name; even attempts to load the collection result in the same indefinite wait.
  2. Nice call, however after updating the schema to
import { DataType, FunctionType } from "@zilliz/milvus2-sdk-node";

// define schema
export const milvus_bot_collectionName = `bot`;
export const getMilvusSchema = (dim: number) => [
  {
    name: `id`,
    data_type: DataType.Int64,
    is_primary_key: true,
  },
  {
    name: `embeddings`,
    data_type: DataType.FloatVector,
    dim: 768,
  },
  {
    name: `sparseEmbeddings`,
    data_type: DataType.SparseFloatVector,
  },
  {
    name: `ownerType`,
    data_type: DataType.VarChar,
    max_length: 128,
  },
  {
    name: `ownerId`,
    data_type: DataType.VarChar,
    max_length: 128,
  },
  {
    name: `documentId`,
    data_type: DataType.VarChar,
    max_length: 1000,
  },
  {
    name: `documentType`,
    data_type: DataType.VarChar,
    max_length: 1000,
  },
  {
    name: `document`,
    data_type: DataType.VarChar,
    max_length: 1000,
    enable_analyzer: true,
    enable_match: true,
  },
  {
    name: `metadata`,
    data_type: DataType.VarChar,
    max_length: 1000,
  },
];

export const milvusBM25Functions = [
  {
    name: "text_bm25_emb",
    description: "bm25 function",
    type: FunctionType.BM25,
    input_field_names: ["document"],
    output_field_names: ["sparseEmbeddings"],
    params: {},
  },
];

export const milvusBM25IndexParams = [
  {
    field_name: "sparseEmbeddings",
    metric_type: "BM25",
    index_type: "AUTOINDEX",
  },
];

the problem remains.
  3. I tried that before; loading the collection results in the same indefinite wait.
  4. I reviewed the logs and shared them, but found nothing conclusive. Nothing seemed critical to me, though a Milvus developer may spot something I couldn't.
  5. I see no errors, and whenever I remove the

    index_params: milvusBM25IndexParams,
    functions: milvusBM25Functions,

lines from the createCollection call, it works fine again.


dosubot bot commented Mar 11, 2025

To address the issue you're facing with creating a collection using BM25 and AUTOINDEX in Milvus with the Node.js SDK, consider the following:

  1. Metric Type and Index Type: Ensure that the metric_type is set to BM25 and the index_type is SPARSE_INVERTED_INDEX instead of AUTOINDEX for the sparseEmbeddings field. The AUTOINDEX type might not be suitable for BM25, which is specifically designed for sparse vectors and full-text search [1].

  2. Function Configuration: Verify that the function configuration is correctly set up to convert text to sparse vector representations. Ensure that the input_field_names and output_field_names match the schema fields correctly.

  3. Milvus Logs: Although you mentioned reviewing the logs, double-check for any specific errors related to index creation or function execution. These might provide more insights into what might be going wrong.

  4. Node.js SDK Limitations: There might be limitations or issues with the Node.js SDK regarding BM25 or specific index types. Consider testing with a different SDK (e.g., Python) to see if the issue persists, which could help isolate whether it's an SDK-specific problem.

  5. Milvus Version Compatibility: Ensure that the Milvus version you are using supports the configurations you are trying to implement. Sometimes, certain features or configurations might not be fully supported in specific versions.

By adjusting the index type and ensuring the function configuration aligns with the schema, you might resolve the issue. If the problem persists, further investigation into SDK-specific limitations or version compatibility might be necessary.

To continue talking to Dosu, mention @dosu.


@luisdanielbarros
Author

According to the documentation, SPARSE_INVERTED_INDEX is meant for adding indexes to an already existing collection; the documentation on creating collections doesn't mention it. Since the dosubot is no longer producing relevant responses, I'll wait for human feedback.

@yanliang567
Contributor

I think the SDK does not support the function params in create_collection()...
/assign @shanghaikid
please help to take a look
/unassign

@yanliang567 yanliang567 added triage/accepted Indicates an issue or PR is ready to be actively worked on. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 12, 2025
@shanghaikid
Contributor

Hi there, you have two vector fields in your schema, and all vector fields must be indexed before the collection can be loaded. @yanliang567, the error is from Milvus; is it working as expected?

After changing your code as follows, it works.

const milvusBM25IndexParams = [
    {
      field_name: 'embeddings',
      metric_type: 'L2',
      index_type: 'AUTOINDEX',
    },
    {
      field_name: 'sparseEmbeddings',
      metric_type: 'BM25',
      index_type: 'AUTOINDEX',
    },
  ];
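
As a rough illustration of the point above, a pre-flight check could compare the schema against the index params and report any vector field that has no index entry before createCollection is called. This is only a sketch: the `FieldDef` and `IndexParam` shapes, the string data type labels, and the `findUnindexedVectorFields` helper are hypothetical stand-ins for illustration and are not part of @zilliz/milvus2-sdk-node.

```typescript
// Hypothetical minimal shapes for this sketch; the real SDK types are richer.
interface FieldDef {
  name: string;
  data_type: string; // stand-in label for the SDK's DataType enum
}

interface IndexParam {
  field_name: string;
  metric_type: string;
  index_type: string;
}

// Data type labels treated as vector fields here (an assumed, partial list).
const VECTOR_TYPES = new Set<string>([
  "FloatVector",
  "BinaryVector",
  "SparseFloatVector",
]);

// Return the names of vector fields that have no entry in indexParams.
function findUnindexedVectorFields(
  fields: FieldDef[],
  indexParams: IndexParam[],
): string[] {
  const indexed = new Set(indexParams.map((p) => p.field_name));
  return fields
    .filter((f) => VECTOR_TYPES.has(f.data_type))
    .map((f) => f.name)
    .filter((name) => !indexed.has(name));
}

// A trimmed copy of the schema from this issue, using string labels.
const fields: FieldDef[] = [
  { name: "id", data_type: "Int64" },
  { name: "embeddings", data_type: "FloatVector" },
  { name: "sparseEmbeddings", data_type: "SparseFloatVector" },
  { name: "document", data_type: "VarChar" },
];

// Index params as originally posted: only the sparse field is indexed.
const originalIndexParams: IndexParam[] = [
  { field_name: "sparseEmbeddings", metric_type: "BM25", index_type: "AUTOINDEX" },
];

// Index params after the change: both vector fields are indexed.
const fixedIndexParams: IndexParam[] = [
  { field_name: "embeddings", metric_type: "L2", index_type: "AUTOINDEX" },
  ...originalIndexParams,
];

// The original params leave "embeddings" unindexed; the fixed params leave nothing.
console.log(findUnindexedVectorFields(fields, originalIndexParams));
console.log(findUnindexedVectorFields(fields, fixedIndexParams));
```

Running a check like this before creating the collection would turn a silent hang into an explicit list of missing indexes, though it only covers the data type labels listed in `VECTOR_TYPES`.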
