[Bug]: Index Build Progress is always Finished #40580

Open
1 task done
lowener opened this issue Mar 11, 2025 · 5 comments

@lowener
Contributor

lowener commented Mar 11, 2025

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.5.3
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): 2.5.3
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

Calls to pymilvus utility.index_building_progress return the following:

{'total_rows': 1000000, 'indexed_rows': 0, 'pending_index_rows': 1000000, 'state': 'Finished'}
{'total_rows': 1000000, 'indexed_rows': 0, 'pending_index_rows': 1000000, 'state': 'Finished'}
{'total_rows': 1000000, 'indexed_rows': 125000, 'pending_index_rows': 875000, 'state': 'Finished'}
...
{'total_rows': 1000000, 'indexed_rows': 875000, 'pending_index_rows': 125000, 'state': 'Finished'}
{'total_rows': 1000000, 'indexed_rows': 1000000, 'pending_index_rows': 0, 'state': 'Finished'}

The state is always Finished even though the index is still building. This creates another issue: utility.wait_for_index_building_complete doesn't wait for the build to be done, since it is probably just checking that state==Finished.
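
For illustration, here is a minimal sketch of a check that ignores the reported state and looks only at the row counts. The helper name is_fully_indexed is hypothetical (it is not part of pymilvus); the dictionary keys are the ones shown in the output above, and the two snapshots are copied from it.

```python
def is_fully_indexed(progress: dict) -> bool:
    """Return True only if every row is indexed, regardless of 'state'.

    Hypothetical helper, not part of pymilvus. It relies on the keys seen
    in the reported output: total_rows, indexed_rows, pending_index_rows.
    """
    return (
        progress["pending_index_rows"] == 0
        and progress["indexed_rows"] >= progress["total_rows"]
    )


# Two snapshots copied from the reported output.
early = {'total_rows': 1000000, 'indexed_rows': 125000,
         'pending_index_rows': 875000, 'state': 'Finished'}
done = {'total_rows': 1000000, 'indexed_rows': 1000000,
        'pending_index_rows': 0, 'state': 'Finished'}

print(is_fully_indexed(early))  # False, although state is 'Finished'
print(is_fully_indexed(done))   # True
```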

Expected Behavior

The state should be In Progress while the build is not done, and utility.wait_for_index_building_complete should actually wait for the build to finish.

Steps To Reproduce

Milvus Log

No response

Anything else?

No response

@lowener added the kind/bug and needs-triage labels on Mar 11, 2025

dosubot bot commented Mar 11, 2025

Hey @lowener! 👋 I'm here to help you with any bugs, questions, or contributions you have for Milvus. Let's tackle this issue together!

The issue you're experiencing seems to be related to the utility.index_building_progress function incorrectly reporting the state as Finished even when the index is still being built. According to the expected behavior, the state should reflect In Progress while the index is being built, and only switch to Finished once the process is complete [1].

This incorrect state reporting affects the utility.wait_for_index_building_complete function, which relies on the state to determine if the index building process is complete. The function should ideally check if pending_index_rows is zero to confirm completion, rather than relying solely on the state being Finished [2].

To address this, you might need to verify the logic in your implementation or check for any updates or patches that might have been released to fix this issue in the Milvus version you are using.


@xiaofan-luan
Collaborator

Finished means the initial index building is done.

If you keep ingesting data into Milvus, you will see more pending rows, but that doesn't mean you can't search.

If you wait a while and no compaction happens in the meantime, you should eventually get indexed_rows = total_rows.

@yanliang567
Contributor

/assign @lowener
/unassign

@sre-ci-robot assigned lowener and unassigned yanliang567 on Mar 12, 2025
@yanliang567 added the help wanted label and removed the kind/bug and needs-triage labels on Mar 12, 2025
@lowener
Contributor Author

lowener commented Mar 12, 2025

What do you mean by "the initial index building is done"? Is the state still considered "Finished" when 0 rows are indexed?

I am using it in the following workflow:

# Vectors have already been ingested beforehand
collection.create_index(field_name=EMBEDDING_FIELD, index_params={
    "index_type": index_type,
    "metric_type": "L2",
    "params": index_params
})

utility.wait_for_index_building_complete(collection_name=collection_name, index_name="embedding",
                                         using=get_milvus_client()._using)
# Step 2: Load the index created to make it searchable.
collection.load()

# Wait until the load process completes
utility.wait_for_loading_complete(
    collection_name=collection_name,
    using=get_milvus_client()._using,
)

In the example I just pasted, wait_for_index_building_complete is useless since it doesn't wait for anything. Or, if this is not how the function was intended to be used, is there another function that waits until indexed_rows = total_rows?
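
As a possible workaround on the client side, here is a minimal polling sketch that waits on the row counts instead of the state. The helper wait_until_fully_indexed and its parameters are hypothetical (not part of pymilvus). It takes a callable returning a progress dictionary, so it can be tried without a running server; the commented usage assumes utility.index_building_progress with the default connection alias.

```python
import time
from typing import Callable


def wait_until_fully_indexed(
    get_progress: Callable[[], dict],
    poll_interval: float = 1.0,
    timeout: float = 600.0,
) -> dict:
    """Poll get_progress() until no rows are pending; return the last snapshot.

    Hypothetical helper, not part of pymilvus. Raises TimeoutError if rows
    are still pending once `timeout` seconds have elapsed.
    """
    deadline = time.monotonic() + timeout
    while True:
        progress = get_progress()
        if (progress["pending_index_rows"] == 0
                and progress["indexed_rows"] >= progress["total_rows"]):
            return progress
        if time.monotonic() >= deadline:
            raise TimeoutError(f"index still building: {progress}")
        time.sleep(poll_interval)


# Assumed usage against a live server (not executed here):
#   from pymilvus import utility
#   wait_until_fully_indexed(
#       lambda: utility.index_building_progress(collection_name,
#                                               index_name="embedding"))
```

Note that if data is still being ingested while this runs, pending rows may keep reappearing, so the timeout matters.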

@xiaofan-luan
Collaborator


There are two ways to use Milvus.
One is create collection -> ingest/bulkinsert -> index build -> load -> search.
The other is create collection -> index build -> load -> ingest -> search.

If you want to ensure your current collection is fully indexed, the easiest way is to resubmit an index request.
Each time you call index build manually, you change the state of the index, and wait_for_index_building_complete will wait until the index state changes back to Finished.
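
A rough sketch of that suggestion, assuming a repeated create_index call with the same parameters is accepted by the server (this is taken from the comment above and not verified here). The helper reindex_and_wait is hypothetical; collection is expected to be a pymilvus Collection and utility the pymilvus.utility module, both passed in so the sketch can be exercised with stand-ins. The default connection alias is assumed.

```python
def reindex_and_wait(collection, utility, field_name, index_params, index_name):
    """Re-submit the index request, wait for it, and return the progress.

    Hypothetical helper, not part of pymilvus. Assumes that calling
    create_index again with the same parameters is accepted.
    """
    # Re-submitting the request is what should move the index state
    # away from Finished, according to the comment above.
    collection.create_index(
        field_name=field_name,
        index_params=index_params,
        index_name=index_name,
    )
    # Blocks until the server reports the index build as finished.
    utility.wait_for_index_building_complete(
        collection_name=collection.name,
        index_name=index_name,
    )
    # Return the final counters so the caller can double-check them.
    return utility.index_building_progress(
        collection_name=collection.name,
        index_name=index_name,
    )
```

Checking pending_index_rows in the returned dictionary afterwards gives an independent confirmation that nothing is left to index.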
