[Enhancement]: get doc ids by batch #40607

SpadeA-Tang · 2025-03-12T08:44:59Z

Is there an existing issue for this?

I have searched the existing issues

What would you like to be added?

Milvus row IDs do not match tantivy doc IDs, so we first get the tantivy doc IDs produced by the query,
then look up the corresponding Milvus row ID for each one, which is stored in the "doc_id" field (a fast field).

This one-by-one process is inefficient: 1. dynamic dispatch is involved each time a value is read from the "doc_id" field; 2. we cannot benefit from SIMD optimization; 3. memory locality is poor, and so on.
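To make the difference concrete, here is a minimal sketch in plain Rust. It does not use tantivy itself: `DocIdColumn`, `VecColumn`, and both helper functions are hypothetical stand-ins for a fast-field column, introduced only for illustration. The point is that the one-by-one path pays one virtual call per document, while the batched path pays one virtual call per batch and runs a tight loop over a known concrete type.

```rust
/// Hypothetical stand-in for a fast-field column that maps a tantivy doc id
/// to the Milvus row id stored in the "doc_id" field. Not tantivy's real API.
trait DocIdColumn {
    /// Look up a single value: one virtual call per document when used via `dyn`.
    fn get_val(&self, doc: u32) -> i64;

    /// Look up many values at once: one virtual call per batch when used via `dyn`.
    /// The default implementation simply falls back to per-item lookups.
    fn get_vals(&self, docs: &[u32], out: &mut [i64]) {
        assert_eq!(docs.len(), out.len());
        for (slot, &doc) in out.iter_mut().zip(docs) {
            *slot = self.get_val(doc);
        }
    }
}

/// Simplest possible column: values stored contiguously, indexed by doc id.
struct VecColumn {
    values: Vec<i64>,
}

impl DocIdColumn for VecColumn {
    fn get_val(&self, doc: u32) -> i64 {
        self.values[doc as usize]
    }

    /// Batch override: inside this method the concrete type is known, so the
    /// loop reads the backing slice directly with no dispatch per element.
    fn get_vals(&self, docs: &[u32], out: &mut [i64]) {
        assert_eq!(docs.len(), out.len());
        for (slot, &doc) in out.iter_mut().zip(docs) {
            *slot = self.values[doc as usize];
        }
    }
}

/// One-by-one path: a dynamically dispatched call for every doc id.
fn row_ids_one_by_one(col: &dyn DocIdColumn, docs: &[u32]) -> Vec<i64> {
    docs.iter().map(|&d| col.get_val(d)).collect()
}

/// Batched path: a single dynamically dispatched call for the whole slice.
fn row_ids_batched(col: &dyn DocIdColumn, docs: &[u32]) -> Vec<i64> {
    let mut out = vec![0i64; docs.len()];
    col.get_vals(docs, &mut out);
    out
}
```

Both functions return the same row IDs for the same input; only the number of dispatched calls and the shape of the inner loop differ. Whether the compiler actually vectorizes the batched loop depends on the column encoding, which this sketch does not model.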

Why is this needed?

No response

Anything else?

No response

xiaofan-luan · 2025-03-12T21:07:49Z

Is it a good idea to have a specialized map implementation for doc IDs to accelerate this lookup?

xiaofan-luan · 2025-03-12T21:14:30Z

Instead of optimizing the filter, we need a data structure that is really good at retrieval, like a hash map.
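A minimal sketch of what such a retrieval-oriented structure could look like, using only the Rust standard library. `DocIdMap`, `from_pairs`, and `resolve` are hypothetical names, not part of Milvus or tantivy, and the sketch assumes tantivy doc IDs fit in `u32` and Milvus row IDs in `i64`.

```rust
use std::collections::HashMap;

/// Hypothetical lookup table from tantivy doc id to Milvus row id.
struct DocIdMap {
    map: HashMap<u32, i64>,
}

impl DocIdMap {
    /// Build the table from (tantivy doc id, Milvus row id) pairs.
    fn from_pairs(pairs: impl IntoIterator<Item = (u32, i64)>) -> Self {
        Self {
            map: pairs.into_iter().collect(),
        }
    }

    /// Resolve a batch of tantivy doc ids; unknown ids yield `None`.
    fn resolve(&self, docs: &[u32]) -> Vec<Option<i64>> {
        docs.iter().map(|d| self.map.get(d).copied()).collect()
    }
}
```

If the doc IDs are dense and start at zero, a plain `Vec<i64>` indexed by doc ID would give the same lookup without hashing; which of the two fits better here is an open design question in this thread.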

SpadeA-Tang · 2025-03-13T02:02:57Z

#40608 optimizes this by using tantivy's internal batch API, which achieves better performance. @xiaofan-luan

SpadeA-Tang added the kind/enhancement label Mar 12, 2025

SpadeA-Tang mentioned this issue Mar 12, 2025

enhance: get doc ids by batch #40608

Open
