
feature: support bm42 embeddings #338


Open
cdxker wants to merge 1 commit into main

Conversation

@cdxker commented Jul 10, 2024

What does this PR do?

This PR adds support for Qdrant/all_miniLM_L6_v2_with_attentions. I followed https://github.com/Anush008/bm42-rs as a reference and added a new pooling method, bm42.

Tested by running with the command:

HF_TOKEN=hf_********************************** cargo run --no-default-features -F http -F ort -- --model-id Qdrant/all_miniLM_L6_v2_with_attentions --pooling bm42 --port 5888

Fixes #337
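
For context, here is a minimal sketch of the BM42 pooling idea as described in Qdrant's write-up and the bm42-rs reference: per-token importance is read from the [CLS] row of the last layer's attention map (averaged over heads), and each token is mapped to a sparse index by hashing its text with murmur3. All names below are illustrative, not this PR's actual code, and the sketch skips the subword-merging step a real implementation needs:

```rust
// Hedged sketch of BM42 pooling, not this PR's actual implementation.
// Assumption: `cls_attention[i]` is the attention weight from the [CLS]
// token to token i in the last layer, already averaged across heads.
use std::collections::HashMap;

fn bm42_pool(tokens: &[&str], cls_attention: &[f32]) -> HashMap<u32, f32> {
    let mut sparse = HashMap::new();
    for (token, &weight) in tokens.iter().zip(cls_attention) {
        // The sparse index is a hash of the token text; the 32-bit hash
        // width here is illustrative. Colliding tokens simply sum together.
        let index = murmur3_32(token.as_bytes(), 0);
        *sparse.entry(index).or_insert(0.0) += weight;
    }
    sparse
}

// Self-contained murmur3 (x86, 32-bit) so the sketch compiles standalone.
fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    let (c1, c2) = (0xcc9e_2d51u32, 0x1b87_3593u32);
    let mut h = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let mut k = u32::from_le_bytes(chunk.try_into().unwrap());
        k = k.wrapping_mul(c1).rotate_left(15).wrapping_mul(c2);
        h = (h ^ k).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    // Mix in the trailing 1-3 bytes, if any.
    let mut k = 0u32;
    for (i, &b) in chunks.remainder().iter().enumerate() {
        k ^= (b as u32) << (8 * i);
    }
    h ^= k.wrapping_mul(c1).rotate_left(15).wrapping_mul(c2);
    // Finalization: avalanche the bits.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^ (h >> 16)
}
```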

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@OlivierDehaene (Contributor)

BM42 is super controversial.
See https://x.com/Nils_Reimers/status/1809334134088622217
I'm on the side of waiting a bit to see how this evolves.

@OlivierDehaene (Contributor)

To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

@cdxker (Author) commented Jul 11, 2024

> To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

Fair. We really just want support for it so we can test it on our own datasets and verify how good/bad it actually is. I agree a lot of that launch was shady, especially the Quickwit bench portion.

Any feedback on the actual implementation? Because the router function `async fn embed_sparse` expects a dense vector and then sparsifies it, serving many sparse embedding models becomes super expensive. In the case of this model, it uses a murmur3 hash, which has a very large key space.

This means that Infer has to create and send a very large dense vector across a channel. https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/backends/ort/src/bm42.rs#L293-L296

https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/core/src/infer.rs#L193-L204

This will impact adding support for other sparse models as well as this one. It would be a lot better if Infer::embed_sparse either:

  1. Didn't call the embed function, with each Backend exposing its own embed_sparse function, or
  2. Sparsified the vector before it gets sent back across the channel.

I don't have enough context on what the best approach is, but either change would open the opportunity to support other sparse embedding models with larger key spaces. A rough sketch of option 1 is below.
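
For illustration, here is a hedged sketch of option 1, so only the non-zero (index, value) pairs ever cross the channel. The trait shape, `Batch`, and `SparseValue` below are simplified stand-ins, not the crate's actual types:

```rust
// Hedged sketch of option 1: the Backend trait gains its own sparse
// path, so backends like bm42 can return only the non-zero entries.
// `Batch` and `SparseValue` are placeholder types for this sketch.

pub struct Batch; // stand-in for the crate's tokenized batch type

/// One non-zero entry of a sparse embedding. A u64 index comfortably
/// covers very large key spaces such as murmur3 hashes.
pub struct SparseValue {
    pub index: u64,
    pub value: f32,
}

pub trait Backend {
    /// Existing dense path.
    fn embed(&self, batch: Batch) -> Vec<Vec<f32>>;

    /// New sparse path. Backends that natively score sparse features
    /// override this; the default falls back to dense-then-sparsify so
    /// current dense backends keep working unchanged.
    fn embed_sparse(&self, batch: Batch) -> Vec<Vec<SparseValue>> {
        self.embed(batch)
            .into_iter()
            .map(|embedding| {
                embedding
                    .into_iter()
                    .enumerate()
                    .filter(|&(_, v)| v != 0.0)
                    .map(|(i, v)| SparseValue { index: i as u64, value: v })
                    .collect()
            })
            .collect()
    }
}
```

With a shape like this, a bm42 backend could hash tokens straight into `SparseValue`s and never allocate the dense vector at all.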

@OlivierDehaene (Contributor)

> This means that Infer has to create and send a very large dense vector across a channel.

Why is that a problem? Moving a Vec shouldn't be a concern; it's not expensive as far as I know.
But yes, having a dedicated embed_sparse function in the backend trait could be interesting.

@cdxker (Author) commented Jul 16, 2024

Creating a Vec with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.

@OlivierDehaene (Contributor)

> Creating a Vec with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.

OK, so when you say "This means that Infer has to create and send a very large dense vector across a channel," are you only referring to the creation part? As I said, the moving part ("send across a channel") is not an issue.
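
To make the distinction concrete, here is a minimal, self-contained illustration of the two costs being discussed: filling the Vec is O(n), while sending it over a channel moves only the fixed-size header.

```rust
// Minimal illustration: creating a dense Vec costs time and memory
// proportional to its length, but moving it across a channel only
// copies the (pointer, length, capacity) header, ~24 bytes on 64-bit.
use std::sync::mpsc;

fn main() {
    // The expensive part: allocate and zero-fill n elements (O(n)).
    // n is scaled down here; a 2^63-sized vector could never be built.
    let n = 10_000_000;
    let dense: Vec<f32> = vec![0.0; n];

    // The cheap part: the send is O(1) regardless of n, because the
    // heap buffer is not copied; only ownership of it is transferred.
    let (tx, rx) = mpsc::channel();
    tx.send(dense).unwrap();
    let received = rx.recv().unwrap();
    assert_eq!(received.len(), n);
}
```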
