
feature: support bm42 embeddings #338


Open
cdxker wants to merge 1 commit into main

Conversation

@cdxker commented Jul 10, 2024

What does this PR do?

This PR adds support for Qdrant/all_miniLM_L6_v2_with_attentions. I followed https://github.com/Anush008/bm42-rs as a reference and added a new pooling method, bm42.

Tested by running with the command:

HF_TOKEN=hf_********************************** cargo run --no-default-features -F http -F ort -- --model-id Qdrant/all_miniLM_L6_v2_with_attentions --pooling bm42 --port 5888

Fixes #337
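
For context, here is a minimal sketch of the BM42 pooling idea as described in Qdrant's write-up and the bm42-rs reference: per-token importance is read from the [CLS] row of the last layer's attention map (averaged over heads), and each token is mapped to a sparse index by hashing its text with murmur3. All names below are illustrative, not this PR's actual code, and the sketch skips the subword-merging step a real implementation needs:

```rust
// Hedged sketch of BM42 pooling, not this PR's actual implementation.
// Assumption: `cls_attention[i]` is the attention weight from the [CLS]
// token to token i in the last layer, already averaged across heads.
use std::collections::HashMap;

fn bm42_pool(tokens: &[&str], cls_attention: &[f32]) -> HashMap<u32, f32> {
    let mut sparse = HashMap::new();
    for (token, &weight) in tokens.iter().zip(cls_attention) {
        // The sparse index is a hash of the token text; the 32-bit hash
        // width here is illustrative. Colliding tokens simply sum together.
        let index = murmur3_32(token.as_bytes(), 0);
        *sparse.entry(index).or_insert(0.0) += weight;
    }
    sparse
}

// Self-contained murmur3 (x86, 32-bit) so the sketch compiles standalone.
fn murmur3_32(data: &[u8], seed: u32) -> u32 {
    let (c1, c2) = (0xcc9e_2d51u32, 0x1b87_3593u32);
    let mut h = seed;
    let mut chunks = data.chunks_exact(4);
    for chunk in &mut chunks {
        let mut k = u32::from_le_bytes(chunk.try_into().unwrap());
        k = k.wrapping_mul(c1).rotate_left(15).wrapping_mul(c2);
        h = (h ^ k).rotate_left(13).wrapping_mul(5).wrapping_add(0xe654_6b64);
    }
    // Mix in the trailing 1-3 bytes, if any.
    let mut k = 0u32;
    for (i, &b) in chunks.remainder().iter().enumerate() {
        k ^= (b as u32) << (8 * i);
    }
    h ^= k.wrapping_mul(c1).rotate_left(15).wrapping_mul(c2);
    // Finalization: avalanche the bits.
    h ^= data.len() as u32;
    h ^= h >> 16;
    h = h.wrapping_mul(0x85eb_ca6b);
    h ^= h >> 13;
    h = h.wrapping_mul(0xc2b2_ae35);
    h ^ (h >> 16)
}
```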

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@OlivierDehaene (Contributor)

BM42 is super controversial.
See https://x.com/Nils_Reimers/status/1809334134088622217
I'm on the side of waiting a bit to see how this evolves.

@OlivierDehaene (Contributor)

To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

@cdxker (Author) commented Jul 11, 2024

> To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

Fair. We really just want support for it so we can test it on our own datasets and verify how good/bad it actually is. I agree a lot of that launch was shady, especially the Quickwit bench portion.

Any feedback on the actual implementation? Because the router function `async fn embed_sparse` expects a dense vector and then sparsifies it, serving many sparse embedding models becomes super expensive. In the case of this model, it uses a murmur3 hash, which has a very large key space.

This means that Infer has to create and send a very large dense vector across a channel. https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/backends/ort/src/bm42.rs#L293-L296

https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/core/src/infer.rs#L193-L204

This will impact adding support for other sparse models as well as this one. It would be a lot better if Infer::embed_sparse either:

  1. Didn't call the embed function, with each Backend exposing its own embed_sparse function, or
  2. Sparsified the vector before it gets sent back across the channel.

I don't have enough context on what the best approach is, but either change would open the opportunity to support other sparse embedding models with larger key spaces. A rough sketch of option 1 is below.
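
For illustration, here is a hedged sketch of option 1, so only the non-zero (index, value) pairs ever cross the channel. The trait shape, `Batch`, and `SparseValue` below are simplified stand-ins, not the crate's actual types:

```rust
// Hedged sketch of option 1: the Backend trait gains its own sparse
// path, so backends like bm42 can return only the non-zero entries.
// `Batch` and `SparseValue` are placeholder types for this sketch.

pub struct Batch; // stand-in for the crate's tokenized batch type

/// One non-zero entry of a sparse embedding. A u64 index comfortably
/// covers very large key spaces such as murmur3 hashes.
pub struct SparseValue {
    pub index: u64,
    pub value: f32,
}

pub trait Backend {
    /// Existing dense path.
    fn embed(&self, batch: Batch) -> Vec<Vec<f32>>;

    /// New sparse path. Backends that natively score sparse features
    /// override this; the default falls back to dense-then-sparsify so
    /// current dense backends keep working unchanged.
    fn embed_sparse(&self, batch: Batch) -> Vec<Vec<SparseValue>> {
        self.embed(batch)
            .into_iter()
            .map(|embedding| {
                embedding
                    .into_iter()
                    .enumerate()
                    .filter(|&(_, v)| v != 0.0)
                    .map(|(i, v)| SparseValue { index: i as u64, value: v })
                    .collect()
            })
            .collect()
    }
}
```

With a shape like this, a bm42 backend could hash tokens straight into `SparseValue`s and never allocate the dense vector at all.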

@OlivierDehaene (Contributor)

> This means that Infer has to create and send a very large dense vector across a channel.

Why is that a problem? Moving a Vec shouldn't be a concern; it's not expensive as far as I know.
But yes, having a dedicated embed_sparse function in the backend trait could be interesting.

@cdxker (Author) commented Jul 16, 2024

Creating a Vec with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.

@OlivierDehaene (Contributor)

> Creating a Vec with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.

OK, so when you say "This means that Infer has to create and send a very large dense vector across a channel," are you only referring to the creation part? As I said, the moving part ("send across a channel") is not an issue.
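
To make the distinction concrete, here is a minimal, self-contained illustration of the two costs being discussed: filling the Vec is O(n), while sending it over a channel moves only the fixed-size header.

```rust
// Minimal illustration: creating a dense Vec costs time and memory
// proportional to its length, but moving it across a channel only
// copies the (pointer, length, capacity) header, ~24 bytes on 64-bit.
use std::sync::mpsc;

fn main() {
    // The expensive part: allocate and zero-fill n elements (O(n)).
    // n is scaled down here; a 2^63-sized vector could never be built.
    let n = 10_000_000;
    let dense: Vec<f32> = vec![0.0; n];

    // The cheap part: the send is O(1) regardless of n, because the
    // heap buffer is not copied; only ownership of it is transferred.
    let (tx, rx) = mpsc::channel();
    tx.send(dense).unwrap();
    let received = rx.recv().unwrap();
    assert_eq!(received.len(), n);
}
```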
