Support Cross encoder models #10400
Conversation
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
👋 Hi! Thank you for contributing to the vLLM project. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can do one of these:
Can you add an example script to show how to use this API?
QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using the "embedding" task if we just override the `pooler` method instead of defining a new `classification_output` method.
Yes, I've added one in the PR description but it's a good idea to add it to the documentation.
I thought about this, and the only reason I kept it that way was that in the serving layer I need to know what task is being done, because I need to call the tokenizer with a pair of texts.
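As an illustration of that point, here is a minimal sketch using the Hugging Face tokenizer of one of the models in this PR (the query/passage strings are made up): only when the tokenizer is called with a text pair does it produce the `token_type_ids` that cross-encoders consume.

```python
from transformers import AutoTokenizer

# Tokenizer of one of the models supported by this PR; the query/passage
# strings are illustrative only.
tokenizer = AutoTokenizer.from_pretrained("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "How many people live in Berlin?"
passage = "Berlin has a population of 3.5 million registered inhabitants."

# Embedding path: a single text, so every token_type_id is 0.
single = tokenizer(query)
print(single["token_type_ids"])

# Cross-encoder path: both texts are encoded into one sequence, and the
# BERT-style tokenizer marks which segment each token belongs to
# (0 = first text, 1 = second text).
pair = tokenizer(query, passage)
print(pair["token_type_ids"])
```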
I think it may be simpler to make this a separate flag, similar to how we have a flag for multimodal models, rather than creating a new task for it. That way, we won't have to change our internals at all.
I think having a separate API for this would be cleaner as well - perhaps a Scoring API where we output a single score?
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
I've removed the "cross_encoding" task and added an `is_cross_encoder` property instead. There are still some pending TODOs.
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
I've added a `score` API. There are still some pending TODOs.
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
force-pushed from 32d0610 to 09d4ca6
# yapf conflicts with isort for this block
# yapf: disable
You should re-enable yapf afterwards.
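A minimal, self-contained sketch of that pattern (the stdlib imports are just placeholders): wrap only the conflicting block in the directive pair so yapf resumes formatting right after it.

```python
# yapf conflicts with isort for this block
# yapf: disable
from collections import (OrderedDict,
                         defaultdict)  # layout left exactly as isort wrote it
# yapf: enable

# From here on, yapf formats the code as usual again.
print(OrderedDict(), defaultdict(int))
```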
Otherwise looks good, thanks for your effort and patience!
Wait, we still need to update the OpenAI page in the docs.
The tests that are failing seem to be due to an issue unrelated to this PR.
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
@DarkLight1337 In this last commit of mine, I have included some documentation for the score API in the OpenAI Compatible Server page. Would this be the right place to add it?
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
#10581 should fix it, please merge from main again.
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
I think we can move this up one section so that parameters are shown after all API mentions.
Moves score API section up
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
This PR adds support for Cross Encoder models, reusing most of what was done to support embedding models. It includes:
- Support for BAAI/bge-reranker-v2-m3
- Support for cross-encoder/ms-marco-MiniLM-L-6-v2
- A `score` API endpoint
- An `LLM.score()` method

An example of an API call and its response is shown below.
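A minimal sketch of such a call, assuming a vLLM server running BAAI/bge-reranker-v2-m3 on localhost:8000; the `/score` path, the `text_1`/`text_2` request fields, and the response shape in the comments are assumptions based on this PR's description rather than a verbatim transcript.

```python
# Hedged sketch of a request to the new score endpoint. Assumes a server
# running BAAI/bge-reranker-v2-m3 on localhost:8000; the /score path and the
# text_1/text_2 field names are assumptions based on this PR's description.
import requests

resp = requests.post(
    "http://localhost:8000/score",
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "text_1": "What is the capital of France?",
        "text_2": [
            "The capital of Brazil is Brasilia.",
            "The capital of France is Paris.",
        ],
    },
)
resp.raise_for_status()
print(resp.json())
# Illustrative response shape (scores are made up):
# {
#   "object": "list",
#   "data": [
#     {"index": 0, "object": "score", "score": 0.001},
#     {"index": 1, "object": "score", "score": 0.999}
#   ]
# }
```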
Instead of adding a new task type, the models in this PR are loaded with `task == "embedding"` but support the additional `is_cross_encoder` property. This allows us to reuse most of what was done for the embedding support. The only change required to the core layers between the API and `LLM` is the propagation of the `token_type_ids` list. This is an additional tokenizer output that is generated for the BERT models and has to be passed to the model's embedding layer.

cc: @DarkLight1337 @flaviabeo
FIX #8022
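For the offline path, here is a minimal sketch of the `LLM.score()` method listed above; the `task="embedding"` constructor argument mirrors how these models are described as being loaded, and the `output.outputs.score` attribute is an assumption, so adjust it if your version differs.

```python
from vllm import LLM

# task="embedding" mirrors how this PR loads cross-encoder models; the return
# structure and the .outputs.score attribute are assumptions.
llm = LLM(model="cross-encoder/ms-marco-MiniLM-L-6-v2", task="embedding")

query = "What is the capital of France?"
passages = [
    "The capital of Brazil is Brasilia.",
    "The capital of France is Paris.",
]

# llm.score pairs the query with each passage and returns one score per pair.
outputs = llm.score(query, passages)
for passage, output in zip(passages, outputs):
    print(f"{output.outputs.score:.4f}  {passage}")
```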