
Support Cross encoder models #10400

Merged
merged 30 commits into main on Nov 25, 2024
Conversation

maxdebayser
Contributor

@maxdebayser maxdebayser commented Nov 17, 2024

This PR adds support for Cross Encoder models reusing most of what was done to support embedding models. It includes:

  • Support for RoBERTa SequenceClassification models, tested with BAAI/bge-reranker-v2-m3
  • Support for BERT SequenceClassification models, tested with cross-encoder/ms-marco-MiniLM-L-6-v2
  • A score API endpoint
  • An LLM.score() method
  • Support for GPU and CPU

An example of an API call is:

$ curl -X 'POST' 'http://localhost:8000/v1/score' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "BAAI/bge-reranker-v2-m3",
  "text_1": "What is the capital of France?",
  "text_2": ["The capital of France is Paris."]
}'

Response:

{
  "id": "score-64dcf95966094892b0f0ccc63637c6d2",
  "object": "list",
  "created": 239128,
  "model": "BAAI/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": [
        1.0
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
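For reference, the request and response shapes above can be handled client-side with a couple of small helpers. This is a hypothetical sketch (not part of the PR) that only mirrors the JSON shapes shown in the example:

```python
import json

def build_score_request(text_1, text_2, model="BAAI/bge-reranker-v2-m3"):
    """Build the JSON body for a POST to /v1/score, matching the curl example."""
    return json.dumps({"model": model, "text_1": text_1, "text_2": text_2})

def parse_score_response(body):
    """Extract (index, score) pairs from a score API response body."""
    data = json.loads(body)
    # Each entry in "data" carries an index and a one-element score list.
    return [(item["index"], item["score"][0]) for item in data["data"]]
```

The helpers assume the single-score-per-pair layout shown in the example response; the endpoint itself would be called with any HTTP client.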

Instead of adding a new task type, the models in this PR are loaded with task == "embedding" but support the additional is_cross_encoder property. This allows us to reuse most of what was done for the embedding support. The only change required to the core layers between the API and the LLM is the propagation of the token_type_ids list. This is an additional tokenizer output that is generated for the BERT models and has to be passed to the model's embedding layer.
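To illustrate what gets propagated: for BERT-style models, encoding a sentence pair produces a token_type_ids vector marking which segment each token belongs to. The real ids come from the HF tokenizer; this simplified sketch only shows the shape of that output:

```python
def pair_token_type_ids(tokens_1, tokens_2):
    """Simplified illustration of the token_type_ids that
    tokenizer(text=text_1, text_pair=text_2) emits for BERT-style models:
    0 for the first segment ([CLS] + tokens_1 + [SEP]),
    1 for the second segment (tokens_2 + [SEP])."""
    first = [0] * (1 + len(tokens_1) + 1)   # [CLS] t1... [SEP]
    second = [1] * (len(tokens_2) + 1)      # t2... [SEP]
    return first + second
```

It is this per-token segment vector that the serving layer has to carry through to the model's embedding layer.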

cc: @DarkLight1337 @flaviabeo

FIX #8022

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Nov 17, 2024
Member

@DarkLight1337 DarkLight1337 left a comment


Can you add an example script to show how to use this API?

QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using "embedding" task if we just override the pooler method instead of defining a new classification_output method.

@maxdebayser
Contributor Author

Can you add an example script to show how to use this API?

Yes, I've added one in the PR description but it's a good idea to add it to the documentation.

QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using "embedding" task if we just override the pooler method instead of defining a new classification_output method.

I thought about this, and the only reason I kept it that way was that in the serving layer I need to know which task is being done, because I need to call the tokenizer with tokenizer(text=text1, text_pair=text2). If instead of reusing the chat embeddings API we had a new endpoint just for cross encoding, this wouldn't be necessary. Or perhaps we could add an attribute to one of the config classes to indicate that although the task is "embedding", the model is actually a "BertModelForSequenceClassification".

@DarkLight1337
Member

DarkLight1337 commented Nov 17, 2024

I think it may be simpler to make this a separate flag, similar to how we have a flag for multimodal models, rather than creating a new task for it. That way, we won't have to change our internals at all.

@DarkLight1337
Member

DarkLight1337 commented Nov 17, 2024

I think having a separate API for this would be cleaner as well - perhaps a Scoring API where we output a single score?

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
@maxdebayser
Contributor Author

I've removed the "cross_encoding" task and added an is_cross_encoder property to the ModelConfig class. I've also added support for RoBERTa models.

Pending TODOs:

  • Add Scoring API
  • Support for cross encoding in the LLM class
  • Tests comparing with sentence-transformers
  • CPU support

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
@maxdebayser
Contributor Author

I've added a score() method to the LLM class and added tests for it. I've also fixed the CPU support.

Pending TODOs:

  • Add Scoring API
  • Test Scoring API comparing with sentence-transformers

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Comment on lines +9 to +10
# yapf conflicts with isort for this block
# yapf: disable
Member


You should re-enable yapf afterwards.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 22, 2024
Member

@DarkLight1337 DarkLight1337 left a comment


Otherwise looks good, thanks for your effort and patience!

@DarkLight1337
Member

Wait, we still need to update the OpenAI page under /docs

@maxdebayser
Contributor Author

The tests that aren't failing due to a huggingface.co timeout are failing because of a JSON serialization problem. I've opened a tentative fix here: #10580

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Nov 22, 2024
@flaviabeo
Contributor

@DarkLight1337 In my last commit I have included some documentation for the score API in the OpenAI Compatible Server page. Would this be the right place to add it?


mergify bot commented Nov 22, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 22, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify mergify bot removed the needs-rebase label Nov 22, 2024
@DarkLight1337
Member

#10581 should fix it, please merge from main again.


mergify bot commented Nov 23, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 23, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify mergify bot removed the needs-rebase label Nov 24, 2024
@DarkLight1337
Member

@DarkLight1337 In my last commit I have included some documentation for the score API in the OpenAI Compatible Server page. Would this be the right place to add it?

I think we can move this up one section so that parameters are shown after all API mentions.

Moves score API section up

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
@youkaichao youkaichao merged commit 214efc2 into vllm-project:main Nov 25, 2024
55 of 57 checks passed
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
anko-intel pushed a commit to HabanaAI/vllm-fork that referenced this pull request Feb 12, 2025
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Labels
documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[New Model]: Want to support BAAI/bge-reranker-v2-m3 model
4 participants