
Support Cross encoder models #10400

Merged
merged 30 commits into main on Nov 25, 2024
Conversation

maxdebayser
Contributor

@maxdebayser maxdebayser commented Nov 17, 2024

This PR adds support for Cross Encoder models reusing most of what was done to support embedding models. It includes:

  • Support for RoBERTa SequenceClassification models, tested with BAAI/bge-reranker-v2-m3
  • Support for BERT SequenceClassification models, tested with cross-encoder/ms-marco-MiniLM-L-6-v2
  • A score API endpoint
  • An LLM.score() method
  • Support for GPU and CPU

An example of an API call is:

$ curl -X 'POST' 'http://localhost:8000/v1/score' \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "model": "BAAI/bge-reranker-v2-m3",
  "text_1": "What is the capital of France?",
  "text_2": ["The capital of France is Paris."]
}'

Response:

{
  "id": "score-64dcf95966094892b0f0ccc63637c6d2",
  "object": "list",
  "created": 239128,
  "model": "BAAI/bge-reranker-v2-m3",
  "data": [
    {
      "index": 0,
      "object": "score",
      "score": [
        1.0
      ]
    }
  ],
  "usage": {
    "prompt_tokens": 0,
    "total_tokens": 0,
    "completion_tokens": 0,
    "prompt_tokens_details": null
  }
}
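For reference, the request and response shapes above can be handled client-side with a couple of small helpers. This is a hypothetical sketch (not part of the PR) that only mirrors the JSON shapes shown in the example:

```python
import json

def build_score_request(text_1, text_2, model="BAAI/bge-reranker-v2-m3"):
    """Build the JSON body for a POST to /v1/score, matching the curl example."""
    return json.dumps({"model": model, "text_1": text_1, "text_2": text_2})

def parse_score_response(body):
    """Extract (index, score) pairs from a score API response body."""
    data = json.loads(body)
    # Each entry in "data" carries an index and a one-element score list.
    return [(item["index"], item["score"][0]) for item in data["data"]]
```

The helpers assume the single-score-per-pair layout shown in the example response; the endpoint itself would be called with any HTTP client.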

Instead of adding a new task type, the models in this PR are loaded with task == "embedding" but support the additional is_cross_encoder property. This allows us to reuse most of what was done for the embedding support. The only change required to the core layers between the API and the LLM is the propagation of the token_type_ids list. This is an additional tokenizer output that is generated for the BERT models and has to be passed to the model's embedding layer.
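To illustrate what gets propagated: for BERT-style models, encoding a sentence pair produces a token_type_ids vector marking which segment each token belongs to. The real ids come from the HF tokenizer; this simplified sketch only shows the shape of that output:

```python
def pair_token_type_ids(tokens_1, tokens_2):
    """Simplified illustration of the token_type_ids that
    tokenizer(text=text_1, text_pair=text_2) emits for BERT-style models:
    0 for the first segment ([CLS] + tokens_1 + [SEP]),
    1 for the second segment (tokens_2 + [SEP])."""
    first = [0] * (1 + len(tokens_1) + 1)   # [CLS] t1... [SEP]
    second = [1] * (len(tokens_2) + 1)      # t2... [SEP]
    return first + second
```

It is this per-token segment vector that the serving layer has to carry through to the model's embedding layer.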

cc: @DarkLight1337 @flaviabeo

FIX #8022

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run the fastcheck CI, which starts with a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@mergify mergify bot added the frontend label Nov 17, 2024
Member

@DarkLight1337 DarkLight1337 left a comment


Can you add an example script to show how to use this API?

QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using "embedding" task if we just override the pooler method instead of defining a new classification_output method.

@maxdebayser
Contributor Author

Can you add an example script to show how to use this API?

Yes, I've added one in the PR description but it's a good idea to add it to the documentation.

QQ: Why do we have to define a separate "cross_encoding" task for this? I think we can keep using "embedding" task if we just override the pooler method instead of defining a new classification_output method.

I thought about this, and the only reason I kept it that way was that in the serving layer I need to know which task is being done, because I need to call the tokenizer with tokenizer(text=text1, text_pair=text2). If instead of reusing the chat embeddings API we had a new endpoint just for cross encoding, this wouldn't be necessary. Or perhaps we could add an attribute to one of the config classes to indicate that although the task is "embedding", the model is actually a "BertModelForSequenceClassification".

@DarkLight1337
Member

DarkLight1337 commented Nov 17, 2024

I think it may be simpler to make this a separate flag, similar to how we have a flag for multimodal models, rather than creating a new task for it. That way, we won't have to change our internals at all.

@DarkLight1337
Member

DarkLight1337 commented Nov 17, 2024

I think having a separate API for this would be cleaner as well - perhaps a Scoring API where we output a single score?

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
@maxdebayser
Contributor Author

I've removed the "cross_encoding" task and added an is_cross_encoder property to the ModelConfig class. I've also added support for RoBERTa models.

Pending TODOs:

  • Add Scoring API
  • Support for cross encoding in the LLM class
  • Tests comparing with sentence-transformers
  • CPU support

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
@maxdebayser
Contributor Author

I've added a score() method to the LLM class and added tests for it. I've also fixed the CPU support.

Pending TODOs:

  • Add Scoring API
  • Test Scoring API comparing with sentence-transformers

Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Comment on lines +9 to +10
# yapf conflicts with isort for this block
# yapf: disable
Member


You should re-enable yapf afterwards.

@DarkLight1337 DarkLight1337 added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 22, 2024
Member

@DarkLight1337 DarkLight1337 left a comment


Otherwise looks good, thanks for your effort and patience!

@DarkLight1337
Member

Wait, we still need to update the OpenAI page under /docs

@maxdebayser
Contributor Author

The tests that aren't failing due to a huggingface.co timeout are failing because of a JSON serialization problem. I've opened a tentative fix here: #10580

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
@mergify mergify bot added the documentation Improvements or additions to documentation label Nov 22, 2024
@flaviabeo
Contributor

@DarkLight1337 In my last commit I have included some documentation for the score API in the OpenAI Compatible Server page. Would this be the right place to add it?


mergify bot commented Nov 22, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 22, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify mergify bot removed the needs-rebase label Nov 22, 2024
@DarkLight1337
Member

#10581 should fix it, please merge from main again.


mergify bot commented Nov 23, 2024

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @maxdebayser.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Nov 23, 2024
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
@mergify mergify bot removed the needs-rebase label Nov 24, 2024
@DarkLight1337
Member

@DarkLight1337 In my last commit I have included some documentation for the score API in the OpenAI Compatible Server page. Would this be the right place to add it?

I think we can move this up one section so that parameters are shown after all API mentions.

Moves score API section up

Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
@youkaichao youkaichao merged commit 214efc2 into vllm-project:main Nov 25, 2024
55 of 57 checks passed
mfournioux pushed a commit to mfournioux/vllm that referenced this pull request Nov 28, 2024
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Signed-off-by: Maxime Fournioux <55544262+mfournioux@users.noreply.github.com>
sleepwalker2017 pushed a commit to sleepwalker2017/vllm that referenced this pull request Dec 13, 2024
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
anko-intel pushed a commit to HabanaAI/vllm-fork that referenced this pull request Feb 12, 2025
Signed-off-by: Max de Bayser <maxdebayser@gmail.com>
Signed-off-by: Max de Bayser <mbayser@br.ibm.com>
Signed-off-by: Flavia Beo <flavia.beo@ibm.com>
Co-authored-by: Flavia Beo <flavia.beo@ibm.com>
Labels
documentation Improvements or additions to documentation frontend ready ONLY add when PR is ready to merge/full CI is needed
Development

Successfully merging this pull request may close these issues.

[New Model]: Want to support BAAI/bge-reranker-v2-m3 model
4 participants