
[New Model]: Add Cohere2 Model #11357

Closed

CXIAAAAA opened this issue Dec 20, 2024 · 6 comments
Labels: new model

Comments

CXIAAAAA commented Dec 20, 2024

🚀 The feature, motivation and pitch

Recently, Cohere released the Command R7B model on Hugging Face, and I would like to contribute a vLLM implementation of it. @simon-mo

PR: #11358

The model also uses interleaved attention like Gemma 2 and Mistral, so KV cache optimization is needed; I saw that this is also on the roadmap (#9464).

Alternatives

No response

Additional context

I have integrated the model, verified that it works with all the benchmark scripts, and would like to add a feature branch for review.

CXIAAAAA added the feature request label on Dec 20, 2024
DarkLight1337 changed the title from [Feature]: Add Cohere2 Model to [New Model]: Add Cohere2 Model on Dec 20, 2024
DarkLight1337 added the new model label and removed the feature request label on Dec 20, 2024
DarkLight1337 (Member) commented Dec 20, 2024

This has already been added in #11203, and is in the latest released version of vLLM. Thanks for the offer though!

CXIAAAAA (Author) commented Dec 20, 2024

@DarkLight1337 I think there are some issues with the implementation:

  1. The attention pattern is interleaved sliding/sliding/sliding/full. When a global-attention layer passes sliding_window as None while also passing in the cache_config, execution falls into the branch
    sliding_window = cache_config.sliding_window
    which sets the sliding window back to 4096. So instead of [4096, 4096, 4096, None], the current implementation ends up with [4096, 4096, 4096, 4096]. You could add a print near
    self.kv_cache_dtype = kv_cache_dtype
    to double-check; see also the sketch after this list.
  2. I opened [Bugfix] Fix sliding window in cohere2 model #11358 to fix this.

So I think Gemma 2 may have the same implementation issue.
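A minimal sketch of the failure mode described above, with names modeled loosely on vLLM's Attention layer (illustrative only, not the actual source):

```python
# Illustrative sketch of the fallback bug (not the actual vLLM source).
# Cohere2 interleaves three sliding-window layers with one global layer
# (s, s, s, f), so the intended per-layer windows are [4096, 4096, 4096, None].

class FakeCacheConfig:
    sliding_window = 4096  # model-level default picked up from the HF config


def resolve_sliding_window(sliding_window, cache_config):
    # Bug: a global-attention layer passes sliding_window=None, but when a
    # cache_config is also supplied, the None is silently replaced by the
    # model-level default, so the layer is no longer global.
    if sliding_window is None and cache_config is not None:
        sliding_window = cache_config.sliding_window
    return sliding_window


intended = [4096, 4096, 4096, None]
resolved = [resolve_sliding_window(w, FakeCacheConfig()) for w in intended]
print(resolved)  # -> [4096, 4096, 4096, 4096]: the global layer is lost
```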

DarkLight1337 (Member) commented Dec 20, 2024

Thanks for the report! I'll ask @simon-mo to take a look since he added this.

CXIAAAAA (Author) commented:

@DarkLight1337 A follow-up thought: to gate long-context correctness of the implementation, we could add a needle-in-a-haystack test, along the lines of the sketch below. I would be able to help with that as well.
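For example, a minimal needle check could look like this (a hypothetical sketch: the prompt construction and passphrase are placeholders, and only vLLM's public LLM/SamplingParams API is assumed):

```python
# Hypothetical needle-in-a-haystack sanity check; not an existing vLLM test.
from vllm import LLM, SamplingParams

llm = LLM(model="CohereForAI/c4ai-command-r7b-12-2024")

needle = "The secret passphrase is BLUE-MARBLE-42."
filler = "Nothing of interest happens in this sentence. " * 2000
# Bury the needle far enough back that it falls outside the 4096-token
# sliding window, so only the global-attention layers can recover it.
prompt = filler[: len(filler) // 2] + needle + filler[len(filler) // 2 :]
prompt += "\nWhat is the secret passphrase?"

out = llm.generate([prompt], SamplingParams(temperature=0, max_tokens=32))
assert "BLUE-MARBLE-42" in out[0].outputs[0].text
```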

simon-mo (Collaborator) commented:

Actually, I think @youkaichao added this.

youkaichao (Member) commented:

@CXIAAAAA thanks for the report! The sliding window support for Cohere2 is indeed broken, and I opened PR #11583 to fix it.

Gemma 2 works fine because cache_config.sliding_window is None for models with interleaved sliding windows.
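In other words (a minimal sketch, reusing the fallback shown earlier): when the cache-level default is None, the fallback is a no-op, so the per-layer pattern survives:

```python
# Sketch only: with cache_config.sliding_window left as None (the Gemma 2
# case), the fallback cannot overwrite a layer's None, so interleaved
# patterns pass through unchanged.
def resolve_sliding_window(layer_window, cache_level_window):
    return cache_level_window if layer_window is None else layer_window

print([resolve_sliding_window(w, None) for w in [4096, 4096, 4096, None]])
# -> [4096, 4096, 4096, None]  (the global layer is preserved)
```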
