[New Model]: Add Cohere2 Model #11357
Comments
This has already been added in #11203, and is in the latest released version of vLLM. Thanks for the offer though!
@DarkLight1337 I think there are some issues with the impl:
So I think gemma2 may have the same impl issue.
Thanks for the report! I'll ask @simon-mo to take a look since he added this.
@DarkLight1337 A follow-up thought: to gate long-context impl correctness, we could add a needle test. I'd be able to help with that as well.
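A minimal sketch of what such a needle ("needle in a haystack") test could look like. This is not vLLM's actual test harness; the helper names are made up, and a toy string-scanning function stands in for a real model call, just to show the shape of the check: bury a fact deep in long filler context and assert the model retrieves it.

```python
# Hypothetical needle-in-a-haystack check (illustrative only, not vLLM's
# real harness). A toy scanner stands in for LLM generation.

def build_haystack(needle: str, filler: str, depth: int, total_lines: int) -> str:
    """Insert the needle sentence at a given depth inside filler text."""
    lines = [filler] * total_lines
    lines.insert(depth, needle)
    return "\n".join(lines)

def toy_retrieve(context: str, query_key: str) -> str:
    """Stand-in for model generation: find the line containing the key."""
    for line in context.splitlines():
        if query_key in line:
            return line
    return ""

needle = "The secret passkey is 7421."
ctx = build_haystack(needle, "The sky is blue.", depth=5000, total_lines=10000)
answer = toy_retrieve(ctx, "passkey")
assert "7421" in answer  # a real test would call the model and check its output
```

In a real gate, `toy_retrieve` would be replaced by a generation call against the served model at several context lengths and needle depths, failing the test if the passkey is not reproduced.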
Actually, I think @youkaichao added this.
🚀 The feature, motivation and pitch
Cohere recently released the CommandR7B model on Hugging Face, and I would like to contribute the vLLM implementation of it. @simon-mo
PR: #11358
The model also uses interleaved attention like gemma2 and mistral, so the KV cache optimization is needed. I saw it is also on the roadmap: #9464
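To illustrate the interleaving being referred to, here is a hedged sketch of how alternating sliding-window and full-attention layers produce different causal masks per layer. The function name, the 1:1 interleave ratio, and the layer-index convention are illustrative assumptions, not the actual Cohere2/gemma2 configuration:

```python
import torch

def interleaved_attention_masks(seq_len: int, window: int, num_layers: int):
    """Build per-layer boolean attention masks where even layers use
    sliding-window attention and odd layers use full causal attention.
    (Ratio and layer assignment are illustrative, not the real config.)"""
    i = torch.arange(seq_len)
    causal = i[None, :] <= i[:, None]                      # full causal mask
    windowed = causal & (i[:, None] - i[None, :] < window)  # last `window` tokens only
    return [windowed if layer % 2 == 0 else causal for layer in range(num_layers)]
```

The KV-cache optimization mentioned follows from this structure: sliding-window layers only ever attend to the last `window` positions, so their KV cache can be capped at `window` entries instead of growing with the full sequence length.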
Alternatives
No response
Additional context
I have integrated it, tested that it works with all the benchmark scripts, and would like to add a feature branch for review.