How to solve warning: The context length of the model is too short to hold the multi-modal embeddings in the worst case #738
Comments
I'm hitting the same problem; the number of tokens generated by the new version is larger than with the previous one.
Hi, thanks for your interest in the Qwen model! This warning appears during the vLLM profile_run. In the original code, we added +1 to the video's
+1, same problem
This issue has been fixed in the latest version of vLLM. You can try updating it.
@vefalun If your issue has been resolved, please close the issue.
When I run vLLM following the code example in the README on an 8×A100 machine, the following warning occurs:

(VllmWorkerProcess pid=427033) WARNING 02-08 11:44:42 profiling.py:187] The context length (128000) of the model is too short to hold the multi-modal embeddings in the worst case (131072 tokens in total, out of which {'image': 16384, 'video': 114688} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase max_model_len, reduce max_num_seqs, and/or reduce mm_counts.

However, I couldn't find the `max_model_len`, `max_num_seqs`, and `mm_counts` settings in the `config.json` file. How should I adjust them to avoid this warning? Thank you very much!
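For what it's worth, these knobs are vLLM engine arguments rather than fields in the model's `config.json`. Below is a minimal sketch of where they go when constructing the engine from Python; the model ID and the concrete values are illustrative assumptions, not settings taken from this thread, and `limit_mm_per_prompt` is the engine argument that caps the per-prompt multi-modal item counts used in the worst-case profiling (the "mm_counts" the warning refers to).

```python
from vllm import LLM

# Minimal sketch, not the exact fix from this thread: these are vLLM engine
# arguments, not config.json fields. Model ID and values are illustrative.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # assumed checkpoint for illustration
    tensor_parallel_size=8,               # matches the 8x A100 setup in the report
    max_model_len=128000,                 # context length the engine will allocate for
    max_num_seqs=4,                       # fewer concurrent sequences reduces profiling memory
    # Lower the per-prompt multi-modal limits so the worst-case reserved tokens
    # fit inside the context; e.g. disable video if you only send images.
    limit_mm_per_prompt={"image": 1, "video": 0},
)
```

The same settings are also exposed as command-line flags (e.g. `--max-model-len`, `--max-num-seqs`, `--limit-mm-per-prompt`) when launching the OpenAI-compatible server, so they can be tuned without touching the model files.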