How to solve warning: The context length of the model is too short to hold the multi-modal embeddings in the worst case #738
Comments
I'm hitting the same problem; the number of tokens generated by the new version is larger than with the previous one.
Hi, thanks for your interest in the Qwen model! This warning appears during the vLLM profile_run. In the original code, we added +1 to the video's
+1, same problem
This issue has been fixed in the latest version of vLLM. You can try updating it.
@vefalun If your issue has been resolved, please close the issue.
When I run vLLM following the code example in the README on an 8×A100 machine, the following warning occurs:

(VllmWorkerProcess pid=427033) WARNING 02-08 11:44:42 profiling.py:187] The context length (128000) of the model is too short to hold the multi-modal embeddings in the worst case (131072 tokens in total, out of which {'image': 16384, 'video': 114688} are reserved for multi-modal embeddings). This may cause certain multi-modal inputs to fail during inference, even when the input text is short. To avoid this, you should increase max_model_len, reduce max_num_seqs, and/or reduce mm_counts.

However, I couldn't find the `max_model_len`, `max_num_seqs`, and `mm_counts` settings in the `config.json` file. How should I adjust them to avoid this warning? Thank you very much!
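For what it's worth, these knobs are vLLM engine arguments rather than fields in the model's `config.json`. Below is a minimal sketch of where they go when constructing the engine from Python; the model ID and the concrete values are illustrative assumptions, not settings taken from this thread, and `limit_mm_per_prompt` is the engine argument that caps the per-prompt multi-modal item counts used in the worst-case profiling (the "mm_counts" the warning refers to).

```python
from vllm import LLM

# Minimal sketch, not the exact fix from this thread: these are vLLM engine
# arguments, not config.json fields. Model ID and values are illustrative.
llm = LLM(
    model="Qwen/Qwen2.5-VL-7B-Instruct",  # assumed checkpoint for illustration
    tensor_parallel_size=8,               # matches the 8x A100 setup in the report
    max_model_len=128000,                 # context length the engine will allocate for
    max_num_seqs=4,                       # fewer concurrent sequences reduces profiling memory
    # Lower the per-prompt multi-modal limits so the worst-case reserved tokens
    # fit inside the context; e.g. disable video if you only send images.
    limit_mm_per_prompt={"image": 1, "video": 0},
)
```

The same settings are also exposed as command-line flags (e.g. `--max-model-len`, `--max-num-seqs`, `--limit-mm-per-prompt`) when launching the OpenAI-compatible server, so they can be tuned without touching the model files.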