Added changes to ensure mxint8 compilations of VLMs work. #336

Open
wants to merge 4 commits into
base: main

Conversation

quic-dhirajku
Contributor

Modified the modeling files of InternVL and Llava to use 'vision_embeds' as the name of the image embeddings.
Modified the modeling_auto file to incorporate the mxint8 changes for VLMs.
LIMITATIONS: The model's Processor is expected to always return the vision components in 'float16'.

@@ -1385,7 +1342,7 @@ def from_pretrained(
 model, kv_offload=kv_offload
 )

-return cls(model, is_tlm=is_tlm, continuous_batching=continuous_batching)
+return cls(model, is_tlm=is_tlm, continuous_batching=continuous_batching, enable_qnn=enable_qnn)

Please inform the QNN team about this change before merging so that nothing breaks on their end.
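
A minimal sketch of why this call site matters to downstream users: a flag forwarded from `from_pretrained` to the constructor only stays backward compatible if it has a default. The class name `ModelWrapperSketch`, the stand-in model object, and the `False` default for `enable_qnn` are assumptions made for illustration; the real signatures are not shown in this conversation.

```python
class ModelWrapperSketch:
    """Hypothetical stand-in for the wrapper class touched by the diff."""

    def __init__(self, model, is_tlm=False, continuous_batching=False, enable_qnn=False):
        self.model = model
        self.is_tlm = is_tlm
        self.continuous_batching = continuous_batching
        # Assumed default of False, so callers that never pass the flag are unaffected.
        self.enable_qnn = enable_qnn

    @classmethod
    def from_pretrained(cls, name, is_tlm=False, continuous_batching=False, enable_qnn=False):
        model = {"name": name}  # placeholder for the real model-loading step
        # Forward the flag explicitly, mirroring the return statement in the diff.
        return cls(
            model,
            is_tlm=is_tlm,
            continuous_batching=continuous_batching,
            enable_qnn=enable_qnn,
        )


# Existing callers keep working; callers that need the flag opt in explicitly.
legacy = ModelWrapperSketch.from_pretrained("some-model")
qnn = ModelWrapperSketch.from_pretrained("some-model", enable_qnn=True)
```

If the flag were forwarded without a default on either side, every existing caller would need to change, which is the kind of breakage the comment above warns about.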

ctx_len: int = 150,
full_batch_size: Optional[int] = None,
kv_cache_batch_size: Optional[int] = None,
encoder_ctx_len: int = 1500,

Always use meaningful named constants rather than magic numbers in the code.
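
One possible shape for this suggestion, as a sketch only: the constant names `DEFAULT_CTX_LEN` and `DEFAULT_ENCODER_CTX_LEN` and the function `compile_sketch` are hypothetical, and the values are simply the defaults visible in the snippet above.

```python
from typing import Optional

# Hypothetical names; the values are the defaults from the reviewed signature.
DEFAULT_CTX_LEN = 150
DEFAULT_ENCODER_CTX_LEN = 1500


def compile_sketch(
    ctx_len: int = DEFAULT_CTX_LEN,
    full_batch_size: Optional[int] = None,
    kv_cache_batch_size: Optional[int] = None,
    encoder_ctx_len: int = DEFAULT_ENCODER_CTX_LEN,
) -> dict:
    """Return the resolved settings so the defaults are easy to inspect."""
    return {
        "ctx_len": ctx_len,
        "full_batch_size": full_batch_size,
        "kv_cache_batch_size": kv_cache_batch_size,
        "encoder_ctx_len": encoder_ctx_len,
    }


defaults = compile_sketch()
overridden = compile_sketch(ctx_len=4096)
```

Naming the values gives each default a single definition that tests and documentation can reference.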

@quic-amitraj marked this pull request as draft April 17, 2025 12:16
Modified modelling files of InternVL and Llava to have 'vision_embeds' as the name of the image_embeddings.
Modified modeling_auto file to incorporate mxint8 modifications for VLMs.
LIMITATIONS: It is expected that the Processor of a model always gives vision components in 'float16'.

Signed-off-by: quic-dhirajku <quic_dhirajku@quicinc.com>
Signed-off-by: Dhiraj Kumar Sah <quic_dhirajku@quicinc.com>
… out older qnn based changes.

Signed-off-by: quic-dhirajku <quic_dhirajku@quicinc.com>
Signed-off-by: Dhiraj Kumar Sah <quic_dhirajku@quicinc.com>
quic-dhirajku and others added 2 commits April 22, 2025 08:57
Signed-off-by: Dhiraj Kumar Sah <quic_dhirajku@quicinc.com>
@quic-rishinr marked this pull request as ready for review April 22, 2025 15:54
@quic-xiyushi

Looks good to me. Verified in vLLM.

5 participants