Added changes to ensure mxint8 compilations of VLMs work. #336

quic-dhirajku · 2025-04-02T07:44:14Z

Modified the modeling files of InternVL and Llava so that the image embeddings are named 'vision_embeds'.
Modified the modeling_auto file to incorporate the mxint8 changes for VLMs.
LIMITATION: The Processor of a model is expected to always return the vision components in 'float16'.
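
To make the float16 limitation concrete, here is a minimal sketch of a pre-flight check a caller could run on the processor output. It is not part of this PR: the helper names, the `FakeTensor` stand-in, and the `pixel_values` key are assumptions chosen for illustration, and the check only compares dtype names.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Mapping


def dtype_name(value: Any) -> str:
    """Return a bare dtype name such as 'float16' for an array-like value."""
    # str(np.dtype("float16")) is "float16"; str(torch.float16) is "torch.float16".
    # Keeping only the part after the last dot handles both spellings.
    return str(getattr(value, "dtype", "")).rsplit(".", 1)[-1]


def find_non_fp16_vision_inputs(
    inputs: Mapping[str, Any],
    vision_keys: Iterable[str] = ("pixel_values",),  # assumed key name
) -> list:
    """List the vision inputs that are present but not float16."""
    return [
        key
        for key in vision_keys
        if key in inputs and dtype_name(inputs[key]) != "float16"
    ]


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor so the sketch runs without numpy or torch."""

    dtype: str


# Simulated processor output: text ids plus a float32 image tensor.
processor_output = {
    "input_ids": FakeTensor("int64"),
    "pixel_values": FakeTensor("torch.float32"),
}

bad = find_non_fp16_vision_inputs(processor_output)
if bad:
    print(f"Vision inputs not in float16: {bad}")
```

A caller could use a check like this to fail early, or to decide when to cast the vision inputs, before starting an mxint8 compilation.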

quic-hemagnih · 2025-04-15T23:56:46Z

QEfficient/transformers/models/modeling_auto.py

@@ -1385,7 +1342,7 @@ def from_pretrained(
                model, kv_offload=kv_offload
            )

-        return cls(model, is_tlm=is_tlm, continuous_batching=continuous_batching)
+        return cls(model, is_tlm=is_tlm, continuous_batching=continuous_batching, enable_qnn=enable_qnn)


Please inform the QNN team about this change before merging, so that nothing breaks on their end.

quic-hemagnih · 2025-04-15T23:57:32Z

QEfficient/transformers/models/modeling_auto.py

-        ctx_len: int = 150,
-        full_batch_size: Optional[int] = None,
-        kv_cache_batch_size: Optional[int] = None,
+        encoder_ctx_len: int = 1500,


Always use meaningful named constants rather than magic numbers in the code.
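
As a rough illustration of this suggestion (not the code that was merged), the default from the diff could be lifted into a module-level constant. Both `DEFAULT_ENCODER_CTX_LEN` and `compile_encoder` are hypothetical names; only the value 1500 and the parameter name `encoder_ctx_len` come from the diff above.

```python
# Hypothetical constant name; the value 1500 is the default shown in the diff.
DEFAULT_ENCODER_CTX_LEN = 1500


def compile_encoder(encoder_ctx_len: int = DEFAULT_ENCODER_CTX_LEN) -> dict:
    """Hypothetical stand-in for the real method; returns the settings it would use."""
    return {"encoder_ctx_len": encoder_ctx_len}


default_settings = compile_encoder()
custom_settings = compile_encoder(encoder_ctx_len=750)
```

With the value named in one place, a reader can see what 1500 represents, and a later change to the default touches a single line.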

quic-xiyushi · 2025-04-23T00:02:33Z

Looks good to me. Verified in vLLM.

quic-dhirajku requested review from quic-rishinr and ochougul as code owners April 2, 2025 07:44

quic-rishinr added the 1.20.0 label Apr 8, 2025

quic-hemagnih reviewed Apr 15, 2025

quic-dhirajku force-pushed the vlm_mxint8_patch branch from 108866f to 03f0676 Compare April 17, 2025 07:22

quic-amitraj marked this pull request as draft April 17, 2025 12:16

quic-dhirajku force-pushed the vlm_mxint8_patch branch from e1a9044 to 068d3c6 Compare April 22, 2025 08:53

quic-dhirajku added 2 commits April 22, 2025 08:54

Rebased to update transformers version and addressed comments to edit…

dc3b639

… out older qnn based changes. Signed-off-by: quic-dhirajku <quic_dhirajku@quicinc.com> Signed-off-by: Dhiraj Kumar Sah <quic_dhirajku@quicinc.com>

quic-dhirajku force-pushed the vlm_mxint8_patch branch from 068d3c6 to dc3b639 Compare April 22, 2025 08:54

quic-dhirajku and others added 2 commits April 22, 2025 08:57

Formatting issue resolved

cbb15c9

Signed-off-by: Dhiraj Kumar Sah <quic_dhirajku@quicinc.com>

Merge branch 'main' into vlm_mxint8_patch

0029aac

quic-rishinr marked this pull request as ready for review April 22, 2025 15:54

quic-hemagnih approved these changes Apr 23, 2025
