
Mllama(single + dual) + InternVL(single) + Llava (single) #267


Merged
merged 28 commits into main from mllama_single_dual_qpc on Feb 14, 2025

Conversation

@ochougul (Contributor) commented Feb 10, 2025

Adding generalized infrastructure to support VLMs with Dual/single QPC approaches
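
For orientation, here is a hypothetical end-to-end usage sketch of what this infrastructure is meant to enable. The class name follows the one discussed later in this thread, and the kv_offload switch plus every compile/generate argument below are assumptions for illustration only, not something this PR confirms:

import requests
from PIL import Image
from transformers import AutoProcessor

from QEfficient import QEFFAutoModelForImageTextToText  # class name assumed from this thread

model_id = "llava-hf/llava-1.5-7b-hf"  # example checkpoint, not tied to this PR
processor = AutoProcessor.from_pretrained(model_id)
image = Image.open(requests.get("https://example.com/sample.jpg", stream=True).raw)  # placeholder URL

# Dual-QPC path: vision encoder and language model exported/compiled as two programs.
# Setting kv_offload=False would select the single-QPC (fused) path instead.
model = QEFFAutoModelForImageTextToText.from_pretrained(model_id, kv_offload=True)
model.compile(num_cores=16)  # compile kwargs assumed

inputs = processor(images=image, text="Describe the image.", return_tensors="pt")
print(model.generate(inputs=inputs, generation_len=32))  # generate signature assumed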

quic-rishinr and others added 15 commits February 10, 2025 21:09
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Co-authored-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
1. Mllama single QPC support added
2. Simplified generate inputs for single and dual QPC

---------

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Added support for Llava model single QPC

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
@quic-amitraj quic-amitraj force-pushed the mllama_single_dual_qpc branch from b9c0bc1 to b3a5d22 on February 10, 2025 21:10
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>

class QEFFTransformersBase(QEFFBaseModel):
Contributor:

Why is this entire part showing up as a diff? Is it possible to clean up the PR?

Contributor Author:

Because we moved the QEFFAutoModelForCausalLM class from the top of the file to the bottom, since we wanted to support InternVL via it.

As we discussed, should I change it to use QEFFAutoModelImageTextTOText?

quic-amitraj and others added 3 commits February 11, 2025 12:05
Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
@quic-amitraj quic-amitraj force-pushed the mllama_single_dual_qpc branch from 24fad68 to 81cea10 on February 13, 2025 09:09
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
@quic-hemagnih (Contributor) left a comment:

Can we please add a description of the newly added model argument?

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
@quic-amitraj quic-amitraj force-pushed the mllama_single_dual_qpc branch from f5efdae to ad594d7 on February 13, 2025 11:49
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
config._attn_implementation = "eager"
config.vision_config.use_flash_attn = "false"
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, config, *args, **kwargs)
Contributor:

Why are we explicitly fetching the config and directly calling from_pretrained() of AutoModelForImageTextToText? We could have called super().from_pretrained(), and it would have handled this internally.

logger.warning("Updating low_cpu_mem_usage=False")

kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
Contributor:

Instead of making these changes in the derived class's from_pretrained() function, shouldn't we make them in the base class and then call it from here? That might be better from a design perspective.

Contributor Author:

We think we can remove the QEFFTransformersBase class, but we will need a bigger discussion to decide this.
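
A minimal sketch of the base-class design the reviewer is hinting at, assuming the shared kwargs handling quoted above can live in QEFFTransformersBase so derived auto classes only override what they must. The class layout, the wrapping constructor, and the warning text are assumptions, not this PR's code:

import logging

from transformers import AutoModelForCausalLM, AutoModelForImageTextToText

logger = logging.getLogger(__name__)


class QEFFTransformersBase:
    _hf_auto_class = AutoModelForCausalLM  # overridden by each derived auto class

    def __init__(self, model):
        self.model = model

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Normalize loading kwargs once, for every derived auto class.
        if kwargs.get("attn_implementation", None) not in {None, "eager"}:
            logger.warning("Updating attn_implementation to 'eager'")
        if kwargs.get("low_cpu_mem_usage", None):
            logger.warning("Updating low_cpu_mem_usage=False")
        kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
        model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
        return cls(model)  # wrapping the HF model into the QEFF class is assumed


class QEFFAutoModelForImageTextToText(QEFFTransformersBase):  # class name assumed
    _hf_auto_class = AutoModelForImageTextToText
    # No override needed here unless model-specific config tweaks are required (see below).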

pretrained_model_name_or_path,
*args,
**kwargs,
):
if kwargs.get("attn_implementation", None) not in {None, "eager"}:
Contributor:

Why are we not calling super().from_pretrained() here? It already has this piece of code.

Contributor Author:

Because we need to change config options in this method that are not needed for the other auto classes.
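
Building on the base-class sketch above, a minimal sketch of the pattern the author describes: keep the vision-specific config tweaks in the derived override and delegate the common loading path to super(). Passing the tweaked config through kwargs is an assumption, not this PR's implementation:

from transformers import AutoConfig, AutoModelForImageTextToText


class QEFFAutoModelForImageTextToText(QEFFTransformersBase):  # class name assumed
    _hf_auto_class = AutoModelForImageTextToText

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Config options only this auto class needs, mirroring the snippet quoted earlier.
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
        config._attn_implementation = "eager"
        config.vision_config.use_flash_attn = "false"
        # Delegate the common kwargs handling (eager attention, low_cpu_mem_usage) to the base class.
        return super().from_pretrained(pretrained_model_name_or_path, *args, config=config, **kwargs)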

Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
@ochougul ochougul merged commit d0ee7bc into main Feb 14, 2025
4 checks passed
quic-hemagnih pushed a commit to quic-hemagnih/efficient-transformers that referenced this pull request Mar 12, 2025
Adding generalized infrastructure to support VLMs with Dual/single QPC
approaches

---------

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Co-authored-by: Rishin Raj <quic_rishinr@quicinc.com>
Co-authored-by: Amit Raj <quic_amitraj@quicinc.com>
Co-authored-by: Amit Raj <168538872+quic-amitraj@users.noreply.github.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
Signed-off-by: Hem Agnihotri <quic_hemagnih@quicinc.com>
@quic-rishinr quic-rishinr deleted the mllama_single_dual_qpc branch March 25, 2025 06:16