Mllama(single + dual) + InternVL(single) + Llava (single) #267
Conversation
1. Mllama single QPC support added
2. Simplified generate inputs for single and dual QPC
Added support for Llava model single QPC
Force-pushed from b9c0bc1 to b3a5d22
class QEFFTransformersBase(QEFFBaseModel):
Why is this entire part showing up as a diff? Is it possible to clean up the PR?
Because we moved the QEFFAutoModelForCausalLM class from the top of the file to the bottom, since we wanted to support InternVL via that. As we discussed, should I change it to use QEFFAutoModelImageTextToText?
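For context on why the move inflates the diff: Python evaluates class bodies at import time, so any class referenced inside another class body must already be defined earlier in the module. A minimal sketch with hypothetical names:

```python
class Base:
    pass

class Referenced(Base):
    pass

class Referrer(Base):
    # This attribute is evaluated at import time, so Referenced must be
    # defined earlier in the file; satisfying that ordering means moving
    # whole class bodies, which git renders as a large delete/insert diff.
    dependency = Referenced
```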
Force-pushed from 24fad68 to 81cea10
Can we please add a description for the newly added model argument?
Force-pushed from 2de6982 to f5efdae
Force-pushed from f5efdae to ad594d7
config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
config._attn_implementation = "eager"
config.vision_config.use_flash_attn = "false"
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, config, *args, **kwargs)
Why are we explicitly fetching the config and directly calling from_pretrained() of AutoModelForImageTextToText? We could have called super().from_pretrained(), and it would have handled this internally.
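A minimal sketch of the two loading paths being contrasted here, using the public transformers API (the checkpoint name is just an illustration):

```python
from transformers import AutoConfig, AutoModelForImageTextToText

name = "llava-hf/llava-1.5-7b-hf"  # example checkpoint

# Path taken in the diff: fetch and patch the config explicitly.
config = AutoConfig.from_pretrained(name, trust_remote_code=True)
config._attn_implementation = "eager"
model = AutoModelForImageTextToText.from_pretrained(name, config=config)

# Path the reviewer suggests: pass kwargs and let from_pretrained patch
# the config itself.
model = AutoModelForImageTextToText.from_pretrained(name, attn_implementation="eager")
```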
logger.warning("Updating low_cpu_mem_usage=False")

kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
model = cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
Instead of making these changes in the derived class's from_pretrained(), shouldn't we make them in the base class and call that from here? That might be better from a design perspective.
We think we can remove QEFFTransformersBase, but we will need a bigger discussion to decide this.
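A sketch of the design alternative under discussion, with hypothetical stand-ins for the repo's classes (the real signatures may differ): the base class owns the shared kwarg normalization, and derived classes delegate to it.

```python
class QEFFTransformersBase:
    _hf_auto_class = None  # e.g. AutoModelForCausalLM in the real code

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Shared normalization lives in one place.
        kwargs.update({"attn_implementation": "eager", "low_cpu_mem_usage": False})
        return cls._hf_auto_class.from_pretrained(pretrained_model_name_or_path, *args, **kwargs)


class QEFFAutoModelForImageTextToText(QEFFTransformersBase):
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        # Only model-family-specific handling would live here.
        return super().from_pretrained(pretrained_model_name_or_path, *args, **kwargs)
```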
    pretrained_model_name_or_path,
    *args,
    **kwargs,
):
    if kwargs.get("attn_implementation", None) not in {None, "eager"}:
Why are we not calling super().from_pretrained() here? It already has this piece of code.
Because we need to change config options in this method that are not needed for the other auto classes.
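A hypothetical sketch of the author's point: the override exists because the VLM loader must patch config fields (such as vision_config.use_flash_attn) that the other auto classes never touch, so the shared path does not fit as-is. The diff only shows the guard condition; the NotImplementedError body is an assumption.

```python
from transformers import AutoConfig

class QEFFAutoModelForImageTextToText(QEFFTransformersBase):  # base as sketched above
    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path, *args, **kwargs):
        if kwargs.get("attn_implementation", None) not in {None, "eager"}:
            raise NotImplementedError("Only eager attention is supported")  # assumed body
        # VLM-only config tweaks, mirroring the diff above.
        config = AutoConfig.from_pretrained(pretrained_model_name_or_path, trust_remote_code=True)
        config._attn_implementation = "eager"
        config.vision_config.use_flash_attn = "false"
        return cls._hf_auto_class.from_pretrained(
            pretrained_model_name_or_path, *args, config=config, **kwargs
        )
```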
Adding generalized infrastructure to support VLMs with dual/single QPC approaches

Signed-off-by: Amit Raj <quic_amitraj@quicinc.com>
Signed-off-by: Rishin Raj <quic_rishinr@quicinc.com>
Signed-off-by: Onkar Chougule <quic_ochougul@quicinc.com>
Signed-off-by: Hem Agnihotri <quic_hemagnih@quicinc.com>
Co-authored-by: Rishin Raj <quic_rishinr@quicinc.com>
Co-authored-by: Amit Raj <quic_amitraj@quicinc.com>
Co-authored-by: Amit Raj <168538872+quic-amitraj@users.noreply.github.com>
Co-authored-by: asmigosw <asmigosw@qti.qualcomm.com>
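For readers arriving at the merged PR, a hypothetical usage sketch of the resulting entry point; treating kv_offload as the switch between the dual- and single-QPC paths is an assumption based on this PR's description, and the checkpoint name is only an example.

```python
from QEfficient import QEFFAutoModelForImageTextToText

# Dual QPC (assumed): vision encoder and language decoder compiled separately.
dual = QEFFAutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", kv_offload=True
)

# Single QPC (assumed): the whole model compiled as one program.
single = QEFFAutoModelForImageTextToText.from_pretrained(
    "llava-hf/llava-1.5-7b-hf", kv_offload=False
)
```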