Support qwenvl model for HPU #793
base: habana_main
@michalkuligowski @jikunshang @PatrykWo could you help review the code?
vllm/model_executor/models/qwen.py
Outdated
inputs_embeds = merge_multimodal_embeddings(
    input_ids, inputs_embeds, multimodal_embeddings,
    self.transformer.visual.image_pad_id)
batch_size, seq_length, hidden_size = inputs_embeds.shape
Please resolve the merge conflicts.
@michalkuligowski The merge conflicts have been resolved. Please review it. Thanks.
834ee00 to 705cc8f
inputs_embeds = merge_multimodal_embeddings(
    input_ids, inputs_embeds, multimodal_embeddings,
    self.transformer.visual.image_pad_id)
batch_size, seq_length, hidden_size = inputs_embeds.shape
This shouldn't be in the model definition. Please try fixing the merge_multimodal_embeddings method instead. You can check whether the platform is HPU and call your implementation in that case.
This PR adds support for Qwen-VL vision inference on HPU.
Issue to solve
The function merge_multimodal_embeddings() in utils.py has a dynamic-shape problem on HPU.
Solution
Flatten the embeddings tensor and use index_put_() to merge the multimodal embeddings in qwen.py, instead of calling merge_multimodal_embeddings() in utils.py.
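A minimal sketch of the flatten-and-index_put_() idea described above, in plain PyTorch. This is not the PR's actual code: the function name merge_embeddings_flat and its parameter names are hypothetical, and the sketch only illustrates the merge step. It makes no claim about how the PR locates placeholder positions or about HPU behaviour.

```python
import torch


def merge_embeddings_flat(
    input_ids: torch.Tensor,          # (batch, seq) token ids
    inputs_embeds: torch.Tensor,      # (batch, seq, hidden) text embeddings
    multimodal_embeds: torch.Tensor,  # (..., hidden) image embeddings
    placeholder_id: int,              # hypothetical name for the image pad id
) -> torch.Tensor:
    """Write multimodal embeddings into the placeholder token positions."""
    batch_size, seq_length, hidden_size = inputs_embeds.shape

    # Flatten to 2-D so one 1-D index addresses every token position.
    # reshape returns a view when possible, so the caller's tensor may be
    # modified in place.
    flat_embeds = inputs_embeds.reshape(-1, hidden_size)
    flat_ids = input_ids.reshape(-1)
    flat_mm = multimodal_embeds.reshape(-1, hidden_size).to(flat_embeds.dtype)

    # Row indices of the placeholder tokens, in row-major order.
    positions = torch.nonzero(flat_ids == placeholder_id, as_tuple=True)[0]
    if positions.numel() != flat_mm.shape[0]:
        raise ValueError(
            f"expected {positions.numel()} multimodal rows, "
            f"got {flat_mm.shape[0]}")

    # In-place scatter: flat_embeds[positions[i]] = flat_mm[i].
    flat_embeds.index_put_((positions,), flat_mm)

    return flat_embeds.view(batch_size, seq_length, hidden_size)
```

Calling merge_embeddings_flat(input_ids, inputs_embeds, multimodal_embeddings, image_pad_id) returns a tensor with the same (batch_size, seq_length, hidden_size) shape as inputs_embeds, with the placeholder rows overwritten.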
Test
Single image
python examples/offline_inference/vision_language.py -m qwen_vl
Multiple images
python examples/offline_inference/vision_language_multi_image.py -m qwen_vl_chat