[Bugfix] Fix Positive Feature Layers in Llava Models #13514
Conversation
Signed-off-by: Alex-Brooks <Alex.brooks@ibm.com>
LGTM, thanks for fixing!
@@ -438,7 +438,7 @@ def _get_layer_index(feature_layer_index: int, num_hidden_layers: int) -> int:
    """
    if feature_layer_index < 0:
        return num_hidden_layers + feature_layer_index + 1
    return feature_layer_index + 1
This is the cause for the extra block being loaded, e.g., given `num_hidden_layers=24`, `feature_layer_index=-1` produces 24, but `feature_layer_index=24` produces 25.
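For illustration, here is a minimal sketch of how that off-by-one plays out. The `_get_layer_index_old` / `_get_layer_index_fixed` names are hypothetical, and the "fixed" return statement is an assumption for illustration only, since the replacement line is not shown in the diff excerpt above; the sketch assumes the convention that index 0 is the input embedding and index `num_hidden_layers` is the last encoder block.

```python
# Sketch of the off-by-one, assuming index 0 is the input embedding and index
# num_hidden_layers is the last encoder block. The "fixed" variant's return
# statement is an assumption; the actual replacement line is not shown above.
def _get_layer_index_old(feature_layer_index: int, num_hidden_layers: int) -> int:
    """Pre-fix behavior: positive indices land one layer too far."""
    if feature_layer_index < 0:
        return num_hidden_layers + feature_layer_index + 1
    return feature_layer_index + 1


def _get_layer_index_fixed(feature_layer_index: int, num_hidden_layers: int) -> int:
    """Assumed post-fix behavior: positive indices are used as-is."""
    if feature_layer_index < 0:
        return num_hidden_layers + feature_layer_index + 1
    return feature_layer_index


# With 24 encoder blocks, -1 and 24 should both refer to the last block.
assert _get_layer_index_old(-1, 24) == 24
assert _get_layer_index_old(24, 24) == 25   # the extra block that gets loaded
assert _get_layer_index_fixed(-1, 24) == 24
assert _get_layer_index_fixed(24, 24) == 24  # now consistent with -1
```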
Awesome, thanks for the quick review Cyrus! 😄
This PR fixes two small issues with positive feature layers in llava models.

1. There is an off-by-one indexing inconsistency compared to `transformers` when multiple feature layers are used (e.g., IBM's granite vision models). E.g., consider a model with 24 visual encoder blocks: `[hs0, hs1, ..., hs23]`. In `transformers`, the input to the visual encoder is also part of the `hidden_states`, i.e., `[input_embed, hs0, hs1, ..., hs23]` - an example in the Clip visual encoder can be found here.

2. One extra layer is loaded - when the vision feature layer is positive, the layer count is one higher than the corresponding negative index. E.g., given a model with 24 visual encoder blocks, feature layer `-1` produces the same output as `23`, but this should be `24`, since `0` should correspond to the input embedding. This also breaks if you request `24`, since it tries to load one more layer than it should.

Note that most llava models use one feature layer that's typically negative, so they generally don't hit these cases (see the sketch after this note).
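To make the indexing convention above concrete, here is a small illustrative sketch, not vLLM or `transformers` code; the helper name `select_feature_layers` is hypothetical. It shows how, once the input embedding sits at index 0 of `hidden_states`, a positive index of `num_hidden_layers` and a negative index of `-1` refer to the same (last) encoder block.

```python
# Illustrative sketch of the hidden_states convention described above; the
# helper name `select_feature_layers` is hypothetical and not part of vLLM or
# transformers.
from typing import Sequence


def select_feature_layers(
    hidden_states: Sequence,          # [input_embed, hs0, ..., hs{N-1}], length N + 1
    feature_layers: Sequence[int],    # positive or negative feature layer indices
    num_hidden_layers: int,
):
    """Resolve feature layer indices so positive and negative values agree.

    Index 0 is the input embedding and index num_hidden_layers is the output
    of the last encoder block, matching the convention where the encoder
    input is included in `hidden_states`.
    """
    resolved = [
        idx if idx >= 0 else num_hidden_layers + idx + 1
        for idx in feature_layers
    ]
    return [hidden_states[i] for i in resolved]


# With 24 encoder blocks there are 25 hidden states; -1 and 24 both pick hs23.
states = ["input_embed"] + [f"hs{i}" for i in range(24)]
assert select_feature_layers(states, [-1], 24) == ["hs23"]
assert select_feature_layers(states, [24], 24) == ["hs23"]
# Multiple feature layers, as in the granite vision example above:
assert select_feature_layers(states, [3, 7, 15, 24], 24) == ["hs2", "hs6", "hs14", "hs23"]
```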