forked from vllm-project/vllm
enable LoRA for embedding models #821
Open
skaulintel wants to merge 39 commits into habana_main from dev/skaul_enable_lora_embed
Commits (39):
56b42c3 Initial draft to enable embedding task. (libinta)
b62d611 remove ENCODER_ONLY (libinta)
a647baa Added support for embedding model with self attention without causal … (libinta)
46e1aad Change set_attn_bias padding element from -math.inf to -3e38 as -math… (libinta)
2f74e6b rewrite is_causal and add dbg msg (libinta)
99947c8 update maskoff value (libinta)
094294c fix wrong base mask (libinta)
1c7416f cleanup code (libinta)
c6cdae1 cleanup code (libinta)
8ac281b cleanup code (libinta)
e72c2f0 Add pooler support for padded batch inputs for hpu with CLSPoll, Last… (libinta)
7c1c74b add meanpool for padded input (libinta)
5c49ca1 revert bert change (libinta)
ae6fbe0 modify meanpool for padded input (libinta)
d65340a write is_pooler function (libinta)
0c28519 fix is_causal logic (libinta)
1fe398f Set is_causal based on attn_type (libinta)
c3a92f3 Set is_causal based on attn_type (libinta)
afe8bb3 enable lora embedding models on hpu (skaulintel)
55ae676 fix with warmup issue (libinta)
787700b fix cpu test issue and format (libinta)
6f02b86 fix code format (libinta)
b97f7c6 Merge branch 'habana_main' into dev/enable_embedding_ace (libinta)
593ded0 fix hpu attn coding issue (libinta)
30f43b5 fix hpu_pooling_model_runner.py code format and add requirement-hpu w… (libinta)
7dc5239 Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed (skaulintel)
c636da7 move create lora mask (skaulintel)
1185c2e add support for batch padding (libinta)
82a6e70 Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed (skaulintel)
53f94e0 Merge branch 'habana_main' into dev/enable_embedding_ace (kzawora-intel)
05ecf57 Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed (skaulintel)
8d8f1b2 Merge branch 'habana_main' into dev/skaul_enable_lora_embed (skaulintel)
3dd63db Update requirements-hpu.txt (skaulintel)
43ae76f Merge branch 'habana_main' into dev/skaul_enable_lora_embed (skaulintel)
7bba2f3 restore requirements-hpu (skaulintel)
55695e3 remove intermediate tensor (skaulintel)
ed9b4b2 Update hpu_pooling_model_runner.py (skaulintel)
e673fbe add back intermediate tensor (skaulintel)
665be55 Merge branch 'habana_main' into dev/skaul_enable_lora_embed (skaulintel)
There are still yapf errors in the pre-commit checks; please fix them.