enable LoRA for embedding models #821

skaulintel · 2025-02-12T21:54:20Z

Enable LoRA for text embedding models. Depends on 758.

Tested with:

intfloat/e5-mistral-7b-instruct
meta-llama/Llama-2-7b-hf


afierka-intel · 2025-02-20T14:22:21Z

@skaulintel please fix the pre-commit issue: https://github.com/HabanaAI/vllm-fork/actions/runs/13419658543/job/37488981187?pr=821. Then rebase on habana_main to fix the two failures in the Jenkins test.

Can you also explain (or link to) the changes in requirements-hpu? Why did you change the hash of vllm-hpu-extension? Is it necessary, or is it a development artifact?

Thank you!

michalkuligowski · 2025-02-27T09:22:52Z

vllm/worker/hpu_model_runner.py

@@ -1588,6 +1588,76 @@ def prepare_input_tensors(
                                     lora_ids=lora_ids), \
                                        sampling_metadata

+    def create_lora_mask(self, input_tokens: torch.Tensor, lora_ids: List[int],


There are still yapf errors in pre-commit; please fix them.

libinta and others added 30 commits January 23, 2025 23:50

Initial draft to enable embedding task.

56b42c3

remove ENCODER_ONLY

b62d611

Added support for embedding model with self attention without causal …

a647baa

…such as bert,roberta, bart.

Change set_attn_bias padding element from -math.inf to -3e38 as -math…

46e1aad

….inf caused nan for softmax calculation.
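
The commit above replaces `-math.inf` with `-3e38` as the padding value in the attention bias because `-math.inf` produced NaN in the softmax. The following is a minimal pure-Python sketch of why that can happen, not the code from this PR: the `softmax` helper is hypothetical, and it assumes a numerically stable softmax that subtracts the row maximum. When every position in a row is masked with `-inf`, the subtraction becomes `-inf - (-inf)`, which is NaN; a large finite value such as `-3e38` (still inside the float32 range) avoids that.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats (illustrative helper)."""
    row_max = max(scores)
    # If every score is -inf, score - row_max is (-inf) - (-inf) = nan.
    exps = [math.exp(score - row_max) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


# A fully masked row (for example, a padded sequence) with -inf as the bias.
fully_masked_inf = softmax([-math.inf, -math.inf])

# The same row with a large finite negative value instead.
fully_masked_finite = softmax([-3e38, -3e38])

# A row with one real token and one padded position.
partially_masked = softmax([0.0, -3e38])

print(fully_masked_inf)
print(fully_masked_finite)
print(partially_masked)
```

With the finite value, a fully masked row degrades to a uniform distribution instead of NaN, and a partially masked row still assigns zero weight to the padded position.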

rewrite is_causal and add dbg msg

2f74e6b

update maskoff value

99947c8

fix wrong base mask

094294c

cleanup code

1c7416f

cleanup code

c6cdae1

cleanup code

8ac281b

Add pooler support for padded batch inputs for hpu with CLSPoll, Last…

e72c2f0

…pool

add meanpool for padded input

7c1c74b

revert bert change

5c49ca1

modify meanpool for padded input

ae6fbe0
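
The pooling commits above concern mean pooling over padded inputs. As a rough illustration of the issue, and not the implementation in this PR, the sketch below assumes right-padded sequences stored as plain Python lists; `masked_mean_pool`, `naive_mean_pool`, and `num_valid` are hypothetical names. Averaging over the padded length dilutes the embedding, so the mean has to be taken over the valid tokens only.

```python
from typing import List


def naive_mean_pool(hidden: List[List[float]]) -> List[float]:
    """Average over every position, including padding (shown for contrast)."""
    dim = len(hidden[0])
    return [sum(row[d] for row in hidden) / len(hidden) for d in range(dim)]


def masked_mean_pool(hidden: List[List[float]], num_valid: int) -> List[float]:
    """Average only the first num_valid token vectors, skipping trailing padding."""
    if not 0 < num_valid <= len(hidden):
        raise ValueError("num_valid must be between 1 and the padded length")
    dim = len(hidden[0])
    return [
        sum(hidden[t][d] for t in range(num_valid)) / num_valid
        for d in range(dim)
    ]


# Two real tokens followed by two padding positions (assumed to be zeros here).
hidden_states = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0], [0.0, 0.0]]

print(naive_mean_pool(hidden_states))      # diluted by the padding rows
print(masked_mean_pool(hidden_states, 2))  # mean of the two real tokens
```

In a batched tensor implementation the same idea is usually expressed with a per-sequence mask and a division by the per-sequence valid length.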

write is_pooler function

d65340a

fix is_causal logic

0c28519

Set is_causal based on attn_type

1fe398f

Set is_causal based on attn_type

c3a92f3

enable lora embedding models on hpu

afe8bb3

fix with warmup issue

55ae676

fix cpu test issue and format

787700b

fix code format

6f02b86

Merge branch 'habana_main' into dev/enable_embedding_ace

b97f7c6

fix hpu attn coding issue

593ded0

fix hpu_pooling_model_runner.py code format and add requirement-hpu w…

30f43b5

…ith vllm-hpu-extension change

Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed

7dc5239

move create lora mask

c636da7

add support for batch padding

1185c2e

Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed

82a6e70

Merge branch 'habana_main' into dev/enable_embedding_ace

53f94e0

Merge branch 'dev/enable_embedding_ace' into dev/skaul_enable_lora_embed

05ecf57

skaulintel requested review from kzawora-intel, madamczykhabana, michalkuligowski, mgawarkiewicz, vivekgoe and afierka-intel as code owners February 12, 2025 21:54

Merge branch 'habana_main' into dev/skaul_enable_lora_embed

8d8f1b2

skaulintel added 7 commits February 20, 2025 14:41

Update requirements-hpu.txt

3dd63db

Merge branch 'habana_main' into dev/skaul_enable_lora_embed

43ae76f

restore requirements-hpu

7bba2f3

remove intermediate tensor

55695e3

Update hpu_pooling_model_runner.py

ed9b4b2

add back intermediate tensor

e673fbe

Merge branch 'habana_main' into dev/skaul_enable_lora_embed

665be55

michalkuligowski requested changes Feb 27, 2025
