server : add lora hotswap endpoint #8857
Conversation
self note: maybe wait for changes from #8823 and add the list of loaded LoRA adapters to struct
I don't get what you mean. The option means "load the adapter to memory, but do not apply it right away", probably something like
@ggerganov I added tests and docs to this PR, plus adapted to the changes from #8823. Could you re-review this? Thank you.
Commits:
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
TODO:

New argument: `--lora-init-without-apply`

If `--lora-init-without-apply` is specified, the LoRA adapter will be loaded but not applied during `llama_init_from_gpt_params`. The user can apply it later with the `POST /lora-adapters` endpoint described below.
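A minimal sketch of the intended workflow, assuming the server's default port and the existing `--lora` flag; the binary name, file names, and adapter id are placeholders:

```sh
# Start the server with one adapter loaded but not yet applied
# (model/adapter paths are hypothetical).
./llama-server -m base-model.gguf \
    --lora my_adapter_1.gguf \
    --lora-init-without-apply

# Later, hot-swap the adapter in at scale 0.5 via the new endpoint:
curl -X POST http://localhost:8080/lora-adapters \
    -H "Content-Type: application/json" \
    -d '[{"id": 0, "scale": 0.5}]'
```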
New endpoints:

GET /lora-adapters
Get the list of all adapters. If an adapter is disabled, its scale will be set to 0.
Response:
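An illustrative response body, assuming each adapter is reported with an `id`, its file `path`, and its current `scale` (field names and values here are placeholders, not confirmed by the excerpt):

```json
[
    {"id": 0, "path": "my_adapter_1.gguf", "scale": 0.0},
    {"id": 1, "path": "my_adapter_2.gguf", "scale": 1.0}
]
```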
POST /lora-adapters
Set the list of adapters. To disable an adapter, either omit it from the request list or set its scale to 0.
Request:
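An illustrative request body, assuming adapters are addressed by the `id` values returned by `GET /lora-adapters`; here adapter 0 is applied at scale 0.5, while adapter 1, being omitted, is disabled:

```json
[
    {"id": 0, "scale": 0.5}
]
```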
Response:
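The response body is not shown in this excerpt; one plausible convention, sketched here, is to echo the updated adapter list so the client can confirm the new scales (shape and values are assumptions):

```json
[
    {"id": 0, "path": "my_adapter_1.gguf", "scale": 0.5},
    {"id": 1, "path": "my_adapter_2.gguf", "scale": 0.0}
]
```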