server : add lora hotswap endpoint #8857

Merged
merged 9 commits into from
Aug 6, 2024

ngxson
Collaborator

@ngxson ngxson commented Aug 4, 2024

TODO:

  • Update docs
  • Add tests

New argument: --lora-init-without-apply

If --lora-init-without-apply is specified, LoRA adapters are loaded into memory but not applied during llama_init_from_gpt_params.

The user can apply them later with the POST /lora-adapters endpoint below.
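
As a rough illustration of how the flag might be combined with adapter arguments, here is a small Python sketch that assembles a launch command. The binary name `llama-server`, the `-m` and `--lora` flags, and the helper `server_command` are assumptions for this example; only `--lora-init-without-apply` comes from this PR.

```python
import shlex


def server_command(model, adapters, apply_at_startup=False):
    """Return an argv list for the server (hypothetical helper)."""
    argv = ["llama-server", "-m", model]  # assumed binary name and model flag
    for path in adapters:
        argv += ["--lora", path]  # assumed: one --lora flag per adapter
    if not apply_at_startup:
        # Load the adapters, but leave them unapplied until POST /lora-adapters.
        argv.append("--lora-init-without-apply")
    return argv


print(shlex.join(server_command("model.gguf", ["my_adapter_1.gguf", "my_adapter_2.gguf"])))
```

The resulting list can be passed to `subprocess.Popen` as-is; the file names are placeholders.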

New endpoints

GET /lora-adapters

Get the list of all loaded adapters. If an adapter is disabled, its scale is reported as 0.

Response:

[
    {
        "id": 0,
        "path": "my_adapter_1.gguf",
        "scale": 0.0
    },
    {
        "id": 1,
        "path": "my_adapter_2.gguf",
        "scale": 0.0
    }
]
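
For illustration, a minimal Python sketch of a client reading this endpoint. The base URL `http://localhost:8080` and the helper names `fetch_adapters` and `enabled_adapters` are assumptions for this example, not part of the PR.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed server address; adjust as needed


def fetch_adapters(base_url=BASE_URL):
    """GET /lora-adapters and return the decoded JSON list."""
    with urllib.request.urlopen(f"{base_url}/lora-adapters") as resp:
        return json.loads(resp.read().decode("utf-8"))


def enabled_adapters(adapters):
    """Keep only adapters with a non-zero scale (scale 0 means disabled)."""
    return [a for a in adapters if a["scale"] != 0]
```

With a server started using `--lora-init-without-apply`, `enabled_adapters(fetch_adapters())` would be expected to return an empty list until scales are set through the POST endpoint.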

POST /lora-adapters

Set the list of active adapters. To disable an adapter, either omit it from the list below or set its scale to 0.

Request:

[
  {"id": 0, "scale": 0.2},
  {"id": 1, "scale": 0.8}
]

Response:

{ "success": true }
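
A possible client-side sketch of the request above, using only the Python standard library. The base URL and the helper names `build_lora_request` and `set_adapters` are assumptions made for this example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # assumed server address; adjust as needed


def build_lora_request(scales, base_url=BASE_URL):
    """Build a POST /lora-adapters request from an {adapter_id: scale} mapping."""
    payload = [{"id": i, "scale": s} for i, s in sorted(scales.items())]
    return urllib.request.Request(
        f"{base_url}/lora-adapters",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def set_adapters(scales, base_url=BASE_URL):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(build_lora_request(scales, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `set_adapters({0: 0.2, 1: 0.8})` sends the request body shown above. Since omitted adapters are disabled, `set_adapters({})` should turn every adapter off.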

@ngxson
Collaborator Author

ngxson commented Aug 4, 2024

Note to self: maybe wait for the changes from #8823 and add the list of loaded LoRA adapters to the struct.

@Green-Sky
Collaborator

--lora-no-apply sounds kind of contrived, maybe --lora-available or similar is better.

@ngxson
Collaborator Author

ngxson commented Aug 4, 2024

I don't get what you mean. The option means "load the adapter to memory, but do not apply it right away"

Probably something like --lora-apply-later or --lora-init-without-apply would be more stupidly simple to understand?

@mofosyne added the "Review Complexity : Low" label (trivial changes to code that most beginner devs, or those who want a break, can tackle, e.g. a UI fix) Aug 5, 2024
@github-actions bot added the "python" label (python script changes) Aug 6, 2024
@ngxson ngxson marked this pull request as ready for review August 6, 2024 11:45
@ngxson ngxson requested a review from ggerganov August 6, 2024 11:45
@ngxson
Collaborator Author

ngxson commented Aug 6, 2024

@ggerganov I added tests and docs to this PR, and adapted it to the changes from #8823.

Could you re-review this? Thank you.

@ngxson ngxson merged commit 1e6f655 into ggml-org:master Aug 6, 2024
54 checks passed
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Aug 7, 2024
* server : add lora hotswap endpoint

* handle lora_no_apply

* fix build

* update docs

* clean up struct def

* fix build

* add LoRA test

* fix style
@ltoniazzi ltoniazzi mentioned this pull request Aug 17, 2024
7 tasks
@ngxson ngxson changed the title from "server : add lora hotswap endpoint (WIP)" to "server : add lora hotswap endpoint" Aug 18, 2024
Labels: examples, python, Review Complexity : Low, server
4 participants