server : add lora hotswap endpoint #8857
Conversation
self note: maybe wait for changes from #8823 and add the list of loaded LoRA adapters to struct
I don't get what you mean. The option means "load the adapter to memory, but do not apply it right away", probably something like
@ggerganov I added tests and docs to this PR, plus adapted to the changes from #8823. Could you re-review this? Thank you.
Commits:
* server : add lora hotswap endpoint
* handle lora_no_apply
* fix build
* update docs
* clean up struct def
* fix build
* add LoRA test
* fix style
TODO:

New argument: `--lora-init-without-apply`

If `--lora-init-without-apply` is specified, the LoRA adapter will be loaded but not applied during `llama_init_from_gpt_params`. The user can apply it later with the `POST /lora-adapters` endpoint described below.
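A minimal sketch of the intended workflow, assuming the server's default port and the existing `--lora` flag; the binary name, file names, and adapter id are placeholders:

```sh
# Start the server with one adapter loaded but not yet applied
# (model/adapter paths are hypothetical).
./llama-server -m base-model.gguf \
    --lora my_adapter_1.gguf \
    --lora-init-without-apply

# Later, hot-swap the adapter in at scale 0.5 via the new endpoint:
curl -X POST http://localhost:8080/lora-adapters \
    -H "Content-Type: application/json" \
    -d '[{"id": 0, "scale": 0.5}]'
```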
New endpoints:

GET /lora-adapters
Get the list of all adapters. If an adapter is disabled, its scale will be set to 0.
Response:
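An illustrative response body, assuming each adapter is reported with an `id`, its file `path`, and its current `scale` (field names and values here are placeholders, not confirmed by the excerpt):

```json
[
    {"id": 0, "path": "my_adapter_1.gguf", "scale": 0.0},
    {"id": 1, "path": "my_adapter_2.gguf", "scale": 1.0}
]
```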
POST /lora-adapters
Set the list of adapters. To disable an adapter, either omit it from the request list or set its scale to 0.
Request:
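An illustrative request body, assuming adapters are addressed by the `id` values returned by `GET /lora-adapters`; here adapter 0 is applied at scale 0.5, while adapter 1, being omitted, is disabled:

```json
[
    {"id": 0, "scale": 0.5}
]
```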
Response:
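The response body is not shown in this excerpt; one plausible convention, sketched here, is to echo the updated adapter list so the client can confirm the new scales (shape and values are assumptions):

```json
[
    {"id": 0, "path": "my_adapter_1.gguf", "scale": 0.5},
    {"id": 1, "path": "my_adapter_2.gguf", "scale": 0.0}
]
```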