
changelog : llama-server REST API #9291

Open
ggerganov opened this issue Sep 3, 2024 · 14 comments
Labels
documentation Improvements or additions to documentation roadmap Part of a roadmap project

Comments

@ggerganov
Member

ggerganov commented Sep 3, 2024

Overview

This is a list of changes to the public HTTP interface of the llama-server example. Collaborators are encouraged to edit this post in order to reflect important changes to the API that end up merged into the master branch.

If you are building a 3rd party project that relies on llama-server, it is recommended to follow this issue and check it carefully before upgrading to new versions.

See also:

Recent API changes (most recent at the top)

| version | PR | description |
| --- | --- | --- |
| b4599 | #9639 | /v1/chat/completions now supports tools & tool_choice |
| TBD | #10974 | /v1/completions is now OAI-compat |
| TBD | #10783 | logprobs is now OAI-compat; defaults to pre-sampling probs |
| TBD | #10861 | /embeddings supports pooling type none |
| TBD | #10853 | Add optional "tokens" output to /completions endpoint |
| b4337 | #10803 | Remove penalize_nl |
| b4265 | #10626 | CPU Docker images' working directory changed to /app |
| b4285 | #10691 | (Again) Change /slots and /props responses |
| b4283 | #10704 | Change /slots and /props responses |
| b4027 | #10162 | /slots endpoint: remove slot[i].state, add slot[i].is_processing |
| b3912 | #9865 | Add option to time-limit the generation phase |
| b3911 | #9860 | Remove self-extend support |
| b3910 | #9857 | Remove legacy system prompt support |
| b3897 | #9776 | Change default security settings; /slots is now disabled by default, and endpoints check the API key if one is set |
| b3887 | #9510 | Add /rerank endpoint |
| b3754 | #9459 | Add [DONE]\n\n to the OAI stream response to match the spec |
| b3721 | #9398 | Add seed_cur to the completion response |
| b3683 | #9308 | Environment variables updated |
| b3599 | #9056 | Change /health and /slots |

For older changes, use:

git log --oneline -p b3599 -- examples/server/README.md

Upcoming API changes

  • TBD
@ggerganov ggerganov added the documentation Improvements or additions to documentation label Sep 3, 2024
@ggerganov ggerganov pinned this issue Sep 3, 2024
@ngxson
Collaborator

ngxson commented Sep 7, 2024

Not a REST API breaking change, but server-related: some environment variables were changed in #9308

@slaren
Member

slaren commented Sep 13, 2024

After #9398, in the completion response seed contains the seed requested by the user, while seed_cur contains the seed used to generate the completion. The values can be different if seed is LLAMA_DEFAULT_SEED (or -1), in which case a random seed is generated and returned in seed_cur.
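The seed semantics described above can be sketched as a small helper. This is a hypothetical Python mirror of the behavior (the server itself implements this in C++; the function name is made up for illustration):

```python
import random

LLAMA_DEFAULT_SEED = -1  # sentinel exposed as -1 in the JSON API

def resolve_seed(requested: int) -> tuple[int, int]:
    """Return (seed, seed_cur) as they appear in the completion response:
    `seed` echoes the request, `seed_cur` is the seed actually used."""
    if requested == LLAMA_DEFAULT_SEED:
        # a random seed is generated and reported back in seed_cur
        return requested, random.getrandbits(32)
    return requested, requested
```

So the two fields only differ when the client asked for a random seed.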

@ngxson
Collaborator

ngxson commented Oct 8, 2024

Breaking change #9776 : better security control for public deployments

  • The /slots endpoint is now disabled by default; start the server with --slots to enable it.
  • If an API key is set, all endpoints (including /slots and /props) require a correct API key for access.
    Note: only /health and /models remain publicly accessible.
  • The "system_prompt" setting is removed from the /completions endpoint; it has moved to POST /props (see the documentation).

Please note that GET /props is always enabled to avoid breaking the web UI.
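As a sketch, a client hitting a protected endpoint on a key-protected server attaches the key as a Bearer token. The address and key value below are assumptions; only the request object is built, nothing is sent:

```python
from urllib.request import Request

BASE = "http://localhost:8080"  # assumed local llama-server
API_KEY = "my-secret-key"       # whatever key the server was started with

# Once an API key is configured, protected endpoints such as /slots
# and /props expect it in the Authorization header.
req = Request(f"{BASE}/slots",
              headers={"Authorization": f"Bearer {API_KEY}"})
```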

@ngxson
Collaborator

ngxson commented Nov 4, 2024

Breaking change for /slots endpoint #10162

slot[i].state is removed and replaced by slot[i].is_processing

slot[i].is_processing === false means the slot is idle
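A minimal sketch of consuming the new field, given a parsed /slots response (the list-of-dicts shape is assumed from the description above):

```python
def idle_slot_ids(slots: list[dict]) -> list[int]:
    """Return the ids of idle slots. `is_processing` replaced the
    old `state` field in the /slots response (#10162)."""
    return [s["id"] for s in slots if not s["is_processing"]]
```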

@isaac-mcfadyen
Contributor

> Breaking change for /slots endpoint #10162
>
> slot[i].state is removed and replaced by slot[i].is_processing
>
> slot[i].is_processing === false means the slot is idle

Was the /slots endpoint also disabled by default? (Or was that just a documentation change?)
https://github.com/ggerganov/llama.cpp/pull/10162/files#diff-42ce5869652f266b01a5b5bc95f4d945db304ce54545e2d0c017886a7f1cee1aR698

@ngxson
Collaborator

ngxson commented Nov 5, 2024

For security reasons, /slots has been disabled by default since #9776 and was mentioned in the breaking-changes table; I just forgot to update the docs.

@ngxson
Collaborator

ngxson commented Nov 7, 2024

Not an API change, but maybe good to know that the default web UI for llama-server changed in #10175

If you want to use the old completion UI, please follow the instructions in the PR.

@ggerganov
Member Author

cache_prompt: true is now used by default (#10501)
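To restore the previous behavior, a request can opt out explicitly. A sketch of the request body (field names from the PR; the prompt text is arbitrary):

```python
import json

# cache_prompt defaults to true since #10501; pass false to opt out.
payload = {"prompt": "Hello", "cache_prompt": False}
body = json.dumps(payload)
```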

@ngxson
Collaborator

ngxson commented Dec 7, 2024

The /props and /slots endpoints have changed in #10691 and #10704; see server/README.md for details.

@ngxson
Collaborator

ngxson commented Dec 18, 2024

/embeddings will NOT be OAI-compat after #10861

For clarification, we will maintain OAI compatibility for all APIs under the /v1 prefix, including:

  • /v1/embeddings
  • /v1/chat/completions

NOTE: OAI support for /v1/completions will come in the near future
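For example, a raw request to the OAI-compatible embeddings path would be built like this (the local URL is an assumption, and only the request object is constructed here, nothing is sent):

```python
import json
from urllib.request import Request

payload = {"input": "I believe the meaning of life is"}
req = Request(
    "http://localhost:8080/v1/embeddings",  # note the /v1 prefix
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
```

The bare /embeddings path (without /v1) keeps the server's native, non-OAI response shape.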

@ngxson
Collaborator

ngxson commented Dec 19, 2024

The behavior of n_probs has changed in #10783; we now provide an OAI-compatible logprobs option.
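Consuming the OAI-style logprobs block might look like this sketch (the response shape is assumed from the OpenAI API, which #10783 mirrors):

```python
def top_token_logprobs(choice: dict) -> list[tuple[str, float]]:
    """Extract (token, logprob) pairs from an OAI-style choice,
    i.e. response["choices"][i] with logprobs requested."""
    return [
        (entry["token"], entry["logprob"])
        for entry in choice["logprobs"]["content"]
    ]
```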

@ngxson
Collaborator

ngxson commented Dec 31, 2024

Added OAI-compat support for /v1/completions here: #10974

If you want to use it with a downstream library, be sure to add the /v1 prefix. For example, using the Python library:

```python
from openai import OpenAI

client = OpenAI(api_key="dummy", base_url="http://localhost:8080/v1")
res = client.completions.create(
    model="davinci-002",
    prompt="I believe the meaning of life is",
    max_tokens=8,
)
```

If you want to use the old non-OAI style, remove /v1 from the endpoint path.


This issue was closed because it has been inactive for 14 days since being marked as stale.

@isaac-mcfadyen
Contributor

Guessing we want to keep this open.

@ggerganov ggerganov reopened this Mar 17, 2025
@ggerganov ggerganov added roadmap Part of a roadmap project and removed stale labels Mar 17, 2025
5 participants
@ggerganov @slaren @isaac-mcfadyen @ngxson and others