@atbe commented on Jun 24, 2025

This PR adds log probability (logprobs) support to Ollama's OpenAI-compatible chat completion endpoints.

Key changes include:

  • Added the request parameters logprobs (boolean) and top_logprobs (integer) to ChatCompletionRequest in openai/openai.go.
  • Introduced new LogProb and TopLogProb structs in api/types.go and openai/openai.go to define the log probability response schema.
  • Updated CompletionRequest in llm/server.go to include LogProbs and TopLogProbs; the completion struct now includes CompletionProbabilities for parsing llama.cpp responses, and the Completion function passes n_probs to the underlying llama.cpp server.
  • Modified ChatMiddleware in openai/openai.go to extract the log probability settings from incoming requests and store them in the Gin context.
  • Updated ChatHandler in server/routes.go to retrieve these settings from the context and pass them to the llm.CompletionRequest.
  • Added conversion logic in openai/openai.go (toChatCompletion, toChunk) and server/routes.go to transform llama.cpp's completion_probabilities into the OpenAI-compatible logprobs format, including token, logprob, bytes, and top_logprobs.
  • Both streaming and non-streaming responses now include log probabilities.
  • Added test cases to openai/openai_test.go and a LOGPROBS_IMPLEMENTATION.md file documenting the feature.
