Hi, we'd like to have log probabilities for tokens returned from the model, in addition to the token ids. Can you help with this feature request?
There is a flag to output log probs of generated tokens: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L280. Log probs of input tokens are not supported yet; they will be supported in the near future.
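For reference, here is a minimal sketch of how that flag might be used from the Python runtime. This assumes the flag is exposed as `SamplingConfig.output_log_probs` (the name suggested by the linked file) and that `session` is an already-built `GenerationSession`; exact names and signatures may differ between versions, so treat this as illustrative rather than official:

```python
# Hedged sketch, not an official example: assumes the flag from the linked
# generation.py is exposed as SamplingConfig.output_log_probs and that
# `session` is an already-initialized tensorrt_llm.runtime.GenerationSession.
import torch
from tensorrt_llm.runtime import SamplingConfig

sampling_config = SamplingConfig(
    end_id=2,  # model-specific end-of-sequence token id (assumed value)
    pad_id=2,  # model-specific padding token id (assumed value)
)
sampling_config.output_log_probs = True  # request per-token log probs

input_ids = torch.tensor([[1, 306, 4966]], dtype=torch.int32, device="cuda")
input_lengths = torch.tensor([3], dtype=torch.int32, device="cuda")

output_ids = session.decode(input_ids, input_lengths, sampling_config)
# With output_log_probs=True, the log probs of the generated tokens are
# expected to be collected by the session (e.g. in a session.log_probs
# buffer). Per the comment above, log probs of the *input* tokens are not
# available yet.
```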
@byshiue thanks. Would it be possible to include this in the C++ implementation too? https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/batch_manager/callbacks.h#L32 Currently, in-flight batching is only supported through the C++ runtime.