Hi, we'd like to have log probabilities for tokens returned from the model, in addition to the token ids. Can you help with this feature request?
There is a flag to output log probs of generated tokens: https://github.com/NVIDIA/TensorRT-LLM/blob/main/tensorrt_llm/runtime/generation.py#L280. Log probs of input tokens are not supported yet; they will be supported in the near future.
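For reference, here is a minimal sketch of how that flag might be used from the Python runtime. This assumes the flag is exposed as `SamplingConfig.output_log_probs` (the name suggested by the linked file) and that `session` is an already-built `GenerationSession`; exact names and signatures may differ between versions, so treat this as illustrative rather than official:

```python
# Hedged sketch, not an official example: assumes the flag from the linked
# generation.py is exposed as SamplingConfig.output_log_probs and that
# `session` is an already-initialized tensorrt_llm.runtime.GenerationSession.
import torch
from tensorrt_llm.runtime import SamplingConfig

sampling_config = SamplingConfig(
    end_id=2,  # model-specific end-of-sequence token id (assumed value)
    pad_id=2,  # model-specific padding token id (assumed value)
)
sampling_config.output_log_probs = True  # request per-token log probs

input_ids = torch.tensor([[1, 306, 4966]], dtype=torch.int32, device="cuda")
input_lengths = torch.tensor([3], dtype=torch.int32, device="cuda")

output_ids = session.decode(input_ids, input_lengths, sampling_config)
# With output_log_probs=True, the log probs of the generated tokens are
# expected to be collected by the session (e.g. in a session.log_probs
# buffer). Per the comment above, log probs of the *input* tokens are not
# available yet.
```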
@byshiue thanks. Would it be possible to include this in the C++ implementation too? https://github.com/NVIDIA/TensorRT-LLM/blob/main/cpp/include/tensorrt_llm/batch_manager/callbacks.h#L32 Currently, in-flight batching is only supported through the C++ runtime.