Update TensorRT-LLM #1122

Merged
merged 2 commits into main on Feb 21, 2024

Conversation

@kaiyux (Member) commented on Feb 21, 2024

  • Features
    • Enable different rewind tokens per sequence for Medusa
    • Out-of-the-box (OOTB) functionality support
      • T5
      • Mixtral 8x7B
    • Experimental: Weightless engine support (see examples/weightless_engine/README.md)
  • API
    • Add a high-level C++ API for in-flight batching (a minimal usage sketch follows this list)
    • Migrate Mixtral to the high-level API and unified builder workflow
  • Bug fixes
  • Benchmark/Performance
    • Optimize gptDecoderBatch to support batched sampling
    • Enable FMHA for models in the BART, Whisper, and NMT families
    • Add emulated static batching in gptManagerBenchmark
  • Documentation
    • Blog: Speed up inference with SOTA quantization techniques in TRT-LLM (see docs/source/blogs/quantization-in-TRT-LLM.md)
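
Regarding the high-level C++ API item above: the pattern it enables is enqueuing independent requests and letting the runtime batch them in flight. Below is a minimal, illustrative sketch assuming an executor-style interface (`Executor`, `Request`, `enqueueRequest`, `awaitResponses`); the header path, class names, and constructor signatures are assumptions and may not match what this PR ships, so consult the repository's examples for the actual API.

```cpp
// Illustrative sketch only: the header, class names, and signatures below are
// assumptions modeled on an executor-style TensorRT-LLM interface and may not
// match the API introduced by this PR.
#include <iostream>
#include <vector>

#include "tensorrt_llm/executor/executor.h"

namespace tle = tensorrt_llm::executor;

int main()
{
    // Default configuration; the runtime schedules requests with in-flight batching.
    tle::ExecutorConfig config;

    // Load a prebuilt engine (hypothetical path).
    tle::Executor executor("/path/to/engine_dir", tle::ModelType::kDECODER_ONLY, config);

    // Enqueue two independent requests; the runtime batches them on the fly.
    std::vector<tle::IdType> requestIds;
    requestIds.push_back(executor.enqueueRequest(tle::Request({1, 2, 3, 4}, /*maxTokens=*/16)));
    requestIds.push_back(executor.enqueueRequest(tle::Request({5, 6, 7}, /*maxTokens=*/16)));

    // Wait for each request and report how many tokens the first beam produced.
    for (auto const id : requestIds)
    {
        for (auto const& response : executor.awaitResponses(id))
        {
            if (!response.hasError())
            {
                auto const& tokens = response.getResult().outputTokenIds.at(0);
                std::cout << "request " << id << ": " << tokens.size() << " output tokens\n";
            }
            else
            {
                std::cerr << "request " << id << " failed: " << response.getErrorMsg() << "\n";
            }
        }
    }
    return 0;
}
```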

@kaiyux kaiyux merged commit eb8f26c into main Feb 21, 2024
@kaiyux kaiyux deleted the kaiyu/update branch February 21, 2024 13:31