Skip to content

v0.7.1

Compare
Choose a tag to compare
@github-actions github-actions released this 01 Feb 18:02
· 710 commits to main since this release
4f4d427

Highlights

This release features MLA optimization for Deepseek family of models. Compared to v0.7.0 released this Monday, we offer ~3x the generation throughput, ~10x the memory capacity for tokens, and horizontal context scalability with pipeline parallelism

V1

For the V1 architecture, we

Models

  • New Model: MiniCPM-o (text outputs only) (#12069)

Hardwares

  • Neuron: NKI-based flash-attention kernel with paged KV cache (#11277)
  • AMD: llama 3.2 support upstreaming (#12421)

Others

  • Support override generation config in engine arguments (#12409)
  • Support reasoning content in API for deepseek R1 (#12473)

What's Changed

New Contributors

Full Changelog: v0.7.0...v0.7.1