v0.0.9

github-actions released this 12 Jul 05:54

0.0.9 (2024-07-12)

Bug Fixes

Fix the decode kernel segfault in CUDA graph mode (#368) (c69cfa)
Fix the output of decode kernels for an empty KV cache (#363) (ac72b1)
Check the GPU id in PyTorch APIs and use the input tensor's GPU default stream (#361) (1b84fa)

Performance Improvements

Accelerate ALiBi (#365) (4f0a9f9)
Improve GQA performance (#356) (e56ddad)
Optimize tensor conversions in C++ code to avoid unnecessary copies (#366) (1116237)

Acknowledgement

We thank @Yard1, @Ying1123 and @zhyncs for their contributions.
