v0.20.0

barronalex released this 05 Nov 21:23


726dbd9

Highlights

Even faster GEMMs
- Peaking at 23.89 TFlops on M2 Ultra benchmarks
BFS graph optimizations
- Over 120 tokens/s with Mistral 7B!
Fast batched QMV/QVM for KV quantized attention benchmarks

Core

New Features
- mx.linalg.eigh and mx.linalg.eigvalsh
- mx.nn.init.sparse
- 64-bit type support for mx.cumprod, mx.cumsum
Performance
- Faster long column reductions
- Wired buffer support for large models
- Better Winograd dispatch condition for convs
- Faster scatter/gather
- Faster mx.random.uniform and mx.random.bernoulli
- Better threadgroup sizes for large arrays
Misc
- Added Python 3.13 to CI
- C++20 compatibility

Bugfixes

Fix command encoder synchronization
Fix mx.vmap with gather and constant outputs
Fix fused sdpa with differing key and value strides
Support mx.array.__format__ with spec
Fix multi output array leak
Fix RMSNorm weight mismatch error
