Releases
v0.19.0
Highlights
Speed improvements
Up to 6x faster CPU indexing (benchmarks)
Faster Metal compiled kernels for strided inputs (benchmarks)
Faster generation with fused-attention kernel (benchmarks)
Gradient for grouped convolutions
Due to Python 3.8's end-of-life we no longer test with it on CI
Core
New features
Gradient for grouped convolutions
mx.roll
mx.random.permutation
mx.real and mx.imag
Performance
Up to 6x faster CPU indexing (benchmarks)
Faster CPU sort (benchmarks)
Faster Metal compiled kernels for strided inputs (benchmarks)
Faster generation with fused-attention kernel (benchmarks)
Bulk eval in safetensors to avoid unnecessary serialization of work
Misc
Bump to nanobind 2.2
Move testing to Python 3.9 due to 3.8's end-of-life
Make the GPU device more thread safe
Fix the submodule stubs for better IDE support
CI-generated docs that will never be stale
NN
Add support for grouped 1D convolutions to the nn API
Add some missing type annotations
Bugfixes
Fix and speed up row-reduce with few rows
Fix normalization primitive segfault with unexpected inputs
Fix complex power on the GPU
Fix freeing deep unevaluated graphs (details)
Fix race with array::is_available
Consistently handle softmax with all -inf inputs
Fix streams in affine quantize
Fix CPU compile preamble for some Linux machines
Stream safety in CPU compilation
Fix CPU compile segfault at program shutdown