v0.18.0

@awni released this 27 Sep 21:10
b1e2b53

Highlights

  • Speed improvements:
    • Up to 2x faster I/O: benchmarks.
    • Faster transposed copies, unary, and binary ops
  • Transposed convolutions
  • Improvements to mx.distributed (send/recv/average_gradients)

Core

  • New features:

    • mx.conv_transpose{1,2,3}d
    • Allow mx.take to work with integer index
    • Add std as method on mx.array
    • mx.put_along_axis
    • mx.linalg.cross (cross product)
    • int() and float() work on scalar mx.array
    • Add optional headers to mx.fast.metal_kernel
    • mx.distributed.send and mx.distributed.recv
    • mx.linalg.pinv
  • Performance:

    • Up to 2x faster I/O
    • Much faster CPU convolutions
    • Faster general n-dimensional copies, unary, and binary ops for both CPU and GPU
    • Put reduction ops in default stream with async for faster comms
    • Overhead reductions in mx.fast.metal_kernel
    • Improve donation heuristics to reduce memory use
  • Misc:

    • Support Xcode 16.0
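
As a rough illustration of what the new transposed convolutions compute, here is a minimal pure-Python sketch of a single-channel 1-D transposed convolution. The helper `conv_transpose1d_ref` is hypothetical and is not part of MLX. It assumes the common convention in which each input sample adds a scaled copy of the kernel to the output, and it does not model the batch and channel dimensions that `mx.conv_transpose1d` operates on. Whether MLX uses exactly this kernel orientation and padding convention is an assumption to check against the MLX documentation.

```python
def conv_transpose1d_ref(x, w, stride=1, padding=0):
    """Reference 1-D transposed convolution on plain Python lists.

    Hypothetical helper for illustration only (not an MLX function).
    x: input samples for a single channel, length L_in
    w: kernel taps for a single channel, length K
    Returns a list of length (L_in - 1) * stride - 2 * padding + K.
    """
    l_in, k = len(x), len(w)
    full_len = (l_in - 1) * stride + k
    full = [0.0] * full_len
    # Each input sample adds a scaled copy of the kernel to the output;
    # consecutive copies are offset by `stride` positions.
    for i, xi in enumerate(x):
        for j, wj in enumerate(w):
            full[i * stride + j] += xi * wj
    # Padding trims `padding` entries from both ends of the full result.
    return full[padding:full_len - padding]


out = conv_transpose1d_ref([1.0, 2.0, 3.0], [1.0, 1.0], stride=2)
print(out)  # [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
```

With `stride=2` the three input samples are spread over six output positions, which is why transposed convolutions are often used for upsampling.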

NN

  • Faster RNN layers
  • nn.ConvTranspose{1,2,3}d
  • mlx.nn.average_gradients data parallel helper for distributed training
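
To make the idea behind a data-parallel gradient-averaging helper concrete, the sketch below simulates it in a single process with plain Python. The function `average_gradients_ref` and the dictionary layout are hypothetical. It does not call `mx.distributed` and is not how `mlx.nn.average_gradients` is implemented. It only shows the arithmetic, in which each parameter's gradient is summed across workers and divided by the number of workers.

```python
def average_gradients_ref(per_rank_grads):
    """Average gradient dicts across simulated ranks.

    Hypothetical helper for illustration only (not an MLX function).
    per_rank_grads: list with one dict per rank, mapping a parameter
    name to a list of floats (that rank's local gradient).
    Returns the dict every rank would hold after averaging.
    """
    world_size = len(per_rank_grads)
    averaged = {}
    for name in per_rank_grads[0]:
        # Group the same gradient entry from every rank, then average it.
        columns = zip(*(grads[name] for grads in per_rank_grads))
        averaged[name] = [sum(col) / world_size for col in columns]
    return averaged


# Two simulated workers, each with gradients from its own data shard.
grads = [
    {"weight": [1.0, 2.0], "bias": [0.5]},
    {"weight": [3.0, 6.0], "bias": [1.5]},
]
print(average_gradients_ref(grads))
# {'weight': [2.0, 4.0], 'bias': [1.0]}
```

In real distributed training each worker would hold only its own gradients, so the sum would come from a collective operation across processes and not from a local loop.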

Bug Fixes

  • Fix boolean all reduce bug
  • Fix extension metal library finding
  • Fix ternary for large arrays
  • Make eval just wait if all arrays are scheduled
  • Fix CPU softmax by removing redundant coefficient in neon_fast_exp
  • Fix JIT reductions
  • Fix overflow in quantize/dequantize
  • Fix compile with byte-sized constants
  • Fix copy in the sort primitive
  • Fix reduce edge case
  • Fix slice data size
  • Throw for certain cases of non-captured inputs in compile
  • Fix copying scalars by adding fill_gpu
  • Fix bug when a module attribute is set, reset, and then set again
  • Ensure io/comm streams are active before eval
  • Fix mx.clip
  • Override class function in Repr so mx.array is not confused with array.array
  • Avoid using find_library to make install truly portable
  • Remove fmt dependencies from MLX install
  • Fix for partition VJP
  • Avoid command buffer timeout for IO on large arrays