Gemm update #1518
Here are some benchmarks (on M2 Ultra). The numbers are about the same and within the range of run-to-run variation.
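For readers who want to make the "within the range of variation between runs" check concrete, here is a minimal sketch. It is not the benchmark script used for this PR: the function names, the `2*m*n*k` flop count convention, and the two-standard-deviation threshold are all assumptions chosen for illustration.

```python
import statistics


def gemm_tflops(m, n, k, seconds):
    """Throughput of one (m x k) @ (k x n) matmul in TFLOP/s.

    Assumes the usual convention of 2*m*n*k floating point operations
    (one multiply and one add per inner-product term).
    """
    return 2.0 * m * n * k / seconds / 1e12


def within_run_variation(before, after, num_stdevs=2.0):
    """Return True if the mean of `after` is within `num_stdevs` sample
    standard deviations of the mean of `before`.

    `before` and `after` are lists of repeated measurements (for example
    TFLOP/s from several runs of the same shape). The threshold is a
    hypothetical choice, not something stated in the PR.
    """
    spread = statistics.stdev(before)
    shift = abs(statistics.mean(after) - statistics.mean(before))
    return shift <= num_stdevs * spread
```

A typical use would be to time the same shape several times on each commit, convert each timing with `gemm_tflops`, and pass the two lists to `within_run_variation`. A `False` result suggests the difference is larger than the run-to-run noise and worth a closer look.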
066a9e6 to a5bfec9
LGTM!
Added some tuning updates
@@ -88,6 +88,83 @@ inline auto collapse_batches(const array& a, const array& b, const array& c) {
// Steel matmul fallback
///////////////////////////////////////////////////////////////////////////////

#define GEMM_TPARAM_MACRO(devc) \
❤️
Proposed changes
Just some restructuring of the steel primitives. First of many updates to come.
No notable performance regression was seen on M2 Ultra or M3 Max; certain shapes showed slight improvements.
Checklist
Put an x in the boxes that apply.
pre-commit run --all-files to format my code / installed pre-commit prior to committing changes