Implement avx512 masked load and store intrinsics #1254
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon. Please see the contribution instructions for more information.
LGTM! Just a small style nit: please indent the contents of the
I believe in the past we avoided defining functions inside macros because it interacts poorly with our intrinsic checking tools.
Was just about to update the description to mention this. I saw the I think the macro approach is worth it since it reduces code for the load intrinsics by about 30x and reduces chances of copy-paste mistakes.
Specifically the I think it would be better to avoid using macros for now. The ARM code avoids this issue by using a code generator, but it is probably not worth the effort in this case since AVX512 is mostly complete already.
Could you also mark the intrinsics as implemented in crates/core_arch/avx512f.md. We should be able to start stabilizing avx512 once it is complete.
Should be ready for review now. The GitHub diff view looks confusing, individual commits might be clearer. I ended up using a "poor man's" code generator by expanding the macros from the earlier commit and postprocessing the output with some small regular expressions. It's a bit manual and probably not worth checking in. More time was spent in writing all the tests. The avx512vl functions required adding
LGTM! I'm just waiting on rust-lang/rust#91381 which is causing the Android CI to fail.
Did this commit break rollup merge? :)
let mut dst: __m512i = src;
asm!(
    "vmovdqu32 {2}{{{1}}}, [{0}]",
    in(reg) mem_addr,
   Compiling std v0.0.0 (/checkout/library/std)
error: formatting may not be suitable for sub-register argument
     --> library/core/src/../../stdarch/crates/core_arch/src/x86/avx512f.rs:30336:34
      |
30336 |     "vmovdqu32 {2}{{{1}}}, [{0}]",
      |                             ^^^
30337 |     in(reg) mem_addr,
      |             -------- for this argument
      |
      = note: `-D asm-sub-register` implied by `-D warnings`
      = help: use the `e` modifier to have the register formatted as `eax`
      = help: or use the `r` modifier to keep the default formatting of `rax`
(from rollup CI Result)
Yes. The issue is that on x32 (x86_64 with 32-bit pointers) the address operand is inserted into the asm as
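The two `help:` lines in the CI output can be illustrated on a much smaller case. The sketch below is not the code from this PR or from the follow-up fix; `add_one` is a hypothetical function used only to show how a template modifier selects the register name for an operand that is narrower than its register class. It assumes an x86_64 target, with a plain Rust fallback elsewhere so the snippet still builds.

```rust
/// Hypothetical helper (not part of stdarch): adds 1 to a 32-bit value
/// using inline assembly on x86_64.
#[cfg(target_arch = "x86_64")]
pub fn add_one(x: u32) -> u32 {
    use std::arch::asm;

    let mut v = x;
    // `v` is 32 bits wide, but `reg` is a 64-bit register class on x86_64.
    // A bare `{0}` would be formatted with the 64-bit name (e.g. `rax`) and
    // trigger the `asm_sub_register` lint. `{0:e}` requests the 32-bit name
    // (e.g. `eax`), which matches the operand size.
    unsafe {
        asm!("add {0:e}, 1", inout(reg) v, options(pure, nomem, nostack));
    }
    v
}

/// Fallback for other architectures so the example compiles everywhere.
#[cfg(not(target_arch = "x86_64"))]
pub fn add_one(x: u32) -> u32 {
    x.wrapping_add(1)
}
```

With `-D warnings`, dropping the `:e` modifier from the template would be expected to produce the same class of error as in the log above, since the lint is promoted to a hard error.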
Oh, my bad. I'll keep this in mind when I start working on the remaining intrinsics.
@jhorstmann I submitted a fix in #1264
Implement avx512 masked load and store intrinsics using inline assembly.
The same approach also works for masked gather/scatter and compress/expand intrinsics. Probably makes sense to split these into their own PR.
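For readers unfamiliar with the semantics these intrinsics expose, here is a scalar reference model in plain Rust. It is only a sketch: the function names `mask_loadu_epi32_model` and `mask_storeu_epi32_model` are hypothetical and are not part of stdarch, and the model covers just the 16-lane, 32-bit-element case in the style of `_mm512_mask_loadu_epi32` / `_mm512_mask_storeu_epi32`. Lanes whose mask bit is clear never touch memory, which is why the memory slice may be shorter than 16 elements.

```rust
/// Scalar model (illustrative only) of a 16-lane masked load of 32-bit
/// elements: lane `i` is read from `mem` if bit `i` of `k` is set,
/// otherwise it keeps the value from `src`.
pub fn mask_loadu_epi32_model(src: [i32; 16], k: u16, mem: &[i32]) -> [i32; 16] {
    let mut dst = src;
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            // Only selected lanes access memory; unselected lanes may lie
            // past the end of `mem` without causing a problem.
            dst[lane] = mem[lane];
        }
    }
    dst
}

/// Scalar model (illustrative only) of the matching masked store:
/// lane `i` of `a` is written to `mem` only if bit `i` of `k` is set.
pub fn mask_storeu_epi32_model(mem: &mut [i32], k: u16, a: [i32; 16]) {
    for lane in 0..16 {
        if (k >> lane) & 1 == 1 {
            mem[lane] = a[lane];
        }
    }
}
```

A model like this can also serve as an oracle when testing the inline-assembly versions: run both on the same inputs and masks and compare the results lane by lane.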