Implement avx512 masked load and store intrinsics #1254

jhorstmann · 2021-11-15T18:52:45Z

Implement avx512 masked load and store intrinsics using inline assembly.

aligned/unaligned masked/zero-masked loads
aligned/unaligned stores
avx512vl (_mm256 and _mm) variants of the above
avx512bw (byte and word) variants of the above
formatting
tests for all of the above
updated avx512f.md and avx512bw.md

The same approach also works for masked gather/scatter and compress/expand intrinsics. It probably makes sense to split those into their own PR.
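For reference, a plain-Rust model of what the three kinds of operation compute may help when reading the tests. This is a hypothetical scalar sketch: the function names mirror `_mm512_mask_loadu_epi32`, `_mm512_maskz_loadu_epi32` and `_mm512_mask_storeu_epi32` but are not the stdarch API, and it assumes 16 lanes of `i32` with a 16-bit mask in which bit `i` selects lane `i`.

```rust
/// Hypothetical scalar model of the 512-bit, 32-bit-lane masked operations.
/// Lane `i` is active when bit `i` of the 16-bit mask `k` is set.
const LANES: usize = 16;

/// Merge-masking load: active lanes come from memory, inactive lanes keep `src`.
fn mask_loadu_epi32(src: [i32; LANES], k: u16, mem: &[i32; LANES]) -> [i32; LANES] {
    let mut dst = src;
    for i in 0..LANES {
        if (k >> i) & 1 != 0 {
            dst[i] = mem[i];
        }
    }
    dst
}

/// Zero-masking load: active lanes come from memory, inactive lanes become 0.
fn maskz_loadu_epi32(k: u16, mem: &[i32; LANES]) -> [i32; LANES] {
    mask_loadu_epi32([0; LANES], k, mem)
}

/// Masked store: only active lanes are written; the rest of `mem` is untouched.
fn mask_storeu_epi32(mem: &mut [i32; LANES], k: u16, a: [i32; LANES]) {
    for i in 0..LANES {
        if (k >> i) & 1 != 0 {
            mem[i] = a[i];
        }
    }
}
```

On real hardware, masked-off lanes are also exempt from memory faults, which is what makes these instructions useful for the tail of a loop; a scalar model cannot show that part.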

rust-highfive · 2021-11-15T18:52:48Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Amanieu (or someone else) soon.

Please see the contribution instructions for more information.

Amanieu · 2021-11-16T04:04:00Z

LGTM!

Just a small style nit: please indent the contents of the asm! macro, like rustfmt does for function calls.
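A minimal sketch of the requested layout, assuming x86_64: the template and each operand go on their own line, one indent level in, with a trailing comma, just as rustfmt formats a multi-line function call. `add_one` is a hypothetical helper used only to show the shape; a plain Rust fallback keeps it building on other targets.

```rust
/// Hypothetical helper that only demonstrates the indentation style.
#[cfg(target_arch = "x86_64")]
fn add_one(x: u64) -> u64 {
    let y: u64;
    // SAFETY: `lea` only computes x + 1 in a register; no memory or flags are touched.
    unsafe {
        std::arch::asm!(
            "lea {y}, [{x} + 1]",
            x = in(reg) x,
            y = out(reg) y,
            options(pure, nomem, nostack, preserves_flags),
        );
    }
    y
}

/// Portable fallback so the example also builds on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn add_one(x: u64) -> u64 {
    x.wrapping_add(1)
}
```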


Amanieu · 2021-11-20T18:04:40Z

I believe in the past we avoided defining functions inside macros because it interacts poorly with our intrinsic checking tools.

jhorstmann · 2021-11-20T18:14:21Z

> I believe in the past we avoided defining functions inside macros because it interacts poorly with our intrinsic checking tools.

I was just about to update the description to mention this. The assert_instr tests do run, but they are probably not very useful, since the same macro parameter is used both for the assertion and in the inline asm. Tests for all functions are still on my to-do list.

I think the macro approach is worth it since it reduces code for the load intrinsics by about 30x and reduces chances of copy-paste mistakes.
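The macro approach looks roughly like this hypothetical sketch: a single template stamps out one function per element width. To keep it runnable anywhere, the bodies are scalar models of a merge-masking load rather than the real `asm!` blocks, and the names are illustrative, not stdarch's.

```rust
/// Hypothetical template: one invocation per lane type instead of a
/// hand-written copy of the same function for each width.
macro_rules! mask_loadu_model {
    ($name:ident, $elem:ty, $lanes:expr, $mask:ty) => {
        fn $name(src: [$elem; $lanes], k: $mask, mem: &[$elem; $lanes]) -> [$elem; $lanes] {
            let mut dst = src;
            for i in 0..$lanes {
                // Lane i is loaded only when bit i of the mask is set.
                if (k >> i) & 1 != 0 {
                    dst[i] = mem[i];
                }
            }
            dst
        }
    };
}

// 512-bit vectors hold 64 bytes, 32 words, 16 dwords or 8 qwords.
mask_loadu_model!(mask_loadu_epi8, i8, 64, u64);
mask_loadu_model!(mask_loadu_epi16, i16, 32, u32);
mask_loadu_model!(mask_loadu_epi32, i32, 16, u16);
mask_loadu_model!(mask_loadu_epi64, i64, 8, u8);
```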

Amanieu · 2021-11-20T18:43:11Z

Specifically, the stdarch_verify crate parses every .rs file to find the names and signatures of all intrinsic functions. This parsing does not perform macro expansion (that would require running the full Rust compiler).

I think it would be better to avoid using macros for now. The ARM code avoids this issue by using a code generator, but it is probably not worth the effort in this case since AVX512 is mostly complete already.
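To illustrate why macro-generated intrinsics are invisible to such a tool, here is a deliberately simplified stand-in. The real stdarch_verify does proper parsing rather than line matching; this hypothetical `declared_intrinsics` only mimics the one property that matters here, namely that `macro_rules!` invocations are never expanded. `masked_load!` is a made-up macro name.

```rust
/// Hypothetical stand-in for a checker that reads source text without
/// expanding macros: it only sees literal `pub unsafe fn _mm...` items.
fn declared_intrinsics(source: &str) -> Vec<&str> {
    source
        .lines()
        .filter_map(|line| line.trim_start().strip_prefix("pub unsafe fn "))
        .filter_map(|rest| rest.split('(').next())
        .map(str::trim)
        .filter(|name| name.starts_with("_mm"))
        .collect()
}

/// One hand-written intrinsic and one produced by a (hypothetical) macro.
const SOURCE: &str = r#"
pub unsafe fn _mm512_mask_loadu_epi32(src: __m512i, k: __mmask16, mem_addr: *const i32) -> __m512i {
    // body elided
}

masked_load!(_mm512_mask_loadu_epi64, vmovdqu64, __mmask8, i64);
"#;
```

Only the hand-written function is reported; the macro-generated `_mm512_mask_loadu_epi64` is silently missed.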

Amanieu · 2021-11-21T21:08:01Z

Could you also mark the intrinsics as implemented in crates/core_arch/avx512f.md? We should be able to start stabilizing avx512 once it is complete.

jhorstmann · 2021-11-28T23:03:09Z

Should be ready for review now. The GitHub diff view looks confusing; the individual commits might be clearer.

I ended up using a "poor man's" code generator: I expanded the macros from the earlier commit and post-processed the output with a few small regular expressions. It's a bit manual and probably not worth checking in. More time went into writing all the tests.
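A table-driven generator is a different but related way to get the same kind of output. The sketch below is hypothetical and only emits the signature lines for the unaligned masked loads, tagging each with the `vmovdqu8/16/32/64` mnemonic it would use. The 8- and 16-bit variants belong to avx512bw, and the `_mm256`/`_mm` variants additionally need avx512vl.

```rust
/// Hypothetical generator: one masked-load stub per (width, element) pair.
fn generate_masked_loadu() -> Vec<String> {
    // (intrinsic prefix, vector type, vector width in bits)
    let widths = [("_mm512", "__m512i", 512), ("_mm256", "__m256i", 256), ("_mm", "__m128i", 128)];
    // (suffix, element type, element width in bits)
    let elems = [("epi8", "i8", 8), ("epi16", "i16", 16), ("epi32", "i32", 32), ("epi64", "i64", 64)];
    let mut out = Vec::new();
    for (prefix, vec_ty, width) in widths {
        for (suffix, elem_ty, bits) in elems {
            let lanes = width / bits;
            // Mask types exist as __mmask8/16/32/64; vectors with fewer than 8 lanes use __mmask8.
            let mask_bits = lanes.max(8);
            out.push(format!(
                "pub unsafe fn {prefix}_mask_loadu_{suffix}(src: {vec_ty}, k: __mmask{mask_bits}, mem_addr: *const {elem_ty}) -> {vec_ty} {{ /* vmovdqu{bits} */ }}"
            ));
        }
    }
    out
}
```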

The avx512vl functions required adding avx to target_feature in order to use ymm registers. The i586 CI run apparently also required the sse feature to use xmm registers, which I could not reproduce locally. It would be nice if enabling avx512vl alone already allowed using those registers.
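A minimal sketch of the constraint, assuming x86_64: an `asm!` block with a `ymm_reg` operand only compiles inside a function that enables `avx`, and callers should check for the feature at runtime. By the same rule `xmm_reg` needs `sse`, which is part of the x86_64 baseline but not of the i586 target; that would likely explain why the CI failure did not reproduce locally. `loadu_256` and `load8` are hypothetical helpers.

```rust
#[cfg(target_arch = "x86_64")]
mod ymm {
    use std::arch::asm;
    use std::arch::x86_64::__m256i;

    /// Without `enable = "avx"`, the `ymm_reg` operand below is rejected.
    #[target_feature(enable = "avx")]
    pub unsafe fn loadu_256(src: &[i32; 8]) -> [i32; 8] {
        let v: __m256i;
        // SAFETY: `src` points to 32 readable bytes; `vmovdqu` needs no alignment.
        unsafe {
            asm!(
                "vmovdqu {v}, ymmword ptr [{p}]",
                p = in(reg) src.as_ptr(),
                v = out(ymm_reg) v,
                options(pure, readonly, nostack, preserves_flags),
            );
            core::mem::transmute::<__m256i, [i32; 8]>(v)
        }
    }
}

/// Runtime dispatch: take the AVX path only when the CPU reports support.
fn load8(src: &[i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx") {
            // SAFETY: the `avx` feature was detected at runtime.
            return unsafe { ymm::loadu_256(src) };
        }
    }
    *src
}
```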

Amanieu · 2021-12-01T17:10:35Z

LGTM! I'm just waiting on rust-lang/rust#91381, which is causing the Android CI to fail.

luojia65 · 2021-12-09T03:45:20Z

Did this commit break the rollup merge? :)
Ref: rust-lang/rust#91548 (comment)

luojia65 · 2021-12-09T03:48:40Z

crates/core_arch/src/x86/avx512f.rs

```diff
+    let mut dst: __m512i = src;
+    asm!(
+         "vmovdqu32 {2}{{{1}}}, [{0}]",
+         in(reg) mem_addr,
```

```
   Compiling std v0.0.0 (/checkout/library/std)
error: formatting may not be suitable for sub-register argument
     --> library/core/src/../../stdarch/crates/core_arch/src/x86/avx512f.rs:30336:34
      |
30336 |          "vmovdqu32 {2}{{{1}}}, [{0}]",
      |                                  ^^^
30337 |          in(reg) mem_addr,
      |                  -------- for this argument
      |
      = note: `-D asm-sub-register` implied by `-D warnings`
      = help: use the `e` modifier to have the register formatted as `eax`
      = help: or use the `r` modifier to keep the default formatting of `rax`
```

(from the rollup CI result)

Amanieu · 2021-12-09T03:49:05Z

Yes. The issue is that on x32 (x86_64 with 32-bit pointers) the address operand is inserted into the asm as rax instead of eax. The fix is to use the :e modifier on x86 and x32 (but not on x86_64). Have a look at bt.rs for a similar case.
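A sketch of the pattern, assuming the same approach as bt.rs (a `cfg`-selected macro that supplies the template string): on 32-bit pointer targets (x86 and x32) the address operand is printed with `:e`, and on x86_64 with the default 64-bit name. `addr!` and `load_u32` are hypothetical names used for illustration.

```rust
// x86 and x32 have 32-bit pointers, so the address register is printed with
// the `e` modifier (eax); x86_64 keeps the default 64-bit name (rax).
#[cfg(target_pointer_width = "32")]
macro_rules! addr {
    () => {
        "[{p:e}]"
    };
}
#[cfg(target_pointer_width = "64")]
macro_rules! addr {
    () => {
        "[{p}]"
    };
}

#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn load_u32(src: &u32) -> u32 {
    let value: u32;
    // SAFETY: `src` is a valid, aligned reference to a u32.
    unsafe {
        std::arch::asm!(
            concat!("mov {v:e}, dword ptr ", addr!()),
            p = in(reg) src as *const u32,
            v = out(reg) value,
            options(pure, readonly, nostack, preserves_flags),
        );
    }
    value
}

/// Portable fallback so the example also builds on other targets.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn load_u32(src: &u32) -> u32 {
    *src
}
```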

jhorstmann · 2021-12-09T09:51:31Z

Oh, my bad. I'll keep this in mind when I start working on the remaining intrinsics.

luojia65 · 2021-12-09T10:11:59Z

@jhorstmann I submitted a fix in #1264.

Implement avx512f masked unaligned load and store intrinsics

6f6d59b

rust-highfive assigned Amanieu Nov 15, 2021

Reduce code repetition using macros and implement avx512vl load and store intrinsics

c902d90

Not using macros, adding more tests

7fe1b6b

jhorstmann added 8 commits November 23, 2021 23:20

Tests for mm512 aligned stores

b9329bd

Tests for 256-bit variants

ce1f051

Change tests to store into slices

5f028ad

Tests for 128-bit variants

c3b7347

Update avx512f checklist

2c22e09

Add avx512bw masked load and stores

1f7d501

Tests for avx512bw masked loads and stores

7d28014

Using xmm registers seems to require sse target_feature on CI

53f193b

jhorstmann marked this pull request as ready for review November 28, 2021 23:03

Amanieu merged commit 59df818 into rust-lang:master Dec 4, 2021

luojia65 reviewed Dec 9, 2021


Amanieu mentioned this pull request Jan 24, 2022

Implement avx512 compressstore intrinsics #1273

Merged
