AVX512 code generated for i32 array sum is worse than code by clang 5 #48287
Comments
Funny, when I change 16 to 17 in the Rust code:
pub struct v {
val:[i32;17]
}
pub fn test(a:v, b:v) -> v {
let mut res = v { val : [0;17] };
for i in 0..17 {
res.val[i] = a.val[i] + b.val[i];
}
return res;
}
I get:
example::test:
push rbp
mov rbp, rsp
sub rsp, 72
mov dword ptr [rbp - 8], 0
mov qword ptr [rbp - 16], 0
vmovdqu32 zmm0, zmmword ptr [rdx]
vpaddd zmm0, zmm0, zmmword ptr [rsi]
vmovdqu32 zmmword ptr [rbp - 72], zmm0
mov eax, dword ptr [rdx + 64]
add eax, dword ptr [rsi + 64]
mov dword ptr [rbp - 8], eax
mov dword ptr [rdi + 64], eax
vmovdqu ymm0, ymmword ptr [rbp - 72]
vmovdqu ymm1, ymmword ptr [rbp - 40]
vmovdqu ymmword ptr [rdi + 32], ymm1
vmovdqu ymmword ptr [rdi], ymm0
mov rax, rdi
add rsp, 72
pop rbp
ret
Is this closer to the clang instructions?
The referenced issue #48293 has a better explanation of what is happening.
I was just about to post this issue here, good thing someone else already did. Clang only produces this "good" code for C++, not for C. On Reddit people came to the conclusion that this is due to copy elision (in particular return value optimization) that is done in C++, but apparently not in C and Rust.
In that case, #47954 might help, right?
This no longer seems to be a problem with the latest versions of both rustc and clang: https://gcc.godbolt.org/z/c4187cno3
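To make the copy-elision point above concrete, here is a minimal sketch contrasting the by-value return with an explicit out-parameter. The names `V`, `add_by_value`, and `add_into` are hypothetical and not from the issue, and the inputs are taken by reference for brevity. Whether the two forms actually compile to different assembly depends on the rustc and LLVM versions, so treat this as an illustration of the idea, not a guaranteed fix.

```rust
// Hypothetical names for illustration; not taken from the issue.
pub struct V {
    pub val: [i32; 16],
}

// By-value return: the result is built in a local and then returned.
// Without return value optimization, the compiler may materialize `res`
// on the stack and copy it into the caller's return slot afterwards.
pub fn add_by_value(a: &V, b: &V) -> V {
    let mut res = V { val: [0; 16] };
    for i in 0..16 {
        res.val[i] = a.val[i] + b.val[i];
    }
    res
}

// Out-parameter: the sums are written directly into caller-owned memory,
// so there is no local result that would need to be copied out.
pub fn add_into(a: &V, b: &V, out: &mut V) {
    for i in 0..16 {
        out.val[i] = a.val[i] + b.val[i];
    }
}
```

Both functions compute the same result; comparing their output under `--emit asm` for a given compiler version would show whether the extra stack round trip is still present.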
It looks like this has been the case for quite a while already, since 1.52. Worth mentioning that LLVM intentionally does not use 512-bit vectors here by default.
Demo: https://godbolt.org/g/vqB6oj
I tried this code:
Compiled it with
rustc --crate-type=lib -C opt-level=3 -C target-cpu=skylake-avx512 --emit asm test.rs
I expected to see this happen:
Instead, this happened:
Meta