rustc (>= 1.20.0) fails to optimize moves in trivial cases #63631
Comments
Note that the corresponding C++ code is NOT expected to optimize, due to the braindead aliasing model in C++. Rust's aliasing rules are more reasonable and allow more optimization opportunities, including the case presented above. The ultimate cause of the missed optimization can still be aliasing-related, given the long history of poorly optimized (or entirely broken) code for LLVM.
Edit: However, relying on LLVM's move optimizations is unacceptable, in my opinion, even if the root cause here is an LLVM bug. As a Rust user, I expect rustc to eliminate unnecessary moves in the emitted LLVM IR (at least in easy cases), given the importance of move semantics as a core feature of the Rust language. Another pragmatic argument is that rustc has extensive context knowledge and can optimize moves more efficiently than LLVM.
TLDR: LLVM may resolve this and similar optimization issues in the future, but rustc should still offer basic move optimizations because move semantics are a core feature of Rust.
|
After bisecting this example, I found that the regression starts with PR #42313.
|
pub fn got() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    // Each `let s1 = s1;` moves the Vec into a new binding.
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1); let s1 = s1;
    res.extend_from_slice(&s1);
    res
}
pub fn expect() -> Vec<u32> {
    let mut res = Vec::new();
    let s1 = vec![1, 2, 3, 4];
    // Same calls as `got()`, without the intermediate moves.
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res.extend_from_slice(&s1);
    res
}
|
I managed to create a simplified test case using only primitive types and borrows:
#[no_mangle]
pub fn got() {
    let x: u64 = 0x0123456789ABCDEF;
    show(&x); let x = x;
    show(&x); let x = x;
    show(&x);
}
#[no_mangle]
pub fn expect() {
    let x: u64 = 0x0123456789ABCDEF;
    show(&x);
    show(&x);
    show(&x);
}
#[no_mangle]
#[inline(never)]
fn show(x: &u64) {
    // Print the address of `x` followed by its value.
    println!("(0x{:x})0x{:x}", x as *const _ as usize, x);
}
Optimized assembly:
$ rustc -C opt-level=3 -Z mir-opt-level=3 --crate-type=dylib poc.rs
$ r2 -qc 's sym.got;af;afv-*;pdf;s sym.expect;af;afv-*;pdf' libpoc.so
┌ 69: sym.got ();
│ 0x000476e0 4156 push r14
│ 0x000476e2 53 push rbx
│ 0x000476e3 4883ec18 sub rsp, 0x18
│ 0x000476e7 49beefcdab8967452301 movabs r14, 0x123456789abcdef
│ 0x000476f1 4c89742408 mov qword [rsp + 8], r14
│ 0x000476f6 488b1d5b410b00 mov rbx, qword [reloc.show]
│ 0x000476fd 488d7c2408 lea rdi, [rsp + 8]
│ 0x00047702 ffd3 call rbx
│ 0x00047704 4c893424 mov qword [rsp], r14
│ 0x00047708 4889e7 mov rdi, rsp
│ 0x0004770b ffd3 call rbx
│ 0x0004770d 488b0424 mov rax, qword [rsp]
│ 0x00047711 4889442410 mov qword [rsp + 0x10], rax
│ 0x00047716 488d7c2410 lea rdi, [rsp + 0x10]
│ 0x0004771b ffd3 call rbx
│ 0x0004771d 4883c418 add rsp, 0x18
│ 0x00047721 5b pop rbx
│ 0x00047722 415e pop r14
└ 0x00047724 c3 ret
┌ 54: sym.expect ();
│ 0x00047730 4156 push r14
│ 0x00047732 53 push rbx
│ 0x00047733 50 push rax
│ 0x00047734 48b8efcdab8967452301 movabs rax, 0x123456789abcdef
│ 0x0004773e 48890424 mov qword [rsp], rax
│ 0x00047742 4c8b350f410b00 mov r14, qword [reloc.show]
│ 0x00047749 4889e3 mov rbx, rsp
│ 0x0004774c 4889df mov rdi, rbx
│ 0x0004774f 41ffd6 call r14
│ 0x00047752 4889df mov rdi, rbx
│ 0x00047755 41ffd6 call r14
│ 0x00047758 4889df mov rdi, rbx
│ 0x0004775b 41ffd6 call r14
│ 0x0004775e 4883c408 add rsp, 8
│ 0x00047762 5b pop rbx
│ 0x00047763 415e pop r14
└ 0x00047765 c3 ret
Rust Playground link. Running
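The difference between the two listings can also be observed without a disassembler by recording the address each call receives. The following is only a sketch: `addr_of`, `got_addrs`, `expect_addrs` and `distinct` are hypothetical names that do not appear in the thread. Returning the address instead of printing it also changes the code slightly, so the result may not match the assembly above exactly.

```rust
// Hypothetical helper (not from the thread): returns the address of the
// referenced value so that callers can compare stack slots.
#[inline(never)]
fn addr_of(x: &u64) -> usize {
    x as *const u64 as usize
}

// Mirrors `got()`: each `let x = x;` introduces a new binding, which the
// compiler may or may not place in a new stack slot.
pub fn got_addrs() -> [usize; 3] {
    let x: u64 = 0x0123456789ABCDEF;
    let a = addr_of(&x); let x = x;
    let b = addr_of(&x); let x = x;
    let c = addr_of(&x);
    [a, b, c]
}

// Mirrors `expect()`: a single binding, so all three addresses are equal.
pub fn expect_addrs() -> [usize; 3] {
    let x: u64 = 0x0123456789ABCDEF;
    [addr_of(&x), addr_of(&x), addr_of(&x)]
}

// Counts how many distinct addresses were observed.
pub fn distinct(addrs: &[usize; 3]) -> usize {
    let mut v = addrs.to_vec();
    v.sort_unstable();
    v.dedup();
    v.len()
}
```

`distinct(&expect_addrs())` is always 1. A value greater than 1 for `distinct(&got_addrs())` in a build with -C opt-level=3 indicates that the rebindings were given separate stack slots; the exact count depends on the compiler version.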
|
Still an issue with 1.60.0-nightly (17d29dc 2022-01-21): https://godbolt.org/z/sKWPx7bon
Example for the u64 type:
got:
push rbx
sub rsp, 32
movabs rax, 81985529216486895
mov qword ptr [rsp + 8], rax
mov rbx, qword ptr [rip + show@GOTPCREL]
lea rdi, [rsp + 8]
call rbx
mov rax, qword ptr [rsp + 8]
mov qword ptr [rsp + 16], rax
lea rdi, [rsp + 16]
call rbx
mov rax, qword ptr [rsp + 16]
mov qword ptr [rsp + 24], rax
lea rdi, [rsp + 24]
call rbx
add rsp, 32
pop rbx
ret
expect:
push r14
push rbx
push rax
movabs rax, 81985529216486895
mov qword ptr [rsp], rax
mov r14, qword ptr [rip + show@GOTPCREL]
mov rbx, rsp
mov rdi, rbx
call r14
mov rdi, rbx
call r14
mov rdi, rbx
call r14
add rsp, 8
pop rbx
pop r14
ret
|
On the LLVM side, one of the problems here is that this does not optimize: https://llvm.godbolt.org/z/d8KP7rqKe. The pointer is not captured before the call, and the pointer is readonly at the call, so the optimization would be safe. But LLVM currently doesn't distinguish between a capture before the call and a capture at the call.
|
The example code below generates extra stack copies of String (meta)data in function got(), which is expected to produce optimized code identical to expect(). To quickly verify the issue, compare the sub rsp, $FRAME_SIZE instructions that initialize the stack frames at the beginning of the got() and expect() functions compiled with -C opt-level=3 (or measure and compare the generated code sizes). Rust Playground link.
rustc versions before 1.20.0 produce the expected optimized assembly.