# Redundant branches with ctlz and cttz #47467
We expand the intrinsics in -codegenprepare, and I'm not sure where we would solve this. machine-cse seems like the most likely candidate, but it would require tracking EFLAGS state across basic blocks. Not sure if we do that:

```
TEST64rr %5, %5, implicit-def $eflags
JCC_1 %bb.2, 4, implicit $eflags
```

IR going into SDAG:

```llvm
define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) unnamed_addr {
start:
  %cmpz = icmp eq i64 %x, 0
  br i1 %cmpz, label %cond.end, label %cond.false

cond.false:                                       ; preds = %start
  %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 true)
  br label %cond.end

cond.end:                                         ; preds = %start, %cond.false
  %ctz = phi i64 [ 64, %start ], [ %0, %cond.false ]
  %1 = trunc i64 %ctz to i32
  %cmpz3 = icmp eq i64 %x, 0
  br i1 %cmpz3, label %cond.end2, label %cond.false1

cond.false1:                                      ; preds = %cond.end
  %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 true)
  br label %cond.end2

cond.end2:                                        ; preds = %cond.end, %cond.false1
  %ctz4 = phi i64 [ 64, %cond.end ], [ %2, %cond.false1 ]
  %3 = trunc i64 %ctz4 to i32
  %_2 = add nuw nsw i32 %1, %3
  %4 = icmp ugt i32 %_2, 10
  ret i1 %4
}
```
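For reference, Rust source along these lines would produce the IR above (a reconstruction from the mangled name and the final `ugt 10` compare, not the reporter's original code):

```rust
// Reconstructed guess at the source. A u64 fits exactly in an f64 when its
// significant bits span at most 53 positions, i.e. when leading plus
// trailing zeros total at least 64 - 53 = 11. Each of leading_zeros /
// trailing_zeros lowers to a zero-guarded ctlz/cttz, which is where the
// duplicated `icmp eq i64 %x, 0` comes from.
pub fn can_represent_as_f64(x: u64) -> bool {
    x.leading_zeros() + x.trailing_zeros() >= 11
}
```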
Note that this should not be an issue when compiling for more recent x86.
After #102885 the branches are now avoided on any x86 target with CMOV:

```asm
playground::can_represent_as_f64::h8c9d47bab619cb5f:
        bsrq    %rdi, %rax
        movl    $127, %ecx
        cmovneq %rax, %rcx
        xorl    $63, %ecx
        bsfq    %rdi, %rax
        movl    $64, %edx
        cmovneq %rax, %rdx
        addl    %ecx, %edx
        cmpl    $11, %edx
        setae   %al
        retq
```
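To see why the CMOV sequence handles the zero case without branching: `bsr`/`bsf` set ZF on zero input, and `cmovne` keys off that to keep a preloaded fallback constant. A sketch in Rust mirroring the `bsr`/`cmovne`/`xor` dance (function names and the explicit `if`s are illustrative; the hardware version is branchless):

```rust
// Mirrors: movl $127, %ecx ; bsrq %rdi, %rax ; cmovneq %rax, %rcx ; xorl $63, %ecx
// bsr yields the index of the highest set bit; on zero input ZF is set,
// so cmovne keeps the preloaded 127, and 127 ^ 63 == 64 (ctlz of zero).
fn ctlz_via_cmov(x: u64) -> u32 {
    let bsr = 63u32.wrapping_sub(x.leading_zeros()); // garbage for x == 0, but unused then
    let picked = if x != 0 { bsr } else { 127 };     // cmovneq
    picked ^ 63                                      // xorl $63
}

// Mirrors: movl $64, %edx ; bsfq %rdi, %rax ; cmovneq %rax, %rdx
fn cttz_via_cmov(x: u64) -> u32 {
    let bsf = x.trailing_zeros();  // bsf result is the trailing-zero count directly
    if x != 0 { bsf } else { 64 }  // cmovneq keeps the preloaded 64 on zero input
}
```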
After #123623 x86_64 codegen will simplify to:

```asm
        movl    $127, %eax
        bsrq    %rdi, %rax
        xorl    $63, %eax
        movl    $64, %ecx
        rep bsfq %rdi, %rcx
        addl    %eax, %ecx
        cmpl    $11, %ecx
        setae   %al
        retq
```
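The CMOVs disappear here because, as I understand the change, this lowering relies on `bsr`/`bsf` leaving the destination register unchanged when the source is zero (AMD documents this behavior, and Intel hardware matches it; `rep bsf` also decodes as `tzcnt` on BMI1 CPUs), so the fallback constant can simply be preloaded into the destination. A Rust sketch of that model, with the destination-preserving behavior written out as explicit conditionals:

```rust
// Model of `movl $127, %eax ; bsrq %rdi, %rax ; xorl $63, %eax`:
// bsr only overwrites its destination for nonzero input, so the preloaded
// 127 survives the x == 0 case and 127 ^ 63 yields 64.
fn can_represent_as_f64_lowered(x: u64) -> bool {
    let mut lz_acc = 127u32;
    if x != 0 {
        lz_acc = 63 - x.leading_zeros(); // bsr: index of highest set bit
    }
    lz_acc ^= 63;

    // `movl $64, %ecx ; rep bsfq %rdi, %rcx`: same preload trick for bsf.
    let mut tz_acc = 64u32;
    if x != 0 {
        tz_acc = x.trailing_zeros(); // bsf: index of lowest set bit
    }

    // addl ; cmpl $11 ; setae
    lz_acc + tz_acc >= 11
}
```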
Resolving this: on any target with CMOV the code is now branchless, and on x86_64 (or with LZCNT/TZCNT) the codegen is close to optimal.
Extended Description
Rust code:
LLVM IR:
Assembly:
Instead of performing the comparison twice, the code should immediately branch to LBB0_4.