Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Redundant branches with ctlz and cttz #47467

Closed
Diggsey opened this issue Nov 9, 2020 · 5 comments
Closed

Redundant branches with ctlz and cttz #47467

Diggsey opened this issue Nov 9, 2020 · 5 comments
Labels
backend:X86 bugzilla Issues migrated from bugzilla

Comments

@Diggsey
Copy link

Diggsey commented Nov 9, 2020

Bugzilla Link 48123
Version trunk
OS Windows NT
CC @topperc,@RKSimon,@phoebewang,@rotateright

Extended Description

Rust code:

pub fn can_represent_as_f64(x: u64) -> bool {
    x.leading_zeros() + x.trailing_zeros() >= 11
}

LLVM IR:

define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) {
start:
  %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 false)
  %1 = trunc i64 %0 to i32
  %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 false)
  %3 = trunc i64 %2 to i32
  %_2 = add nuw nsw i32 %1, %3
  %4 = icmp ugt i32 %_2, 10
  ret i1 %4
}

Assembly:

playground::can_represent_as_f64:
	mov	eax, 64
	mov	ecx, 64
	test	rdi, rdi    ; Initial test for zero and branch
	je	.LBB0_2
	bsr	rcx, rdi
	xor	rcx, 63

.LBB0_2:
	test	rdi, rdi    ; Second test for zero and branch
	je	.LBB0_4
	bsf	rax, rdi

.LBB0_4:
	add	ecx, eax
	cmp	ecx, 10
	seta	al
	ret

Instead of performing the comparison twice, the code should immediately branch to LBB0_4.

@rotateright
Copy link
Contributor

rotateright commented Nov 11, 2020

We expand the intrinsics in -codegenprepare, and I'm not sure where we would solve this. machine-cse seems like the most likely candidate, but it would require tracking eflags state across basic blocks. Not sure if we do that:

    TEST64rr %5, %5, implicit-def $eflags
    JCC_1 %bb.2, 4, implicit $eflags

IR going into SDAG:

  define zeroext i1 @_ZN10playground20can_represent_as_f6417h8c9d47bab619cb5fE(i64 %x) unnamed_addr {
  start:
    %cmpz = icmp eq i64 %x, 0
    br i1 %cmpz, label %cond.end, label %cond.false
  
  cond.false:                                       ; preds = %start
    %0 = tail call i64 @llvm.ctlz.i64(i64 %x, i1 true)
    br label %cond.end
  
  cond.end:                                         ; preds = %start, %cond.false
    %ctz = phi i64 [ 64, %start ], [ %0, %cond.false ]
    %1 = trunc i64 %ctz to i32
    %cmpz3 = icmp eq i64 %x, 0
    br i1 %cmpz3, label %cond.end2, label %cond.false1
  
  cond.false1:                                      ; preds = %cond.end
    %2 = tail call i64 @llvm.cttz.i64(i64 %x, i1 true)
    br label %cond.end2
  
  cond.end2:                                        ; preds = %cond.end, %cond.false1
    %ctz4 = phi i64 [ 64, %cond.end ], [ %2, %cond.false1 ]
    %3 = trunc i64 %ctz4 to i32
    %_2 = add nuw nsw i32 %1, %3
    %4 = icmp ugt i32 %_2, 10
    ret i1 %4
  }

@rotateright
Copy link
Contributor

Note that this should not be an issue when compiling for more recent x86:

lzcntq	%rdi, %rax
tzcntq	%rdi, %rcx
addl	%eax, %ecx
cmpl	$10, %ecx
seta	%al
retq

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
@RKSimon
Copy link
Collaborator

RKSimon commented Sep 3, 2024

After #102885 the branches are now avoided on any x86 target with cmov:

playground::can_represent_as_f64::h8c9d47bab619cb5f: # @playground::can_represent_as_f64::h8c9d47bab619cb5f
  bsrq %rdi, %rax
  movl $127, %ecx
  cmovneq %rax, %rcx
  xorl $63, %ecx
  bsfq %rdi, %rax
  movl $64, %edx
  cmovneq %rax, %rdx
  addl %ecx, %edx
  cmpl $11, %edx
  setae %al
  retq

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 21, 2025

After #123623 x86_64 codegen will simplify to:

        movl    $127, %eax
        bsrq    %rdi, %rax
        xorl    $63, %eax
        movl    $64, %ecx
        rep bsfq    %rdi, %rcx
        addl    %eax, %ecx
        cmpl    $11, %ecx
        setae   %al
        retq

@RKSimon
Copy link
Collaborator

RKSimon commented Jan 24, 2025

Resolving this - on any target with CMOV this is now branchless, and on x86_64 (or LZCNT/TZCNT) the codegen is pretty much optimal.

@RKSimon RKSimon closed this as completed Jan 24, 2025
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
backend:X86 bugzilla Issues migrated from bugzilla
Projects
None yet
Development

No branches or pull requests

3 participants