Skip to content

x.trailing_zeros() > n is not optimized as well as x & ((1 << n) - 1) == 0 on x86 #43024

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
oli-obk opened this issue Jul 3, 2017 · 6 comments
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@oli-obk
Copy link
Contributor

oli-obk commented Jul 3, 2017

the bitshift and bitand version optimizes to

	testb	$15, %dil
	sete	%al
	retq

while the trailing_zeros version optimizes to

	movl	$32, %eax
	testl	%edi, %edi
	je	.LBB6_2
	bsfl	%edi, %eax
.LBB6_2:
	cmpl	$4, %eax
	seta	%al
	retq

even though I find the trailing_zeros version much more straight forward

Should this be reported upstream in llvm or is this something a mir pass should do?

@Mark-Simulacrum
Copy link
Member

It looks like trailing_zeros currently calls an LLVM intrinsic (cttz) which has a patch posted against it -- https://reviews.llvm.org/D9284 -- perhaps someone familiar with LLVM (cc @arielb1) could push that through review and then change libcore to not have the conditional that it does today: https://github.com/rust-lang/rust/blob/master/src/libcore/num/mod.rs#L1375-L1390.

@Mark-Simulacrum Mark-Simulacrum added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Jul 3, 2017
@Mark-Simulacrum Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 26, 2017
@nikic
Copy link
Contributor

nikic commented Dec 15, 2018

@Mark-Simulacrum A variant of that patch was implemented via llvm-mirror/llvm@1886c8e in 2016, so we should definitely have it in all LLVM versions we support and can drop that workaround.

However, I don't think this is really related to the issue seen here -- LLVM just doesn't recognize this particular pattern (probably because it would be rather odd in C). Godbolt for reference: https://godbolt.org/z/ovqsCg

I'm a bit stumped about the u8 cttz codegen though. If I disable all optimizations, I get:

define internal i32 @"_ZN4core3num20_$LT$impl$u20$u8$GT$14trailing_zeros17hf3376a11a840d3c1E"(i8 %self) unnamed_addr #0 !dbg !5 {
  %tmp_ret = alloca i8, align 1
  %0 = call i8 @llvm.cttz.i8(i8 %self, i1 false), !dbg !11
  store i8 %0, i8* %tmp_ret, align 1, !dbg !11
  %1 = load i8, i8* %tmp_ret, align 1, !dbg !11
  br label %bb1, !dbg !11

bb1:                                              ; preds = %start
  %2 = zext i8 %1 to i32, !dbg !13
  ret i32 %2, !dbg !14
}

which directly calls @llvm.cttz.i8, without the | 0x100 workaround. Possibly the uint_cttz_call! macro is not working properly?

@nikic
Copy link
Contributor

nikic commented Dec 15, 2018

Yeah, looks like the macro just isn't doing what it's intended to do: https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=33539371298f2c808a22ba3cf40db567

Centril added a commit to Centril/rust that referenced this issue Dec 16, 2018
Remove u8 cttz hack

This issue has since been fixed in LLVM: llvm-mirror/llvm@1886c8e

Furthermore this code doesn't actually work, because the 8 literal does not match the $BITS provided from the macro invocation, so effectively this was just dead code. Ref rust-lang#43024.

What LLVM does is still not ideal for CPUs that only have bsf but not tzcnt, will create a patch for that later.

r? @nagisa
@nikic
Copy link
Contributor

nikic commented Dec 16, 2018

Partially implemented in https://reviews.llvm.org/D55745. This will handle only == n and != n.

@nikic nikic self-assigned this Dec 16, 2018
@nikic
Copy link
Contributor

nikic commented Jan 27, 2019

This was fully fixed by https://reviews.llvm.org/D56355, which unfortunately did not make it into the last LLVM update.

@nikic
Copy link
Contributor

nikic commented Oct 31, 2019

There was another LLVM update since then, and this issue is now fully fixed.

@nikic nikic closed this as completed Oct 31, 2019
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

3 participants