Skip to content

PartialOrd derived for C-like enum isn't properly optimized for the <= operator #73338

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
pubfnbar opened this issue Jun 14, 2020 · 6 comments · Fixed by #83819
Closed

PartialOrd derived for C-like enum isn't properly optimized for the <= operator #73338

pubfnbar opened this issue Jun 14, 2020 · 6 comments · Fixed by #83819
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@pubfnbar
Copy link
Contributor

The manual version:

use std::{
    cmp::Ordering,
    time::SystemTime,
};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq)]
enum Foo {
    Zero,
    One,
    Two,
}

impl PartialOrd for Foo {
    fn partial_cmp(&self, _rhs: &Foo) -> Option<Ordering> {
        panic!()
    }

    fn le(&self, rhs: &Foo) -> bool {
        *self as u32 <= *rhs as u32
    }
}

fn main() {
    let opaque = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .as_secs() as u32
        % 3;

    let foo = unsafe { *(&opaque as *const _ as *const Foo) };

    if foo <= Foo::One {
        println!("here");
    }
}

The manual version ASM:

playground::main:
	subq	$88, %rsp
	callq	*std::time::SystemTime::now@GOTPCREL(%rip)
	movq	%rax, 56(%rsp)
	movq	%rdx, 64(%rsp)
	leaq	8(%rsp), %rdi
	leaq	56(%rsp), %rsi
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	callq	*std::time::SystemTime::duration_since@GOTPCREL(%rip)
	cmpq	$1, 8(%rsp)
	je	.LBB4_4
	movl	16(%rsp), %eax
	movl	$2863311531, %ecx
	imulq	%rax, %rcx
	shrq	$33, %rcx
	leal	(%rcx,%rcx,2), %ecx
	subl	%ecx, %eax
	cmpl	$1, %eax
	ja	.LBB4_3
	leaq	.L__unnamed_2(%rip), %rax
	movq	%rax, 8(%rsp)
	movq	$1, 16(%rsp)
	movq	$0, 24(%rsp)
	leaq	.L__unnamed_3(%rip), %rax
	movq	%rax, 40(%rsp)
	movq	$0, 48(%rsp)
	leaq	8(%rsp), %rdi
	callq	*std::io::stdio::_print@GOTPCREL(%rip)

The derived version:

use std::{
    cmp::Ordering,
    time::SystemTime,
};

#[repr(u32)]
#[derive(Copy, Clone, Eq, PartialEq, PartialOrd)]
enum Foo {
    Zero,
    One,
    Two,
}

fn main() {
    let opaque = SystemTime::now()
        .duration_since(SystemTime::UNIX_EPOCH)
        .unwrap()
        .as_secs() as u32
        % 3;

    let foo = unsafe { *(&opaque as *const _ as *const Foo) };

    if foo <= Foo::One {
        println!("here");
    }
}

The derived version ASM:

playground::main:
	subq	$88, %rsp
	callq	*std::time::SystemTime::now@GOTPCREL(%rip)
	movq	%rax, 56(%rsp)
	movq	%rdx, 64(%rsp)
	leaq	8(%rsp), %rdi
	leaq	56(%rsp), %rsi
	xorl	%edx, %edx
	xorl	%ecx, %ecx
	callq	*std::time::SystemTime::duration_since@GOTPCREL(%rip)
	cmpq	$1, 8(%rsp)
	je	.LBB4_4
	movl	16(%rsp), %eax
	movl	$2863311531, %ecx
	imulq	%rax, %rcx
	shrq	$33, %rcx
	leal	(%rcx,%rcx,2), %ecx
	xorl	%edx, %edx
	subl	%ecx, %eax
	setne	%dl
	xorl	%ecx, %ecx
	cmpl	$1, %eax
	leaq	-1(%rdx,%rdx), %rax
	cmovneq	%rax, %rcx
	addq	$1, %rcx
	cmpq	$1, %rcx
	ja	.LBB4_3
	leaq	.L__unnamed_2(%rip), %rax
	movq	%rax, 8(%rsp)
	movq	$1, 16(%rsp)
	movq	$0, 24(%rsp)
	leaq	.L__unnamed_3(%rip), %rax
	movq	%rax, 40(%rsp)
	movq	$0, 48(%rsp)
	leaq	8(%rsp), %rdi
	callq	*std::io::stdio::_print@GOTPCREL(%rip)

The diff is that while the manual version does:

	subl	%ecx, %eax
	cmpl	$1, %eax

...the derived version does:

	xorl	%edx, %edx
	subl	%ecx, %eax
	setne	%dl
	xorl	%ecx, %ecx
	cmpl	$1, %eax
	leaq	-1(%rdx,%rdx), %rax
	cmovneq	%rax, %rcx
	addq	$1, %rcx
	cmpq	$1, %rcx

All the other PartialOrd operators are optimized properly.

@jonas-schievink jonas-schievink added C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 14, 2020
@dotdash
Copy link
Contributor

dotdash commented Jun 24, 2020

LLVM can't properly optimize the match against Some(Less | Equal), if we instead match against None | Some(Greater) and negate the result, LLVM does find a way to optimize the code, because the IR we generate is somewhat simpler having to handle only one of the Some() cases.

I didn't really look into how to maybe teach LLVM to handle this case, and didn't see anything obvious in the way we generate IR that could be improved here either.

Should I prepare a PR to adjust the default impl of le with a comment that explains why it's written that way, or should we investigate more here?

@pubfnbar
Copy link
Contributor Author

I wonder if this issue is related to #83623

@AngelicosPhosphoros
Copy link
Contributor

I wonder if this issue is related to #83623

We could test it in next nightly.

@AngelicosPhosphoros
Copy link
Contributor

I wonder if this issue is related to #83623

We could test it in next nightly.

Well, this issue doesn't fixed by it (which is expected because we changed MIR of && and ||, not match) so we need to make changes in LLVM or change our generated code.

@AngelicosPhosphoros
Copy link
Contributor

Created PR with fix.

@nikic
Copy link
Contributor

nikic commented Apr 4, 2021

Upstream fix for LLVM: llvm/llvm-project@9bad7de

bors added a commit to rust-lang-ci/rust that referenced this issue Apr 5, 2021
…artial-eq-impl, r=Mark-Simulacrum

Optimize jumps in PartialOrd le

Closes rust-lang#73338
This change stops default implementation of `le()` method of PartialOrd from generating jumps.
@bors bors closed this as completed in ed0d8fa Apr 5, 2021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants