Skip to content

included range in for loop gives more assembly code than excluded range #102462

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
melonges opened this issue Sep 29, 2022 · 9 comments
Open

included range in for loop gives more assembly code than excluded range #102462

melonges opened this issue Sep 29, 2022 · 9 comments
Assignees
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@melonges
Copy link

melonges commented Sep 29, 2022

Given the following code where source will be excluded

pub fn store(source: &i32, dest: &mut i32) -> i32 {
    let mut accum = 0;
    for _ in 0..*source {  // source will be excluded
        accum += *dest;
    }
    accum
}

The assembly output is:

example::store:
        mov     ecx, dword ptr [rdi]
        mov     edx, dword ptr [rsi]
        imul    edx, ecx
        xor     eax, eax
        test    ecx, ecx
        cmovg   eax, edx
        ret

if source will be included the assembly code will increase more than

pub fn store(source: &i32, dest: &mut i32) -> i32 {
    let mut accum = 0;
    for _ in 0..=*source { //  source will be included
        accum += *dest;
    }
    accum
}

The assembly output is:

example::store:
        mov     ecx, dword ptr [rdi]
        test    ecx, ecx
        js      .LBB0_1
        mov     r8d, dword ptr [rsi]
        xor     eax, eax
        xor     esi, esi
.LBB0_3:
        xor     edi, edi
        cmp     esi, ecx
        setl    dl
        add     eax, r8d
        cmp     esi, ecx
        jge     .LBB0_5
        mov     dil, dl
        add     esi, edi
        cmp     esi, ecx
        jle     .LBB0_3
.LBB0_5:
        ret
.LBB0_1:
        xor     eax, eax
        ret

I tested on rustc 1.64.0 with -C opt-level=3

@melonges melonges added A-diagnostics Area: Messages for errors, warnings, and lints T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Sep 29, 2022
@Rageking8
Copy link
Contributor

@rustbot label -A-diagnostics +A-codegen +I-heavy

@rustbot rustbot added A-codegen Area: Code generation I-heavy Issue: Problems and improvements with respect to binary size of generated code. and removed A-diagnostics Area: Messages for errors, warnings, and lints labels Sep 29, 2022
@lukas-code
Copy link
Member

This regressed between 1.42 and 1.43: Godbolt link

related: #45222

@leonardo-m
Copy link

(If you have two nested loop, the code is bad even in older compilers).

@jdahlstrom
Copy link

jdahlstrom commented Oct 1, 2022

This regressed between 1.42 and 1.43: Godbolt link

The 1.42 codegen is tricky, as it seems at first that it doesn't account for the case where *source == i32::MAX and just goes add eax, 1. But it ends up giving the expected result, which is 0 (mod 2^31) no matter the value of *dest. But in many other cases inclusive ranges necessarily produce suboptimal code because the T::MAX case requires special handling.

@nikic nikic added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-slow Issue: Problems and improvements with respect to performance of generated code. labels Oct 1, 2022
@nikic
Copy link
Contributor

nikic commented Oct 1, 2022

Simpler example without the unnecessary memory indirections: https://rust.godbolt.org/z/GPvrbvbYY

External iteration over inclusive ranges is well known to optimize badly. Optimization may be viable in this particular case though.

@nikic
Copy link
Contributor

nikic commented Dec 21, 2022

This is the fold we'd need: https://alive2.llvm.org/ce/z/JsWRvT

@ilyvion
Copy link

ilyvion commented Dec 22, 2023

Came across this weirdness today; it's still a problem in December 2023:

pub fn inclusive() -> usize {
    (2..=100).step_by(2).sum::<usize>()
}

pub fn exclusive() -> usize {
    (2..101).step_by(2).sum::<usize>()
}

produces assembly with 46 lines and 3 lines, respectively. Was going to create a new bug for it, but I'm pretty sure it's just another version of this one.

@veera-sivarajan
Copy link
Contributor

@rustbot claim

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Feb 14, 2025
@RodBurman
Copy link

The code snippet:

pub fn inclusive() -> usize {
     (2..=100).step_by(2).sum::<usize>()
}
 
 pub fn exclusive() -> usize {
     (2..101).step_by(2).sum::<usize>()
 }

compiled with:

% rustc -v -V 
rustc 1.85.0 (4d91de4e4 2025-02-17)
binary: rustc
commit-hash: 4d91de4e48198da2e33413efdcd9cd2cc0c46688
commit-date: 2025-02-17
host: aarch64-apple-darwin
release: 1.85.0
LLVM version: 19.1.7

Produces functions that both take up 25 lines of assembler (not necessarily 25 instructions). So no diference between inclusive and exclusive ranges. I am unsure if this is the update to rust 1.85.0, the move to an ARM processor or to MacOS.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-heavy Issue: Problems and improvements with respect to binary size of generated code. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests