Skip to content

Missed optimization: Loop with decreasing index does not elide bounds check #74186

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
HeroicKatora opened this issue Jul 9, 2020 · 3 comments
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@HeroicKatora
Copy link
Contributor

I tried code very similar to the code below. The expected produced code would perform one bounds check at the start of the function and none inside the loop. offsets is first asserted to be valid for the initial index. This should be enough to also find that it is valid for all smaller indices. No loop iteration can increase the index so the index correctness is a loop invariant.

fn problematic(buf: &mut [u8], offsets: &[u8], mut idx: usize) {
    let offsets = &offsets[..=idx];
    for b in buf {
        *b = idx as u8;
        idx = idx.saturating_sub(usize::from(offsets[idx]));
    }
}

Instead we get this:

playground::problematic:
	push	rax
	cmp	r8, -1
	je	.LBB0_3
	mov	r9, rsi
	lea	rsi, [r8 + 1]
	cmp	r8, rcx
	jae	.LBB0_2
	test	r9, r9
	je	.LBB0_8
	xor	r10d, r10d
	mov	rcx, r8

.LBB0_6:
	mov	byte ptr [rdi + r10], cl
;; The critical jump and cmp we don't want to have, costs ~5-10% loop perf
	cmp	rcx, r8
	ja	.LBB0_9
	movzx	eax, byte ptr [rdx + rcx]
	sub	rcx, rax
	mov	eax, 0
	cmovb	rcx, rax
	add	r10, 1
	cmp	r9, r10
	jne	.LBB0_6

.LBB0_8:
	pop	rax
	ret

.LBB0_9:
	lea	rdx, [rip + .Lanon.9b054490e6ade753ffe504f874525a87.2]
	mov	rdi, rcx
	call	qword ptr [rip + core::panicking::panic_bounds_check@GOTPCREL]
	ud2

.LBB0_3:
	lea	rdi, [rip + .Lanon.9b054490e6ade753ffe504f874525a87.1]
	call	qword ptr [rip + core::slice::slice_index_overflow_fail@GOTPCREL]
	ud2

.LBB0_2:
	lea	rdx, [rip + .Lanon.9b054490e6ade753ffe504f874525a87.1]
	mov	rdi, rsi
	mov	rsi, rcx
	call	qword ptr [rip + core::slice::slice_index_len_fail@GOTPCREL]
	ud2

.Lanon.9b054490e6ade753ffe504f874525a87.0:
	.ascii	"src/lib.rs"

.Lanon.9b054490e6ade753ffe504f874525a87.1:
	.quad	.Lanon.9b054490e6ade753ffe504f874525a87.0
	.asciz	"\n\000\000\000\000\000\000\000\002\000\000\000\024\000\000"

.Lanon.9b054490e6ade753ffe504f874525a87.2:
	.quad	.Lanon.9b054490e6ade753ffe504f874525a87.0
	.asciz	"\n\000\000\000\000\000\000\000\005\000\000\000.\000\000"
@jonas-schievink jonas-schievink added I-slow Issue: Problems and improvements with respect to performance of generated code. C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. labels Jul 9, 2020
@Shnatsel
Copy link
Member

Various permutations, including explicitly clamping the value using min() and max(), also do not result in an elided bounds check.

@the8472
Copy link
Member

the8472 commented Jul 14, 2020

Various permutations, including explicitly clamping the value using min() and max(), also do not result in an elided bounds check.

This does:

pub fn problematic(buf: &mut [u8], offsets: &[u8], mut idx: usize) {
    let offsets = &offsets[..=idx];
    for b in buf {
        *b = idx as u8;
        let tmp = std::cmp::min(offsets.len() - 1, idx);
        idx = idx.saturating_sub(usize::from(offsets[tmp]));
    }
}

@Shnatsel
Copy link
Member

Shnatsel commented Jul 14, 2020

Ah, an even simpler version with min() also does elide bounds checks:

pub fn problematic(buf: &mut [u8], offsets: &[u8], mut idx: usize) {
    let offsets = &offsets[..=idx];
    for b in buf {
        *b = idx as u8;
        idx = std::cmp::min(offsets.len() - 1, idx - usize::from(offsets[idx]));
    }
}

Looks like the issue is specific to saturating_sub() then

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

5 participants