Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

str::is_char_boundary - slight optimization #84751

Merged
merged 2 commits into from
May 15, 2021

Conversation

Soveu
Copy link
Contributor

@Soveu Soveu commented Apr 30, 2021

Current str::is_char_boundary implementation emits slightly more instructions, because it includes an additional branch for index == s.len()

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    if index == 0 || index == s.len() {
        return true;
    }
    match s.as_bytes().get(index) {
        None => false,
        Some(&b) => (b as i8) >= -0x40,
    }
}

Just changing the place of index == s.len() merges it with index < s.len() from s.as_bytes().get(index)

pub fn is_char_boundary2(s: &str, index: usize) -> bool {
    if index == 0 {
        return true;
    }

    match s.as_bytes().get(index) {
        // For some reason, LLVM likes this comparison here more
        None => index == s.len(),
        // This is bit magic equivalent to: b < 128 || b >= 192
        Some(&b) => (b as i8) >= -0x40,
    }
}

This one has better codegen on every platform, except powerpc

x86 codegen

example::is_char_boundary:
        mov     al, 1
        test    rdx, rdx
        je      .LBB0_5
        cmp     rsi, rdx
        je      .LBB0_5
        cmp     rsi, rdx
        jbe     .LBB0_3
        cmp     byte ptr [rdi + rdx], -65
        setg    al
.LBB0_5:
        ret
.LBB0_3:
        xor     eax, eax
        ret

example::is_char_boundary2:
        test    rdx, rdx
        je      .LBB1_1
        cmp     rsi, rdx
        jbe     .LBB1_4
        cmp     byte ptr [rdi + rdx], -65
        setg    al
        ret
.LBB1_1:  ; technically this branch is the same as LBB1_4
        mov     al, 1
        ret
.LBB1_4:
        sete    al
        ret

aarch64 codegen

example::is_char_boundary:
        mov     x8, x0
        mov     w0, #1
        cbz     x2, .LBB0_4
        cmp     x1, x2
        b.eq    .LBB0_4
        b.ls    .LBB0_5
        ldrsb   w8, [x8, x2]
        cmn     w8, #65
        cset    w0, gt
.LBB0_4:
        ret
.LBB0_5:
        mov     w0, wzr
        ret

example::is_char_boundary2:
        cbz     x2, .LBB1_3
        cmp     x1, x2
        b.ls    .LBB1_4
        ldrsb   w8, [x0, x2]
        cmn     w8, #65
        cset    w0, gt
        ret
.LBB1_3:
        mov     w0, #1
        ret
.LBB1_4:
        cset    w0, eq
        ret

riscv64gc codegen

example::is_char_boundary:
seqz a3, a2
xor a4, a1, a2
seqz a4, a4
or a4, a4, a3
addi a3, zero, 1
bnez a4, .LBB0_3
bgeu a2, a1, .LBB0_4
add a0, a0, a2
lb a0, 0(a0)
addi a1, zero, -65
slt a3, a1, a0
.LBB0_3:
mv a0, a3
ret
.LBB0_4:
mv a0, zero
ret

example::is_char_boundary2:
beqz a2, .LBB1_3
bgeu a2, a1, .LBB1_4
add a0, a0, a2
lb a0, 0(a0)
addi a1, zero, -65
slt a0, a1, a0
ret
.LBB1_3:
addi a0, zero, 1
ret
.LBB1_4:
xor a0, a1, a2
seqz a0, a0
ret

Link to godbolt

@rustbot label: A-codegen

@rustbot rustbot added the A-codegen Area: Code generation label Apr 30, 2021
@rust-highfive
Copy link
Contributor

r? @yaahc

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Apr 30, 2021
@jyn514 jyn514 added I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed A-codegen Area: Code generation labels Apr 30, 2021
@jyn514
Copy link
Member

jyn514 commented Apr 30, 2021

@Soveu fyi - A-codegen is usually used for the codegen part of the compiler itself, not for changes to the standard library that affect performance.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

btw, can I add I-slow label myself or only members can do it?

@jyn514
Copy link
Member

jyn514 commented Apr 30, 2021

You should be able to - try pinging rustbot with label: +I-slow?

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

I was thinking all the time that only members can add those fancy red I-something labels 👀
@rustbot label: +I-slow

@klensy
Copy link
Contributor

klensy commented Apr 30, 2021

Slightly less cryptic version https://godbolt.org/z/Y9o86ch6h

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None => false,
        Some(_) if index == 0 || index == s.len() => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

but this swaps order of cmp and test, better benchmark all versions to check actual difference.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Slightly less cryptic version https://godbolt.org/z/Y9o86ch6h

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None => false,
        Some(_) if index == 0 || index == s.len() => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

but this swaps order of cmp and test, better benchmark all versions to check actual difference.

index == s.len() will never fire if get(index) returned Some

@klensy
Copy link
Contributor

klensy commented Apr 30, 2021

index == s.len() will never fire if get(index) returned Some

Upps

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    match s.as_bytes().get(index) {
        None if index == s.len() => true,
        None => false,
        Some(_) if index == 0 => true,
        Some(&b) => (b as i8) >= -0x40,
    }
}

But this looks bad.

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

I will soon add some comments

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Plus, I want to test how it affects s.get(..i) and s.get(i..) which rely heavily on returning early if index == 0 or index == s.len() (and LLVM actually optimizing it)

@Soveu
Copy link
Contributor Author

Soveu commented Apr 30, 2021

Both on opt-level=3 and opt-level=2 there is only one bound check in get(..i) and get(i..).
On opt-level=1 it's more complicated:

  • functions aren't inlined by default, so no cross-function optimizations can occur
  • when forcing it via #[inline(always)], the new implementation loses at get(i..), because the index == s.len() is "hidden" after s.as_bytes().get(i)

@klensy
Copy link
Contributor

klensy commented May 1, 2021

pub fn is_char_boundary(s: &str, index: usize) -> bool {
    if index == 0 {
        true
    } else if index < s.len() {
        unsafe { *s.as_bytes().get_unchecked(index) as i8 >= -0x40 }
    } else if index == s.len() {
        true
    } else {
        false
    }
}

@Soveu
Copy link
Contributor Author

Soveu commented May 12, 2021

@yaahc ?

@Amanieu
Copy link
Member

Amanieu commented May 14, 2021

@bors try @rust-timer queue

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2021
@bors
Copy link
Collaborator

bors commented May 14, 2021

⌛ Trying commit 7bd9d9f with merge b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83...

@bors
Copy link
Collaborator

bors commented May 14, 2021

☀️ Try build successful - checks-actions
Build commit: b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83 (b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83)

@rust-timer
Copy link
Collaborator

Queued b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83 with parent 69b352e, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (b8c3fd3abbd324a8ce3163e2d72ad2da5bb38d83): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label May 14, 2021
@Amanieu
Copy link
Member

Amanieu commented May 14, 2021

Perf changes seem to be in the noise.

@bors r+ rollup=maybe

@bors
Copy link
Collaborator

bors commented May 14, 2021

📌 Commit 7bd9d9f has been approved by Amanieu

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 14, 2021
@Soveu
Copy link
Contributor Author

Soveu commented May 14, 2021

Perf changes seem to be in the noise.

I was kinda expecting it, it's not a huge thing. What about binary size?

bors added a commit to rust-lang-ci/rust that referenced this pull request May 15, 2021
…laumeGomez

Rollup of 4 pull requests

Successful merges:

 - rust-lang#84751 (str::is_char_boundary - slight optimization)
 - rust-lang#85185 (Generate not more docs than necessary)
 - rust-lang#85324 (Warn about unused `pub` fields in non-`pub` structs)
 - rust-lang#85329 (fix version_str comment)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 62b834f into rust-lang:master May 15, 2021
@rustbot rustbot added this to the 1.54.0 milestone May 15, 2021
@matthieu-m
Copy link
Contributor

@klensy I'm surprised by those constructs:

        None if index == s.len() => true,
        None => false,
    if index == s.len() {
        true
    } else {
        false
    }

Naively, I would expect to just return index == s.len() rather branching on it to select true or false.

Is there any particular reason for the unusual branch?

@klensy
Copy link
Contributor

klensy commented May 21, 2021

@matthieu-m

All that if branches is straightforward walk over index, starting from 0 and up, checking current state, but don't require to think about None => index == self.len() trick. Plus, that compiles to exactly the same asm, if i remember that correctly.

workingjubilee added a commit to workingjubilee/rustc that referenced this pull request Oct 29, 2024
Mark `str::is_char_boundary` and `str::split_at*` unstably `const`.

Tracking issues: rust-lang#131516, rust-lang#131518

First commit implements `const_is_char_boundary`, second commit implements `const_str_split_at` (which depends on `const_is_char_boundary`)

~~I used `const_eval_select` for `is_char_boundary` since there is a comment about optimizations that would theoretically not happen with the simple `const`-compatible version (since `slice::get` is not `const`ifiable) cc rust-lang#84751. I have not checked if this code difference is still required for the optimization, so it might not be worth the code complication, but 🤷.~~

This changes `str::split_at_checked` to use a new private helper function `split_at_unchecked` (copied from `split_at_mut_unchecked`) that does pointer stuff instead of `get_unchecked`, since that is not currently `const`ifiable due to using the `SliceIndex` trait.
rust-timer added a commit to rust-lang-ci/rust that referenced this pull request Oct 29, 2024
Rollup merge of rust-lang#131520 - zachs18:const-str-split, r=Noratrieb

Mark `str::is_char_boundary` and `str::split_at*` unstably `const`.

Tracking issues: rust-lang#131516, rust-lang#131518

First commit implements `const_is_char_boundary`, second commit implements `const_str_split_at` (which depends on `const_is_char_boundary`)

~~I used `const_eval_select` for `is_char_boundary` since there is a comment about optimizations that would theoretically not happen with the simple `const`-compatible version (since `slice::get` is not `const`ifiable) cc rust-lang#84751. I have not checked if this code difference is still required for the optimization, so it might not be worth the code complication, but 🤷.~~

This changes `str::split_at_checked` to use a new private helper function `split_at_unchecked` (copied from `split_at_mut_unchecked`) that does pointer stuff instead of `get_unchecked`, since that is not currently `const`ifiable due to using the `SliceIndex` trait.
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants