Change single char str patterns to chars #52646

ljedrz · 2018-07-23T13:38:44Z

A char is faster.

rust-highfive · 2018-07-23T13:38:54Z

(rust_highfive has picked a reviewer for you, use r? to override)

petrochenkov · 2018-07-23T13:50:33Z

A char is faster

I wouldn't be so sure, see #41993
str is "search bytes in bytes", while char is "convert char to bytes, then search bytes in bytes".

ljedrz · 2018-07-23T13:57:24Z

Oh, ok; in that case I guess this might call for an update in clippy, as its single_char_pattern lint suggests that it's the other way round.

cc @oli-obk

ljedrz · 2018-07-23T14:04:11Z

@petrochenkov I just did a comparison using godbolt and the char variant produces half the str assembly; perhaps that issue is limited to starts_with? I'm not an expert, but this is confusing.

killercup · 2018-07-23T15:43:05Z

A char is faster.

I don't see any benchmarks? :)

ljedrz · 2018-07-23T15:45:17Z

@killercup see the related clippy lint.

kennytm · 2018-07-23T15:45:20Z

@petrochenkov #41993 only concerns about .starts_with(). The split() algorithm for char uses memchr() for a char and is way more efficient than that for a &str (which degenerates to naive search).

killercup · 2018-07-23T15:50:12Z

@ljedrz a piece of documentation is still no benchmark :)

Seriously, if we don't have one, we should totally write one and add it. This is easily decidable once we have the facts.

Edit: Maybe we can use the ones @kennytm wrote last year:
#41993 (comment). The most significant change to land since then was probably #46735. And if nothing else, we can link the results in the clippy docs.

Edit 2: We are currently linking to libc::memchr on most platforms. Adding a SIMD-aware Rust version might allow for better inlining. (But again, without benchmarks this is just speculation.)

kennytm · 2018-07-23T16:09:59Z

Since I just happened to have benchmarks from rust-lang/rfcs#2500, here's the relevant parts comparing .starts_with() and .split():

.starts_with()

test pat3_starts_with_ascii_char::long_lorem_ipsum      ... bench:         174 ns/iter (+/- 12)
test pat3_starts_with_ascii_char::short_ascii           ... bench:         175 ns/iter (+/- 21)
test pat3_starts_with_ascii_char::short_mixed           ... bench:         174 ns/iter (+/- 9)
test pat3_starts_with_ascii_char::short_pile_of_poo     ... bench:         175 ns/iter (+/- 26)
test pat3_starts_with_ascii_string::long_lorem_ipsum    ... bench:         136 ns/iter (+/- 20)
test pat3_starts_with_ascii_string::short_ascii         ... bench:         138 ns/iter (+/- 26)
test pat3_starts_with_ascii_string::short_mixed         ... bench:         137 ns/iter (+/- 13)
test pat3_starts_with_ascii_string::short_pile_of_poo   ... bench:         136 ns/iter (+/- 15)
test std_starts_with_ascii_char::long_lorem_ipsum       ... bench:         339 ns/iter (+/- 29)
test std_starts_with_ascii_char::short_ascii            ... bench:         338 ns/iter (+/- 21)
test std_starts_with_ascii_char::short_mixed            ... bench:         844 ns/iter (+/- 42)
test std_starts_with_ascii_char::short_pile_of_poo      ... bench:         930 ns/iter (+/- 91)
test std_starts_with_ascii_string::long_lorem_ipsum     ... bench:         306 ns/iter (+/- 41)
test std_starts_with_ascii_string::short_ascii          ... bench:         307 ns/iter (+/- 63)
test std_starts_with_ascii_string::short_mixed          ... bench:         252 ns/iter (+/- 122)
test std_starts_with_ascii_string::short_pile_of_poo    ... bench:         222 ns/iter (+/- 193)

Speed ups:

Method	Long ASCII	Short ASCII	Short Mixed	Short Unicode
`.starts_with('/')` from libstd (baseline)	0%	0%	0%	0%
`.starts_with("/")` from libstd	-10%	-9%	-70%	-76%
`.starts_with('/')` from RFC 2500	-49%	-48%	-79%	-81%
`.starts_with("/")` from RFC 2500	-60%	-59%	-84%	-85%

.split()

test pat3_split_a_char::long_lorem_ipsum                ... bench:       3,084 ns/iter (+/- 517)
test pat3_split_a_char::short_ascii                     ... bench:         158 ns/iter (+/- 15)
test pat3_split_a_char::short_mixed                     ... bench:         102 ns/iter (+/- 27)
test pat3_split_a_char::short_pile_of_poo               ... bench:          13 ns/iter (+/- 0)
test pat3_split_a_str::long_lorem_ipsum                 ... bench:       5,592 ns/iter (+/- 692)
test pat3_split_a_str::short_ascii                      ... bench:         180 ns/iter (+/- 23)
test pat3_split_a_str::short_mixed                      ... bench:         160 ns/iter (+/- 17)
test pat3_split_a_str::short_pile_of_poo                ... bench:         109 ns/iter (+/- 9)
test std_split_a_char::long_lorem_ipsum                 ... bench:       3,003 ns/iter (+/- 231)
test std_split_a_char::short_ascii                      ... bench:         159 ns/iter (+/- 11)
test std_split_a_char::short_mixed                      ... bench:         108 ns/iter (+/- 2)
test std_split_a_char::short_pile_of_poo                ... bench:          23 ns/iter (+/- 0)
test std_split_a_str::long_lorem_ipsum                  ... bench:       6,176 ns/iter (+/- 693)
test std_split_a_str::short_ascii                       ... bench:         194 ns/iter (+/- 7)
test std_split_a_str::short_mixed                       ... bench:         170 ns/iter (+/- 17)
test std_split_a_str::short_pile_of_poo                 ... bench:         109 ns/iter (+/- 16)

Speed ups:

Method	Long ASCII	Short ASCII	Short Mixed	Short Unicode
`.split('a').count()` from libstd (baseline)	0%	0%	0%	0%
`.split("a").count()` from libstd	+106%	+22%	+57%	+374%
`.split('a').count()` from RFC 2500	+3%	-1%	-6%	-43%
`.split("a").count()` from RFC 2500	+86%	+13%	+48%	+374%

While .starts_with('/') is much slower than .starts_with("/"), we see that the situation is reversed that .split('a').count() is much faster than .split("a").count(). This is because we did not optimize for single-byte string searching at all, but we do optimize for ASCII char searching.

michaelwoerister · 2018-07-24T08:20:28Z

Thanks for the PR, @ljedrz!

@kennytm's numbers suggest that this should be a win (although none of the modified code is really on the critical path).

@bors r+

bors · 2018-07-24T08:20:29Z

📌 Commit 49c8ba9 has been approved by michaelwoerister

bors · 2018-07-24T08:24:20Z

⌛ Testing commit 49c8ba9 with merge a2af866...

Change single char str patterns to chars A `char` is faster.

bors · 2018-07-24T10:46:15Z

☀️ Test successful - status-appveyor, status-travis
Approved by: michaelwoerister
Pushing a2af866 to master...

Change single char str patterns to chars

49c8ba9

rust-highfive assigned michaelwoerister Jul 23, 2018

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jul 23, 2018

kennytm mentioned this pull request Jul 23, 2018

str::starts_with('x') (literal char) is slower than str::starts_with("x") (literal string). #41993

Closed

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 24, 2018

bors added a commit that referenced this pull request Jul 24, 2018

Auto merge of #52646 - ljedrz:single_char_pattern, r=michaelwoerister

a2af866

Change single char str patterns to chars A `char` is faster.

bors merged commit 49c8ba9 into rust-lang:master Jul 24, 2018

ljedrz deleted the single_char_pattern branch July 24, 2018 11:07

pnkfelix mentioned this pull request Oct 9, 2018

error: internal compiler error: Accessing (*_310) with the kind Write(Move) shouldn't be possible #54597

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Change single char str patterns to chars #52646

Change single char str patterns to chars #52646

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

rust-highfive commented Jul 23, 2018

Uh oh!

petrochenkov commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

killercup commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

kennytm commented Jul 23, 2018

Uh oh!

killercup commented Jul 23, 2018 •

edited

Loading

Uh oh!

kennytm commented Jul 23, 2018

Uh oh!

michaelwoerister commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

Uh oh!

Change single char str patterns to chars #52646

Change single char str patterns to chars #52646

Uh oh!

Conversation

ljedrz commented Jul 23, 2018

Uh oh!

rust-highfive commented Jul 23, 2018

Uh oh!

petrochenkov commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

killercup commented Jul 23, 2018

Uh oh!

ljedrz commented Jul 23, 2018

Uh oh!

kennytm commented Jul 23, 2018

Uh oh!

killercup commented Jul 23, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

kennytm commented Jul 23, 2018

Uh oh!

michaelwoerister commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

bors commented Jul 24, 2018

Uh oh!

Uh oh!

killercup commented Jul 23, 2018 •

edited

Loading