
Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0 #57437


Closed
breezewish opened this issue Jan 8, 2019 · 10 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@breezewish

breezewish commented Jan 8, 2019

Benchmark code:

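#![feature(test)] // nightly only: enables the unstable `test` crate
extern crate test;

#[bench]
fn bench(b: &mut test::Bencher) {
    // 1000-byte source buffer, allocated once outside the timed loop
    let raw = vec![0u8; 1000];
    b.iter(|| {
        // black_box keeps the optimizer from eliding the clone
        test::black_box(test::black_box(&raw).clone());
    });
}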

In rustc 1.29.0-nightly (4f3c7a4 2018-07-17): 32 ns/iter (+/- 34)
In rustc 1.33.0-nightly (9eac386 2018-12-31): 127 ns/iter (+/- 45)
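
For readers without a nightly toolchain, the same measurement can be approximated on stable Rust. This is only a minimal sketch, not the harness used in this issue: the helper name `time_clone` and the iteration counts are hypothetical, and it assumes a toolchain where `std::hint::black_box` is stable. A hand-rolled loop like this is noisier than `#[bench]`, so treat its numbers as rough.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Average wall-clock time of one `Vec<u8>` clone of `len` bytes.
/// `iters` must be non-zero (hypothetical helper, not from the issue).
fn time_clone(len: usize, iters: u32) -> Duration {
    let raw = vec![0u8; len];

    // Warm-up so the first allocations do not dominate the measurement.
    for _ in 0..1_000 {
        black_box(black_box(&raw).clone());
    }

    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the optimizer from eliding the clone.
        black_box(black_box(&raw).clone());
    }
    start.elapsed() / iters
}

fn main() {
    let per_iter = time_clone(1000, 1_000_000);
    println!("Vec<u8> clone of 1000 bytes: {:?} per iteration", per_iter);
}
```

Building this with `--release` under each toolchain gives figures that can be compared against each other, though not directly against the `ns/iter` values above.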

@killercup
Member

Can you try this? #47745 (comment)

@killercup killercup added the I-slow Issue: Problems and improvements with respect to performance of generated code. label Jan 8, 2019
@breezewish breezewish changed the title from "&[u8] clone in rustc 1.33.0 is 50% slower than rustc 1.29.0" to "Vec<u8> clone in rustc 1.33.0 is 3 times slower than rustc 1.29.0" Jan 8, 2019
@breezewish
Author

breezewish commented Jan 8, 2019

@killercup

Hi, I tried with the following profile:

[profile.bench]
lto = false
opt-level = 3
debug = true
codegen-units = 1

[profile.release]
lto = false
opt-level = 3
debug = true
codegen-units = 1

and

[profile.bench]
lto = true
opt-level = 3
debug = true
codegen-units = 1

[profile.release]
lto = true
opt-level = 3
debug = true
codegen-units = 1

the outcome is similar.

@ollie27
Member

ollie27 commented Jan 8, 2019

I'd guess this is due to #55238. You could try using jemallocator to confirm.
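
One way to rule the allocator in or out is to pin it explicitly, so that every toolchain under comparison allocates through the same implementation. The sketch below uses only the standard library and pins the system allocator with `#[global_allocator]`. To follow the suggestion above, the `Jemalloc` type from the jemallocator crate would, as far as I know, go in the same slot; that part is an assumption about an external crate and is not compiled here.

```rust
use std::alloc::System;

// Pin the allocator explicitly instead of relying on the toolchain default.
// Assumption: with the jemallocator crate as a dependency, its `Jemalloc`
// type would replace `System` here.
#[global_allocator]
static GLOBAL: System = System;

fn main() {
    let raw = vec![0u8; 1000];
    // This clone now allocates through the pinned allocator.
    let copy = raw.clone();
    println!("cloned {} bytes", copy.len());
}
```

With the allocator fixed on both sides, any remaining difference between two compiler versions cannot come from a changed default allocator.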

@brson
Contributor

brson commented Jan 8, 2019

@ollie27 Oh, very good guess. That would explain a lot. I do think that @breeswish's benchmarks are running against system malloc in the 'after' run. We are running at least one set of benchmarks with jemalloc both before and after: https://gist.github.com/brson/13586d9f12f3af5c8377628c3d0f12d0#file-benchcmp-tikv and have seen regressions there too, but have not investigated them.

We'll fix our side to make sure we are comparing jemalloc to jemalloc then see how our benchmarks look.

@brson
Contributor

brson commented Jan 9, 2019

What I reported yesterday about not comparing allocator to allocator looks to be incorrect. @breeswish's benchmarks may have been using the same jemalloc. Still investigating.

@mati865
Contributor

mati865 commented Jan 10, 2019

What is your system?

Since the switch to the system allocator, I'm seeing a small performance increase on three systems with glibc 2.28 (Arch Linux, Fedora, and Ubuntu).

With your benchmark I was getting results so close they weren't reliable.
These are results with let raw = vec![0u8; 1000000];:

$ cargo +nightly-2018-07-17 bench
[...]
test bench ... bench:      19,454 ns/iter (+/- 241)

$ cargo +nightly-2018-07-17 bench
[...]
test bench ... bench:      19,422 ns/iter (+/- 207)

$ cargo +nightly-2018-12-31 bench
[...]
test bench ... bench:      19,378 ns/iter (+/- 2,560)

$ cargo +nightly-2018-12-31 bench
[...]
test bench ... bench:      19,374 ns/iter (+/- 422)

$ cargo +nightly bench
[...]
test bench ... bench:      19,352 ns/iter (+/- 7,552)

$ cargo +nightly bench
[...]
test bench ... bench:      19,342 ns/iter (+/- 7,586)
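
To put these figures in proportion, the relative change between two runs can be compared with the reported deviation. The following is a small sketch using two of the values quoted above; the helper name `relative_change_percent` is hypothetical.

```rust
/// Relative change from `before` to `after`, as a percentage of `before`.
fn relative_change_percent(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // ns/iter values taken from the runs quoted above.
    let old = 19_454.0; // nightly-2018-07-17, first run
    let new = 19_378.0; // nightly-2018-12-31, first run
    println!("change: {:.2}%", relative_change_percent(old, new));
}
```

The change comes out well under one percent, smaller than the +/- 241 ns deviation reported for the first run, so these numbers do not show a regression.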

@breezewish
Author

breezewish commented Jan 10, 2019

Hi @mati865, my OS is macOS 10.12.6. I will try again with jemalloc linked. In the meantime, you can look at results from Travis CI (they may not be very stable, but they are still a useful reference): https://travis-ci.com/breeswish/vec_clone_play

@mati865
Contributor

mati865 commented Jan 10, 2019

@breeswish I don't use macOS, so I cannot speak for it, but for such old Linux distributions jemallocator should fix the performance.

@brson
Contributor

brson commented Jan 23, 2019

After further investigation, there indeed wasn't a problem with Vec<u8>, so this can be closed.

@breezewish
Author

I forced the use of jemalloc and found no notable difference in the case reported by this issue, so I am closing it.

5 participants