
Perf regression in Vec<u8> Write impl? #24095


Closed
frankmcsherry opened this issue Apr 5, 2015 · 3 comments · Fixed by #24120

Comments

@frankmcsherry
Contributor

Hi folks. A bunch of my serialization code recently got a lot slower, and I think I've tracked it down to writing binary data into a Vec<u8>. This used to be quite fast, well above the 8-10GB/s range. It is now about 1GB/s for me. The following main.rs demos the performance I'm seeing.

#![feature(test)]
extern crate test;

use test::Bencher;
use std::io::Write;

#[bench]
fn bench(bencher: &mut Bencher) {
    let data = &[0u8; 4096];
    let mut buffer = Vec::with_capacity(data.len());
    bencher.bytes = data.len() as u64;
    bencher.iter(|| {
        buffer.clear();
        buffer.write_all(data).unwrap();
    });
}

fn main() {
    let data = &[0u8; 4096];
    let mut buffer = Vec::with_capacity(data.len());

    // writes 4GB, takes .. 4s+
    for _ in 0..(1 << 20) {
        buffer.clear();
        buffer.write_all(data).unwrap();
    }
}

The perf numbers look like this (where main writes 4GB in 4KB chunks):

Echidnatron% cargo bench; time cargo run --release 
     Running target/release/bench-5e9b1b37cda85a22

running 1 test
test bench ... bench:      4086 ns/iter (+/- 596) = 1002 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 1 measured

     Running `target/release/bench`
cargo run --release  4.06s user 0.03s system 99% cpu 4.094 total
Echidnatron% 

Sorry if this is old news, or a misdiagnosis. Something is a bit slower now, though. The comments in the source (Vec<T>::push_all) do suggest it isn't stable yet because the impl might get faster; I didn't expect it to get 10x slower on me though :).

Echidnatron% cargo --version
cargo 0.0.1-pre-nightly (d71f748 2015-04-03) (built 2015-04-04)
@alexcrichton
Member

I think this has the same cause as #24014, at least per the comment there about the performance of the Iterator impl for ops::Range.

Nominating for 1.0 as this is quite a critical iterator to have perform well.

triage: I-nominated

@aturon
Member

aturon commented Apr 6, 2015

@alexcrichton I will try to fix this today.

aturon added a commit to aturon/rust that referenced this issue Apr 6, 2015
A recent change to the implementation of range iterators meant that,
even when stepping by 1, the iterators *always* involved checked
arithmetic.

This commit reverts to the earlier behavior (while retaining the
refactoring into traits).

Fixes rust-lang#24095
cc rust-lang#24014
aturon added a commit to aturon/rust that referenced this issue Apr 7, 2015, with the same commit message as above.
bors added a commit that referenced this issue Apr 8, 2015
A recent change to the implementation of range iterators meant that,
even when stepping by 1, the iterators *always* involved checked
arithmetic.

This commit reverts to the earlier behavior (while retaining the
refactoring into traits).

Fixes #24095
Closes #24119
cc #24014 

r? @alexcrichton
@frankmcsherry
Contributor Author

Can confirm with the new nightly that the #[bench] is reporting 50GB/s. <3
