Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Serialization performance regression #38021

Closed
dtolnay opened this issue Nov 26, 2016 · 10 comments
Closed

Serialization performance regression #38021

dtolnay opened this issue Nov 26, 2016 · 10 comments
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@dtolnay
Copy link
Member

dtolnay commented Nov 26, 2016

Between rustc 1.14.0-nightly (5665bdf3e 2016-11-02) and rustc 1.14.0-nightly (cae6ab1c4 2016-11-05) all of my serialization benchmarks showed a ~50% regression. It affected all the libraries (Serde, json-rust, rustc-serialize). Deserialization was not affected. So far I have narrowed it down to the following code:

#![feature(test)]

extern crate test;

#[bench]
fn bench(b: &mut test::Bencher) {
    const N: usize = 65536;
    let mut buf = Vec::with_capacity(N);
    b.iter(|| {
        buf.clear();
        for _ in 0..N {
            w(&mut buf, b'[').unwrap();
        }
    });
}

#[inline(never)]
fn w(buf: &mut Vec<u8>, b: u8) -> Result<(), ::std::io::Error> {
    buf.extend_from_slice(&[b]);
    Ok(())
}

On my linux laptop this regressed from 160μs to 220μs, on my mac laptop 215μs to 300μs.

cc @bluss who seems to know about extend_from_slice.

@bluss
Copy link
Member

bluss commented Nov 26, 2016

The TrustedLen PR is in that range, so it's quite possibly that. It changed the implementation of extend_from_slice.

git log --oneline --first-parent 5665bdf..cae6ab1
cae6ab1 Auto merge of #37470 - arthurprs:sip-smaller, r=alexcrichton
0883996 Auto merge of #37427 - nnethercote:opt-IchHasher, r=michaelwoerister
e96b9d2 Auto merge of #37422 - bluss:wrapping-offset, r=alexcrichton
713a360 Auto merge of #37356 - cristicbz:wrapsum, r=alexcrichton
81601cd Auto merge of #37306 - bluss:trusted-len, r=alexcrichton
ccfc38f Auto merge of #37167 - nikomatsakis:jroesch-issue-18937, r=pnkfelix
d2bc30b Auto merge of #37037 - Mark-Simulacrum:stack-error, r=alexcrichton
49c36bf Auto merge of #36306 - nagisa:mir-local-cleanup, r=eddyb
ac919fc Auto merge of #37541 - nikomatsakis:issue-37291, r=brson
f9f45c6 Auto merge of #36993 - nnethercote:obligation, r=nikomatsakis

@bluss bluss added A-libs I-slow Issue: Problems and improvements with respect to performance of generated code. labels Nov 26, 2016
@bluss
Copy link
Member

bluss commented Nov 26, 2016

I don't see much difference, the new code should be almost equivalent. One uses a counted loop and the other a pointer = pointer.offset(1) loop, though.

@bluss
Copy link
Member

bluss commented Nov 26, 2016

Ok just picking a recent rust commit to work off of (127a83d).

Command line:

cargo run --bin json-benchmark  --release --no-default-features --features="lib-json-rust performance file-canada"

With that compiler I get the benchmark result

     Running `target/release/json-benchmark`
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json          10.5ms     4.3ms

If I put in the old implementation for extend_from_slice, I get this result

                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json          10.4ms     3.4ms

Do you think that could be it? That's the kind of difference? The test case in the original issue is not affected by the change, though.

@dtolnay
Copy link
Member Author

dtolnay commented Nov 26, 2016

That could be it. For me:

$ rustup run nightly-2016-11-03 cargo run --bin json-benchmark --release --no-default-features --features="lib-json-rust performance file-canada" -- -n 256
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           9.1ms     3.1ms
$ rustup run nightly-2016-11-06 cargo run --bin json-benchmark --release --no-default-features --features="lib-json-rust performance file-canada" -- -n 256
                                DOM                STRUCT
======= json-rust ======== parse|stringify === parse|stringify ===
data/canada.json           9.1ms     3.9ms

Can you try this case? For me this showed a 70% regression from 995μs to 1685μs.

#![feature(test)]

extern crate serde;
extern crate serde_json;
extern crate test;

#[derive(Clone, Default)]
struct Area(u32);

impl serde::Serialize for Area {
    fn serialize<S>(&self, serializer: &mut S) -> Result<(), S::Error>
        where S: serde::Serializer
    {
        let mut state = try!(serializer.serialize_struct("Area", 1));
        try!(serializer.serialize_struct_elt(&mut state, "blockIds", &[(); 0]));
        serializer.serialize_struct_end(state)
    }
}

#[bench]
fn bench(b: &mut test::Bencher) {
    let areas = vec![Area::default(); 65536];
    let len = serde_json::to_string(&areas).unwrap().len();
    let mut buf = Vec::with_capacity(len);
    b.iter(|| {
        buf.clear();
        serde_json::to_writer(&mut buf, &areas).unwrap();
    });
}

@bluss
Copy link
Member

bluss commented Nov 26, 2016

oh wow.

before test bench ... bench: 2,358,534 ns/iter (+/- 51,417)
after test bench ... bench: 1,641,021 ns/iter (+/- 33,637)

@bluss
Copy link
Member

bluss commented Nov 26, 2016

well something is regressing to a byte-by-byte loop there, and it's definitely .extend(slice.iter().cloned()), I thought SetLenOnDrop had worked around those (we used to have those for extend_from_slice too).

@bluss
Copy link
Member

bluss commented Nov 26, 2016

Ok it's not the same thing.

@bluss
Copy link
Member

bluss commented Nov 27, 2016

We've verified that putting back extend_from_slice's old implementation would work well, but I don't have a reduced test case. Would be ideal if .extend() could be fixed instead of the duplication.

@bluss
Copy link
Member

bluss commented Nov 27, 2016

This is enough to reproduce I think. The .extend() loop is byte-by-byte (albeit unrolled in this case). https://gist.github.com/bluss/61c32daae1bd31d9d7605983df3ace5f#file-extend_from_slice-ll-L146 Rust code is below the llvm ir.

@bluss
Copy link
Member

bluss commented Dec 5, 2016

The codegen issue is #38175, if this issue is resolved by other means (by reverting for example).

bors added a commit that referenced this issue Dec 8, 2016
Specialization for Extend<&T> for vec

Specialize to use copy_from_slice when extending a Vec with &[T] where
T: Copy.

This specialization results in `.clone()` not being called in `extend_from_slice` and `extend` when the element is `Copy`.

Fixes #38021
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants