Skip to content

vec::from_elem with primitives should be as fast as calloc #7136

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
erickt opened this issue Jun 14, 2013 · 13 comments · Fixed by #10918
Closed

vec::from_elem with primitives should be as fast as calloc #7136

erickt opened this issue Jun 14, 2013 · 13 comments · Fixed by #10918

Comments

@erickt
Copy link
Contributor

erickt commented Jun 14, 2013

While @cmr landed a nice optimization of vec::from_elem in #6876, he said that it's still not performing as fast as doing a malloc and a ptr::set_memory. We should figure out why it is not performing as well as it should be and fix it in order to remove the temptation of using the faster unsafe functions.

@erickt
Copy link
Contributor Author

erickt commented Jun 15, 2013

It turns out this is substantially slower than I expected. With this test benchmark:

extern mod extra;

use std::vec;
use std::ptr;

#[bench]
fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

#[bench]
fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn main() {}

I'm getting these results:

running 2 tests
test bench_from_elem ... bench: 16351 ns/iter (+/- 192)
test bench_set_memory ... bench: 384 ns/iter (+/- 6)

This is related to #6623 and #7137.

@erickt
Copy link
Contributor Author

erickt commented Jun 15, 2013

@huonw mentioned in irc that with -O the difference is:

test bench_from_elem ... bench: 799 ns/iter (+/- 0)
test bench_set_memory ... bench: 200 ns/iter (+/- 0)

Which is much better, but still not great.

@emberian
Copy link
Member

memset "cheats" with sse. from_elem spends a lot of time in move_val_init without optimizations so there's a bunch of unnecessary function calls. With optimizations, the thing that's killing it is the copies.

@thestinger
Copy link
Contributor

LLVM knows how to optimize loops to the same code as memcpy/memmove/memset though, as long as you generate good IR.

http://llvm.org/docs/doxygen/html/LoopIdiomRecognize_8cpp_source.html

@luqmana
Copy link
Member

luqmana commented Jun 15, 2013

I added a bench for the literal vec repeat syntax as well:

#[bench]
fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}

Without optimizations:

running 3 tests
test bench_from_elem ... bench: 21803 ns/iter (+/- 597)
test bench_set_memory ... bench: 381 ns/iter (+/- 6)
test bench_vec_repeat ... bench: 2125 ns/iter (+/- 12)

With optimizations (-O):

running 3 tests
test bench_from_elem ... bench: 752 ns/iter (+/- 0)
test bench_set_memory ... bench: 183 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 90 ns/iter (+/- 3)

Looking at the optimized IR, both set_memory and vec_repeat become a memset but set_memory seems to have a lot more overhead. from_elem does not become a memset hence the bad results.

@thestinger
Copy link
Contributor

@erickt: this gets a lot better with the optimization passes from #7466, from ~3x as slow to ~2x as slow

Before (opt-level=2):

test bench_from_elem ... bench: 986 ns/iter (+/- 3)
test bench_set_memory ... bench: 343 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 175 ns/iter (+/- 3)

After (opt-level=2)

test bench_from_elem ... bench: 667 ns/iter (+/- 0)
test bench_set_memory ... bench: 343 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 178 ns/iter (+/- 0)

@thestinger
Copy link
Contributor

@luqmana: the problem is with the code bloat from the surrounding code, rather than set_memory itself - I think we actually produce some IR there with undefined behaviour

@thestinger
Copy link
Contributor

#7682 greatly speeds up with_capacity, fixing half of the issue

Before:

test bench_from_elem ... bench: 683 ns/iter (+/- 3)
test bench_set_memory ... bench: 334 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 176 ns/iter (+/- 0)

After:

test bench_from_elem ... bench: 461 ns/iter (+/- 0)
test bench_set_memory ... bench: 143 ns/iter (+/- 0)
test bench_vec_repeat ... bench: 148 ns/iter (+/- 0)

@thestinger
Copy link
Contributor

The performance of all 3 has improved, but from_elem got slower relative to the others:

test bench_from_elem ... bench: 409 ns/iter (+/- 4)
test bench_set_memory ... bench: 83 ns/iter (+/- 1)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)

@thestinger
Copy link
Contributor

It looks like the remaining issue is our pointer arithmetic being slow. The offset and mut_offset functions compile to conversions to and from integers, rather than actual pointer arithmetic with the stricter semantics. I think we need to make the + and - implementations for pointers a compiler feature.

bors added a commit that referenced this issue Jul 30, 2013
Closes #8118, #7136

~~~rust
extern mod extra;

use std::vec;
use std::ptr;

fn bench_from_elem(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = vec::from_elem(1024, 0u8);
    }
}

fn bench_set_memory(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let mut v: ~[u8] = vec::with_capacity(1024);
        unsafe {
            let vp = vec::raw::to_mut_ptr(v);
            ptr::set_memory(vp, 0, 1024);
            vec::raw::set_len(&mut v, 1024);
        }
    }
}

fn bench_vec_repeat(b: &mut extra::test::BenchHarness) {
    do b.iter {
        let v: ~[u8] = ~[0u8, ..1024];
    }
}
~~~

Before:

    test bench_from_elem ... bench: 415 ns/iter (+/- 17)
    test bench_set_memory ... bench: 85 ns/iter (+/- 4)
    test bench_vec_repeat ... bench: 83 ns/iter (+/- 3)

After:

    test bench_from_elem ... bench: 84 ns/iter (+/- 2)
    test bench_set_memory ... bench: 84 ns/iter (+/- 5)
    test bench_vec_repeat ... bench: 84 ns/iter (+/- 3)
@thestinger
Copy link
Contributor

Fixed by #8121.

test bench_from_elem ... bench: 84 ns/iter (+/- 1)
test bench_set_memory ... bench: 84 ns/iter (+/- 3)
test bench_vec_repeat ... bench: 84 ns/iter (+/- 2)

@eddyb
Copy link
Member

eddyb commented Dec 10, 2013

This has regressed since that fix, @cmr is attempting a bisect atm.

Update: bisect finished, #8780 is to blame. This loop isn't optimized. Maybe the Finallyalizer drop method needs to be inlined for further optimizations.

@huonw huonw reopened this Dec 10, 2013
@emberian
Copy link
Member

Note that set_memory has regressed by ~16ns, cause is #9933 (df187a0)

@bors bors closed this as completed in 09bf5de Dec 14, 2013
flip1995 pushed a commit to flip1995/rust that referenced this issue Apr 27, 2021
…alid_sugg_macro_expansion, r=llogiq

manual_unwrap_or: fix invalid code suggestion, due to macro expansion

fixes rust-lang#6965

changelog: fix invalid code suggestion in `manual_unwrap_or` lint, due to macro expansion
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants