Vec::retain() is significantly slower than into_iter().filter().collect() #91497

Closed
kageru opened this issue Dec 3, 2021 · 8 comments · Fixed by #91527
Labels: A-codegen, A-collections (`std::collections`), A-iterators, C-bug, I-slow, T-compiler, T-libs

Comments

kageru commented Dec 3, 2021

I noticed today that using Vec::retain is much slower than filtering into a newly allocated Vec.
I realize retain is probably more memory-efficient, but I still found it surprising that it would be that much slower (or slower at all).

When testing with this code:

#![feature(test)]
extern crate test;

fn main() {
    let xs: Vec<i32> = (0..1000).collect();
    assert_eq!(even_with_retain(xs.clone()), even_with_filter(xs.clone()));
}

pub fn even_with_retain(mut xs: Vec<i32>) -> Vec<i32> {
    xs.retain(|x| x & 1 == 0);
    xs
}

pub fn even_with_filter(xs: Vec<i32>) -> Vec<i32> {
    xs.into_iter().filter(|x| x & 1 == 0).collect()
}

#[bench]
fn bench_retain(b: &mut test::Bencher) {
    let xs: Vec<i32> = (0..1000).collect();
    b.iter(|| assert_eq!(even_with_retain(test::black_box(xs.clone())).len(), 500));
}

#[bench]
fn bench_filter_collect(b: &mut test::Bencher) {
    let xs: Vec<i32> = (0..1000).collect();
    b.iter(|| assert_eq!(even_with_filter(test::black_box(xs.clone())).len(), 500));
}

on rustc 1.59.0-nightly (48a5999 2021-12-01), I get these benchmark results:

test bench_filter_collect ... bench:         383 ns/iter (+/- 4)
test bench_retain         ... bench:       1,891 ns/iter (+/- 17)

on a Ryzen 5900X running Linux. Testing on a different machine (Xeon E3-1271 v3), I get similar numbers:

test bench_filter_collect ... bench:         498 ns/iter (+/- 29)
test bench_retain         ... bench:       1,800 ns/iter (+/- 44)

Vec::retain seemed like the obvious choice to me, so it being slower is either a bug or something that should be documented somewhere.

Godbolt

BluBb-mADe commented Dec 3, 2021

i7-8700K, Windows 10, default configuration:

test bench_filter_collect ... bench:       1,038 ns/iter (+/- 48)
test bench_retain         ... bench:       1,316 ns/iter (+/- 19)

With `lto = true`:

test bench_filter_collect ... bench:         360 ns/iter (+/- 14)
test bench_retain         ... bench:       1,335 ns/iter (+/- 46)

rustc 1.59.0-nightly (acbe444)

kageru (author) commented Dec 3, 2021

Setting lto = true makes the disparity even bigger for me:

test bench_filter_collect ... bench:         199 ns/iter (+/- 6)
test bench_retain         ... bench:       1,878 ns/iter (+/- 26)

@inquisitivecrystal added the A-codegen, A-collections, A-iterators, C-bug, I-slow, T-compiler, and T-libs labels on Dec 3, 2021
the8472 (member) commented Dec 3, 2021

It looks like retain is optimized (#81126) for small or large probabilities of keeping elements, i.e. long runs of elements that will be either kept or deleted. into_iter().filter().collect() on the other hand just reads elements one by one, filters them and writes out those that passed, i.e. it does not care about runs.

Since the test-case here drops every second element it's the worst possible case for retain, possibly even worse than random selection (since random clusters may have runs).

Edit: Hrm, the benchmarks look like they target high/low retain probabilities, but the code doesn't actually exploit runs; it still moves one element at a time once previous elements have been dropped.
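The run-sensitivity described above can be seen in a simplified model of the backshift loop (a hypothetical sketch, not the actual std implementation; `retain_model` and its move counter are invented here for illustration): once any element has been dropped, every subsequent kept element is copied left into the hole, so an alternating keep/drop pattern pays a copy for almost every survivor.

```rust
// Simplified model of a backshifting retain: kept elements are moved
// left one slot at a time once any element has been deleted. With an
// alternating keep/drop pattern there are no runs to skip, so nearly
// every kept element incurs a copy.
fn retain_model<T: Copy, F: Fn(&T) -> bool>(v: &mut Vec<T>, f: F) -> usize {
    let mut write = 0; // next free slot (start of the hole)
    let mut moves = 0; // number of element copies performed
    for read in 0..v.len() {
        let x = v[read]; // T: Copy, so read out first
        if f(&x) {
            if read != write {
                v[write] = x;
                moves += 1;
            }
            write += 1;
        }
    }
    v.truncate(write);
    moves
}

fn main() {
    let mut xs: Vec<i32> = (0..1000).collect();
    let moves = retain_model(&mut xs, |x| x & 1 == 0);
    assert_eq!(xs.len(), 500);
    // Every kept element after the first deletion is moved once.
    assert_eq!(moves, 499);
}
```

With the benchmark's `x & 1 == 0` predicate, 500 of 1000 elements survive and 499 of them are moved; a predicate that drops only a contiguous suffix would move none.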

hkratz (contributor) commented Dec 3, 2021

There is something weird going on, because if you put a copy of the current retain() implementation in the source file it is much faster:

test bench_filter_collect ... bench:         307 ns/iter (+/- 12)
test bench_my_retain      ... bench:         431 ns/iter (+/- 3)
test bench_retain         ... bench:       1,999 ns/iter (+/- 7)

Godbolt

Edit: Doh, but at least now we know that #88060 caused some major slowdown.

the8472 (member) commented Dec 3, 2021

You're not copying the most recent version of `retain`. `process_one` gained a `const bool` parameter in #88060 and thus gets instantiated in two flavors.

#[inline(always)]
fn process_one<F, T, A: Allocator, const DELETED: bool>(
    f: &mut F,
    g: &mut BackshiftOnDrop<'_, T, A>,
) -> bool
where
    F: FnMut(&mut T) -> bool,
{
    // SAFETY: Unchecked element must be valid.
    let cur = unsafe { &mut *g.v.as_mut_ptr().add(g.processed_len) };
    if !f(cur) {
        // Advance early to avoid double drop if `drop_in_place` panicked.
        g.processed_len += 1;
        g.deleted_cnt += 1;
        // SAFETY: We never touch this element again after dropped.
        unsafe { ptr::drop_in_place(cur) };
        // We already advanced the counter.
        return false;
    }
    if DELETED {
        // SAFETY: `deleted_cnt` > 0, so the hole slot must not overlap with current element.
        // We use copy for move, and never touch this element again.
        unsafe {
            let hole_slot = g.v.as_mut_ptr().add(g.processed_len - g.deleted_cnt);
            ptr::copy_nonoverlapping(cur, hole_slot, 1);
        }
    }
    g.processed_len += 1;
    return true;
}

// Stage 1: Nothing was deleted.
while g.processed_len != original_len {
    if !process_one::<F, T, A, false>(&mut f, &mut g) {
        break;
    }
}

// Stage 2: Some elements were deleted.
while g.processed_len != original_len {
    process_one::<F, T, A, true>(&mut f, &mut g);
}
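The two-stage split can be modeled in safe code (a hypothetical standalone sketch; `retain_two_stage` is invented here and ignores the panic safety, drops, and allocator handling the real `process_one` deals with): stage 1 performs no copies while nothing has been deleted, and stage 2 backshifts every kept element once the first hole appears.

```rust
// Safe-code model of the two-stage retain: the DELETED=false flavor
// (stage 1) leaves kept elements in place; the DELETED=true flavor
// (stage 2) copies each kept element into the hole left by deletions.
// Returns (final length, number of copies performed).
fn retain_two_stage<T: Copy, F: FnMut(&T) -> bool>(v: &mut Vec<T>, mut f: F) -> (usize, usize) {
    let len = v.len();
    let mut i = 0;
    // Stage 1: nothing deleted yet, kept elements stay where they are.
    while i < len && f(&v[i]) {
        i += 1;
    }
    let mut write = i; // index of the first hole, if any
    let mut copies = 0;
    if i < len {
        i += 1; // skip the element the predicate just rejected
    }
    // Stage 2: at least one element was deleted; survivors backshift.
    while i < len {
        let x = v[i]; // T: Copy
        if f(&x) {
            v[write] = x;
            write += 1;
            copies += 1;
        }
        i += 1;
    }
    v.truncate(write);
    (write, copies)
}

fn main() {
    // A kept prefix never enters stage 2, so no copies happen.
    let mut a: Vec<i32> = (0..10).collect();
    assert_eq!(retain_two_stage(&mut a, |x| *x < 5), (5, 0));
    // An alternating pattern copies every survivor after the first hole.
    let mut b: Vec<i32> = (0..10).collect();
    assert_eq!(retain_two_stage(&mut b, |x| x & 1 == 0), (5, 4));
}
```

This shows why the split pays off for long runs (a kept prefix costs nothing) but not for the alternating predicate in this issue's benchmark, which enters stage 2 almost immediately.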

BluBb-mADe commented Dec 3, 2021

I tried a few scenarios that should heavily favor retain, but the effect remains.

`|x| *x == 1`

test bench_filter_collect ... bench:         284 ns/iter (+/- 18)
test bench_retain         ... bench:       1,338 ns/iter (+/- 55)

`|x| *x != 1`

test bench_filter_collect ... bench:         413 ns/iter (+/- 130)
test bench_retain         ... bench:       1,369 ns/iter (+/- 28)

`|x| *x == 500`

test bench_filter_collect ... bench:         289 ns/iter (+/- 15)
test bench_retain         ... bench:       1,332 ns/iter (+/- 54)

`|x| *x >= 500`

test bench_filter_collect ... bench:         439 ns/iter (+/- 80)
test bench_retain         ... bench:       1,379 ns/iter (+/- 185)

i7-8700K, Windows 10, rustc 1.59.0-nightly (acbe444)

the8472 (member) commented Dec 4, 2021

I'll take a stab at this. @rustbot claim

the8472 (member) commented Dec 4, 2021

Submitted #91527, if anyone wants to confirm my benchmark results.

5 participants