-
Notifications
You must be signed in to change notification settings - Fork 1.5k
cleanup array_has
#12460
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
cleanup array_has
#12460
Conversation
@@ -203,24 +192,26 @@ fn array_has_dispatch_for_scalar<O: OffsetSizeTrait>( | |||
return Ok(Arc::new(BooleanArray::from(vec![Some(false)]))); | |||
} | |||
let eq_array = compare_with_eq(values, needle, is_nested)?; | |||
let mut final_contained = vec![None; haystack.len()]; | |||
for (i, offset) in offsets.windows(2).enumerate() { | |||
let mut final_contained = BooleanArray::builder(haystack.len()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is using builder faster then From<Vec>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out what you had was faster, appologies, I had assumed boolean builder existed because it was significantly faster. Reverting.
BooleanArray_builder time: [18.503 µs 18.518 µs 18.532 µs]
change: [-0.1321% +0.0661% +0.2517%] (p = 0.51 > 0.05)
No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
1 (1.00%) low severe
2 (2.00%) low mild
7 (7.00%) high mild
BooleanArray_vec time: [12.353 µs 12.496 µs 12.653 µs]
change: [-26.653% -26.272% -25.814%] (p = 0.00 < 0.05)
Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
1 (1.00%) low mild
1 (1.00%) high mild
13 (13.00%) high severe
extern crate criterion;
use arrow::array::BooleanArray;
use criterion::{black_box, criterion_group, criterion_main, Criterion};
fn criterion_benchmark(c: &mut Criterion) {
c.bench_function("BooleanArray_builder", |b| {
b.iter(|| {
let mut final_contained = BooleanArray::builder(8192);
for i in 0..8192 {
if i % 3 == 0 {
final_contained.append_value(true);
} else if i % 2 == 0 {
final_contained.append_value(false);
} else {
final_contained.append_null();
}
}
black_box(final_contained.finish());
})
});
c.bench_function("BooleanArray_vec", |b| {
b.iter(|| {
let mut vec = vec![None; 8192];
for i in 0..8192 {
if i % 3 == 0 {
vec[i] = Some(true);
} else if i % 2 == 0 {
vec[i] = Some(false);
}
}
black_box(BooleanArray::from(vec));
})
});
}
criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
🚀 |
Thanks @samuelcolvin and @jayzhan211 |
Rationale for this change
While working on #12459 I noticed a few things that could do with being cleaned up.
What changes are included in this PR?
ArrayHas::invoke
to be cleaner and easier to understand — logic hasn't changedRewriteSmall tweak toarray_has_dispatch_for_scalar
to useBooleanArray
builder, not aVec
array_has_dispatch_for_scalar
Are these changes tested?
Tests for
array_has
already exist.Are there any user-facing changes?
no.