cleanup array_has #12460

Merged
alamb merged 2 commits into apache:main
Sep 16, 2024

samuelcolvin (Contributor) commented Sep 13, 2024

Rationale for this change

While working on #12459 I noticed a few things that could do with being cleaned up.

What changes are included in this PR?

  • Rewrite ArrayHas::invoke to be cleaner and easier to understand — logic hasn't changed
  • Small tweak to array_has_dispatch_for_scalar (the original plan, rewriting it to use a BooleanArray builder instead of a Vec, was reverted after benchmarking)

Are these changes tested?

Tests for array_has already exist.

Are there any user-facing changes?

no.

@@ -203,24 +192,26 @@ fn array_has_dispatch_for_scalar<O: OffsetSizeTrait>(
         return Ok(Arc::new(BooleanArray::from(vec![Some(false)])));
     }
     let eq_array = compare_with_eq(values, needle, is_nested)?;
-    let mut final_contained = vec![None; haystack.len()];
-    for (i, offset) in offsets.windows(2).enumerate() {
+    let mut final_contained = BooleanArray::builder(haystack.len());
Contributor

Is using the builder faster than From<Vec>?

Contributor Author

Turns out what you had was faster, apologies. I had assumed the boolean builder existed because it was significantly faster. Reverting.

BooleanArray_builder    time:   [18.503 µs 18.518 µs 18.532 µs]
                        change: [-0.1321% +0.0661% +0.2517%] (p = 0.51 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  7 (7.00%) high mild

BooleanArray_vec        time:   [12.353 µs 12.496 µs 12.653 µs]
                        change: [-26.653% -26.272% -25.814%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 15 outliers among 100 measurements (15.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild
  13 (13.00%) high severe

extern crate criterion;

use arrow::array::BooleanArray;
use criterion::{black_box, criterion_group, criterion_main, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("BooleanArray_builder", |b| {
        b.iter(|| {
            let mut final_contained = BooleanArray::builder(8192);
            for i in 0..8192 {
                if i % 3 == 0 {
                    final_contained.append_value(true);
                } else if i % 2 == 0 {
                    final_contained.append_value(false);
                } else {
                    final_contained.append_null();
                }
            }
            black_box(final_contained.finish());
        })
    });

    c.bench_function("BooleanArray_vec", |b| {
        b.iter(|| {
            let mut vec = vec![None; 8192];
            for i in 0..8192 {
                if i % 3 == 0 {
                    vec[i] = Some(true);
                } else if i % 2 == 0 {
                    vec[i] = Some(false);
                }
            }
            black_box(BooleanArray::from(vec));
        })
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);
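
For readers without the arrow crate at hand, the Vec-based pattern that the diff starts from (`vec![None; haystack.len()]` filled inside `offsets.windows(2).enumerate()`) can be modelled with the standard library alone. This is only a rough sketch, not the DataFusion code: `contains_per_row`, the `eq` slice (standing in for `eq_array`) and the `row_is_valid` slice are hypothetical names, and the null handling is an assumption made for illustration.

```rust
/// Hypothetical, simplified model of the per-row "contains" pass.
/// `offsets` has one more entry than there are rows; row `i` covers
/// `eq[offsets[i]..offsets[i + 1]]`. `row_is_valid[i]` is false for a null row
/// (an assumption of this sketch, not taken from the PR).
fn contains_per_row(
    offsets: &[usize],
    eq: &[bool],
    row_is_valid: &[bool],
) -> Vec<Option<bool>> {
    // Start with every row null, mirroring `vec![None; haystack.len()]`.
    let mut final_contained = vec![None; row_is_valid.len()];
    for (i, window) in offsets.windows(2).enumerate() {
        if !row_is_valid[i] {
            continue; // a null row stays null
        }
        let (start, end) = (window[0], window[1]);
        // The row contains the needle if any element in its range matched.
        final_contained[i] = Some(eq[start..end].iter().any(|&hit| hit));
    }
    final_contained
}

fn main() {
    // Three rows: [1, 2], null, [3]; with needle = 2 the element-wise
    // comparison gives eq = [false, true, false].
    let offsets = [0, 2, 2, 3];
    let eq = [false, true, false];
    let row_is_valid = [true, false, true];
    let result = contains_per_row(&offsets, &eq, &row_is_valid);
    println!("{:?}", result); // prints [Some(true), None, Some(false)]
}
```

Pre-sizing the vector means null rows cost nothing beyond the initial fill, which is the same shape as the `BooleanArray_vec` benchmark above; whether that explains the measured difference is not something this sketch establishes.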

jayzhan211 (Contributor) left a comment

👍

alamb (Contributor) commented Sep 16, 2024

🚀

@alamb alamb merged commit 25c34f9 into apache:main Sep 16, 2024
25 checks passed
alamb (Contributor) commented Sep 16, 2024

Thanks @samuelcolvin and @jayzhan211
