Support applying parquet bloom filters to StringView columns #12499

Closed
Tracked by #11752
alamb opened this issue Sep 17, 2024 · 2 comments · Fixed by #12503
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Contributor

alamb commented Sep 17, 2024

Is your feature request related to a problem or challenge?

Part of #11752

While working to enable StringView in #12092, I found that columns read as StringView and BinaryView do not take advantage of Bloom filters.

Specifically, this code doesn't handle StringView or BinaryView values:

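```rust
// Excerpt: each arm checks a literal against the column's parquet bloom filter (`sbbf`).
// There is no arm for ScalarValue::Utf8View or ScalarValue::BinaryView.
ScalarValue::Utf8(Some(v)) => sbbf.check(&v.as_str()),
ScalarValue::Binary(Some(v)) => sbbf.check(v),
ScalarValue::FixedSizeBinary(_size, Some(v)) => sbbf.check(v),
ScalarValue::Boolean(Some(v)) => sbbf.check(v),
ScalarValue::Float64(Some(v)) => sbbf.check(v),
ScalarValue::Float32(Some(v)) => sbbf.check(v),
```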
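A minimal, self-contained sketch of one way the missing cases could be handled. It assumes that a column read as StringView or BinaryView is still stored in parquet as the same BYTE_ARRAY values as a Utf8 or Binary column, so a view literal can be checked against the bloom filter using the same bytes. It does not use DataFusion or the parquet crate: `ScalarValue` below is a hypothetical four-variant stand-in, `Sbbf` is a hypothetical exact set standing in for the real Split Block Bloom Filter (which can return false positives), and `may_contain` and `string_literal` are illustrative helpers. The loop in `main` reruns the same checks with view types on and off, loosely modelling the idea of reusing the Utf8 tests with `schema_force_view_types`.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a few DataFusion `ScalarValue` variants.
#[derive(Debug)]
enum ScalarValue {
    Utf8(Option<String>),
    Utf8View(Option<String>),
    Binary(Option<Vec<u8>>),
    BinaryView(Option<Vec<u8>>),
}

/// Hypothetical stand-in for a parquet bloom filter over BYTE_ARRAY values.
/// It stores exact bytes, so unlike a real bloom filter it never reports a
/// false positive.
struct Sbbf {
    values: HashSet<Vec<u8>>,
}

impl Sbbf {
    fn from_values<I, B>(values: I) -> Self
    where
        I: IntoIterator<Item = B>,
        B: AsRef<[u8]>,
    {
        Sbbf {
            values: values.into_iter().map(|b| b.as_ref().to_vec()).collect(),
        }
    }

    /// `true` means "may be present"; `false` means "definitely absent".
    fn check<T: AsRef<[u8]> + ?Sized>(&self, value: &T) -> bool {
        self.values.contains(value.as_ref())
    }
}

/// Hypothetical helper: returns `false` only when the filter proves the
/// literal is absent (so the row group could be pruned).
fn may_contain(sbbf: &Sbbf, value: &ScalarValue) -> bool {
    match value {
        // View types carry the same bytes as their non-view counterparts,
        // so each pair shares one arm.
        ScalarValue::Utf8(Some(v)) | ScalarValue::Utf8View(Some(v)) => sbbf.check(v),
        ScalarValue::Binary(Some(v)) | ScalarValue::BinaryView(Some(v)) => sbbf.check(v),
        // Null literals cannot be checked, so never prune on them.
        _ => true,
    }
}

/// Hypothetical helper: how a string literal would be typed when the column
/// is read normally (Utf8) versus with view types forced (Utf8View).
fn string_literal(s: &str, force_view_types: bool) -> ScalarValue {
    if force_view_types {
        ScalarValue::Utf8View(Some(s.to_string()))
    } else {
        ScalarValue::Utf8(Some(s.to_string()))
    }
}

fn main() {
    // Filters built from a string column and a binary column.
    let strings = Sbbf::from_values(["apple", "banana"]);
    let bytes = Sbbf::from_values([vec![0x01u8, 0x02], vec![0xff]]);

    // The same checks must give the same answers with and without view types.
    for force_view_types in [false, true] {
        assert!(may_contain(&strings, &string_literal("apple", force_view_types)));
        assert!(!may_contain(&strings, &string_literal("cherry", force_view_types)));
    }

    assert!(may_contain(&bytes, &ScalarValue::BinaryView(Some(vec![0xff]))));
    assert!(!may_contain(&bytes, &ScalarValue::Binary(Some(vec![0x03]))));

    // A null literal cannot be checked against the filter, so no pruning.
    assert!(may_contain(&strings, &ScalarValue::Utf8View(None)));
    println!("all checks passed");
}
```

Grouping each view type with its non-view counterpart in one or-pattern keeps the two code paths from drifting apart; the actual change in DataFusion may be structured differently.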

Describe the solution you'd like

Support applying parquet bloom filters to StringView columns

Describe alternatives you've considered

Basically:

  1. Make the bloom filter code changes in #12092 (Enable reading StringViewArray by default from Parquet)
  2. Write a test

In terms of testing, I think the easiest approach would be to follow the model of the existing tests for Utf8/Binary columns and set the `schema_force_view_types` config flag so the same columns are read as StringView/BinaryView.

Additional context

No response

alamb added the enhancement (New feature or request) label on Sep 17, 2024
alamb added the good first issue (Good for newcomers) label on Sep 17, 2024
@alamb
Contributor Author

alamb commented Sep 17, 2024

I think this is a clearly defined need, so I'm marking it as a good first issue.

@my-vegetable-has-exploded
Contributor

take
