Skip to content

POC Make BloomFilter application general, add PruningPredicate::contains #8397

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
wants to merge 20 commits into from

Conversation

alamb
Copy link
Contributor

@alamb alamb commented Dec 1, 2023

Which issue does this PR close?

POC for #8376

Rationale for this change

See #8376

TLDR is I would like to use PruningPredicate for "bloom filter like" operations downstream in IOx, and thus I would like to refactor the code that applies Bloom Filters today into the PruningPredicate.

What changes are included in this PR?

  1. New PruningStatistics::contains API proposed in Support general pruning based on <col> = 'const' in PruningPredicate #8376
  2. Add LiteralGuarantee for extracting col = <const> predicates, along with many new tests
  3. Rewrite the BloomFilter code to use the new predicate API

TODO

Are these changes tested?

  • New tests for PruningPredicate contains (=, !=, = with OR, IN and NOT IN)
  • Existing tests pass

Are there any user-facing changes?

The PruningPredicate has a new API

Plans for merging

I do not plan to propose merging this PR as is. Instead I plan a series of PRs:

  1. A PR with some new bloom filter tests
  2. A PR that introduces the contains API with tests
  3. A second PR that refactors the the row group pruning code to use it
  4. A third PR that refactors the RowGroupPruning code so it runs on more than one RowGroupMetadatat at a time.

🤔 musing: At the moment, the row group pruning happens in two phases -- first the min/max statistics are applied, and then the bloom filters are applied which means the potentially bloom filters are only fetched for groups that passed the first predicate.

@github-actions github-actions bot added the core Core DataFusion crate label Dec 1, 2023
Rewrite BloomFilterPruningPredicate in terms of BloomFilterPruningPredicate
@alamb alamb force-pushed the alamb/general_bloom_filter branch from 298da7f to 76ee618 Compare December 4, 2023 21:28
@github-actions github-actions bot added the physical-expr Changes to the physical-expr crates label Dec 4, 2023
@alamb
Copy link
Contributor Author

alamb commented Dec 18, 2023

The pieces of this PR are in, so closing this PR and focusing on merging the parts

@alamb alamb closed this Dec 18, 2023
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
core Core DataFusion crate physical-expr Changes to the physical-expr crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant