[Spark] Restrict partition-like data filters to whitelist of known-good expressions #3872

Merged
merged 5 commits into from
Dec 2, 2024

@chirag-s-db commented Nov 12, 2024

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Currently, we try to rewrite any expression as partition-like. To avoid having to repeatedly remove known-bad expressions, this change starts with a whitelist (to be expanded) of known-good expressions that can safely be rewritten.
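The whitelist idea can be illustrated with a minimal Python sketch. This is not the code from this PR (Delta's Spark connector is written in Scala). The `Expr` class, the operator names, and `KNOWN_GOOD_OPS` are hypothetical stand-ins chosen only to show the shape of the check: an expression is rewritten only if every node in its tree is on the list, so any unknown operator is rejected by default.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Expr:
    """A node in a toy expression tree (hypothetical, not a Spark class)."""
    op: str
    children: Tuple["Expr", ...] = ()


# Hypothetical whitelist: only operators assumed safe to rewrite.
KNOWN_GOOD_OPS = frozenset(
    {"column", "literal", "eq", "lt", "gt", "and", "or", "not", "cast"}
)


def is_rewritable(expr: Expr) -> bool:
    """Return True only if every node in the tree uses a whitelisted operator."""
    return expr.op in KNOWN_GOOD_OPS and all(
        is_rewritable(child) for child in expr.children
    )


col = Expr("column")
lit = Expr("literal")
safe = Expr("and", (Expr("eq", (col, lit)), Expr("gt", (col, lit))))
unsafe = Expr("eq", (Expr("rand"), lit))  # "rand" is not on the whitelist
```

With a blacklist, a newly introduced operator would be rewritten until someone noticed it was unsafe. With the whitelist above, it is skipped until someone explicitly adds it.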

This change also fixes an existing issue where partition-like filters are generated for a column that is not eligible for data skipping. Such a filter throws an analysis exception because the columns it references aren't found in the stats. The issue was originally missed, and is a difference in behavior vs. partition filters, because partitioning isn't allowed on non-atomic types (or string types), so the additional match was never added.
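The failure mode described above can be sketched in a toy model. Everything here is an assumption made for illustration: the `Stats` layout, the set `SKIPPING_ELIGIBLE_TYPES`, and the function `rewrite_equals` are hypothetical and are not Delta APIs. A Python `KeyError` would play the role of the analysis exception if the guard were missing; the guard declines the rewrite for a column whose type has no statistics.

```python
from typing import Callable, Dict, Optional

# Toy per-file statistics, e.g. {"minValues": {"id": 10}, "maxValues": {"id": 20}}.
Stats = Dict[str, Dict[str, int]]

# Hypothetical set of types for which per-file statistics are assumed to exist.
SKIPPING_ELIGIBLE_TYPES = frozenset({"int", "long", "date"})


def rewrite_equals(
    column: str, value: int, schema: Dict[str, str]
) -> Optional[Callable[[Stats], bool]]:
    """Build a stats-based filter for `column = value`.

    Returns None when the column's type is not eligible, so the caller
    falls back to reading the file instead of failing on a missing stat.
    """
    if schema[column] not in SKIPPING_ELIGIBLE_TYPES:
        return None  # decline the rewrite; no stats column to reference

    def keep_file(stats: Stats) -> bool:
        # Without the guard above, these lookups would raise KeyError
        # for a column that has no collected statistics.
        lo = stats["minValues"][column]
        hi = stats["maxValues"][column]
        return lo <= value <= hi

    return keep_file


schema = {"id": "long", "tags": "array<int>"}
file_stats = {"minValues": {"id": 10}, "maxValues": {"id": 20}}

in_range = rewrite_equals("id", 15, schema)
out_of_range = rewrite_equals("id", 25, schema)
ineligible = rewrite_equals("tags", 1, schema)
```

Here `in_range(file_stats)` keeps the file, `out_of_range(file_stats)` skips it, and `ineligible` is `None`, which means no filter is generated for the non-atomic column.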

How was this patch tested?

See test changes.

Does this PR introduce any user-facing changes?

No.

@chirag-s-db changed the title from "[Spark] Don't apply partition-like data filters to ineligible columns" to "[Spark] Restrict partition-like data filters to whitelist of known-good expressions" on Nov 22, 2024
@chirag-s-db (Contributor, Author) commented

@scovich Could you take a look at this PR? Thanks!

@scovich (Collaborator) left a comment

Good find, glad it first surfaced as a query error instead of an incorrect result!

@scottsand-db scottsand-db merged commit 81f27b3 into delta-io:master Dec 2, 2024
16 of 19 checks passed
3 participants