Add example of using PruningPredicate to datafusion-examples #9183
// File 2: `x = 5 AND y = 10` can never evaluate to true because y
// has only the value of 7. Thus this file can be skipped.
false,
// File 3: `x = 5 AND y = 10` can never evaluate to true because x
FYI @appletreeisyellow here is an actual example showing that the pruning predicate does the right thing with unknown column values
File 3 example makes sense to me 👍
I'm curious what the result will be for a file 4 like:
File 4: x has values between 4 and 6; nothing is known about the value of y
For the same predicate `x = 5 AND y = 10`, my understanding is that it will evaluate to true:
x = 5 AND y = 10
--> true AND null
--> null
Since y is unknown, there is a possibility that y is 10 in this file / partition / row group of data. Thus this file cannot be skipped and the result is true.
> For the same predicate `x = 5 AND y = 10`, my understanding is that it will evaluate to true.
Yes, this is my understanding too (that the `PruningPredicate` will return true for this container).
> Since y is unknown, there is a possibility that y is 10 in this file / partition / row group of data. Thus this file cannot be skipped and the result is true.
Yes, that is my understanding as well.
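The reasoning in this thread can be sketched with plain three-valued logic. The snippet below is only an illustration using the standard library; `and3`, `eq_may_match`, and `keep_container` are hypothetical helper names, not part of the DataFusion API, and the actual `PruningPredicate` implementation may work differently. `None` stands for "unknown" (SQL NULL).

```rust
/// Kleene three-valued AND: `None` stands for SQL NULL ("unknown").
/// (Hypothetical helper for illustration, not a DataFusion function.)
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        // A single known `false` makes the whole conjunction false.
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        // Otherwise at least one side is unknown and nothing is false.
        _ => None,
    }
}

/// Could `col = value` hold for some row, given optional min/max statistics?
/// Returns `None` when the statistics are not known.
fn eq_may_match(min: Option<i64>, max: Option<i64>, value: i64) -> Option<bool> {
    match (min, max) {
        (Some(lo), Some(hi)) => Some(lo <= value && value <= hi),
        _ => None,
    }
}

/// A container can be skipped only when the predicate is proven false;
/// both `true` and "unknown" mean it must be kept.
fn keep_container(result: Option<bool>) -> bool {
    result != Some(false)
}
```

For the proposed File 4 (x between 4 and 6, y unknown), `and3(eq_may_match(Some(4), Some(6), 5), eq_may_match(None, None, 10))` is `None`, so `keep_container` returns `true` and the file is not skipped.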
Nice @alamb, I love reviewing docs like this, as they give more understanding.
There is likely a typo
Thank you for adding examples @alamb. Super helpful! I left a question for a new example and a suggestion
// Note, returning null means the value isn't known, NOT
// that we know the entire column is null.
(None, None),
Nice!
That probably looks familiar :)
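The distinction drawn in that code comment (unknown statistics versus a column known to be entirely null) can be made explicit with a small sketch. The `ColumnStats` type and its methods below are hypothetical and exist only for illustration; they are not the statistics interface that `PruningPredicate` uses.

```rust
/// Hypothetical per-container statistics for one column (illustration only).
/// `None` in any field means "not known", not "null".
#[derive(Debug, Clone, Copy, PartialEq)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
    row_count: u64,
}

impl ColumnStats {
    /// Statistics for a container about which nothing is known.
    fn unknown(row_count: u64) -> Self {
        ColumnStats { min: None, max: None, null_count: None, row_count }
    }

    /// True only when the statistics prove that every value is null.
    /// Missing min/max alone never implies this.
    fn is_all_null(&self) -> bool {
        self.null_count == Some(self.row_count)
    }

    /// True when both bounds are known and can be used for range checks.
    fn has_range(&self) -> bool {
        self.min.is_some() && self.max.is_some()
    }
}
```

With this shape, `ColumnStats::unknown(100)` reports neither a usable range nor an all-null column, so a predicate on that column cannot be proven false from it.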
Co-authored-by: Chunchun Ye <14298407+appletreeisyellow@users.noreply.github.com>
lgtm thanks @alamb
Which issue does this PR close?
Part of #7013
Related to #7869 and #9171
Rationale for this change
What changes are included in this PR?
Add `pruning.rs` example to datafusion-examples with an annotated guide to using `PruningPredicate`
`PruningPredicate` API docs
Are these changes tested?
Yes, as part of CI
Are there any user-facing changes?
A new example, no code changes