Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

functions: Remove NullHandling from scalar funcs #14531

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jkosh44
Copy link
Contributor

@jkosh44 jkosh44 commented Feb 6, 2025

3dfce7d added an enum to all scalar functions called NullHandling, with two variants: PassThrough and Propagate. PassThrough would pass through null inputs to the function implementation. Propagate would cause the function to return null if any of the inputs were scalar and null, it would not do anything if an input was an array and null. Function implementors were responsible for handling null array inputs and making sure the behavior was consistent with the scalar null behavior caused by NullHandling.

If the function signature correctly described the accepted types, then the null array input handling would also work for null scalar inputs. However, if the function signature was VariadicAny, then the null array input handling would not work for null scalar inputs. The reason is that when the signature is VariadicAny, null inputs are not properly typed (for example ScalarValue::Null instead of ScalarValue::Int64(None). So it turns out that NullHandling was only useful for compensating for non-descriptive function signatures.

Furthermore, many array functions use a signature of VariadicAny and reject invalid types within the function implementation. This does not work with NullHandling::Propagate, because any null input would skip the function implementation, which would skip the type validation. So, if a function wanted to use NullHandling::Propagate, then they would need to use a descriptive function signature. However, using a descriptive function signature removes the usefulness of NullHandling::Propagate. So as it turns out NullHandling::Propagate is never useful.

For all the reasons stated above, this commit removes the NullHandling enum.

Which issue does this PR close?

Related to #10548

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Physical Expressions labels Feb 6, 2025
@jkosh44 jkosh44 force-pushed the remove-null-handling branch from 3ce6857 to 383a618 Compare February 6, 2025 19:58
@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 6, 2025

Sorry for the spam of PRs related to this issue. It turns out, IMO, that the fix to the null input issue was improving the function signature and the NullHandling enum did not help. Also I think going forward to fix similar issues we should improve the function signatures and NullHandling wouldn't be helpful. The commit message has more details.

3dfce7d added an enum to all scalar
functions called `NullHandling`, with two variants: `PassThrough` and
Propagate`. `PassThrough` would pass through null inputs to the
function implementation. `Propagate` would cause the function to return
null if any of the inputs were scalar and null, it would not do
anything if an input was an array and null. Function implementors were
responsible for handling null array inputs and making sure the behavior
was consistent with the scalar null behavior caused by `NullHandling`.

If the function signature correctly described the accepted types, then
the null array input handling would also work for null scalar inputs.
However, if the function signature was `VariadicAny`, then the null
array input handling would not work for null scalar inputs. The reason
is that when the signature is `VariadicAny`, null inputs are not
properly typed (for example `ScalarValue::Null` instead of
`ScalarValue::Int64(None)`. So it turns out that `NullHandling` was
only useful for compensating for non-descriptive function signatures.

Furthermore, many array functions use a signature of `VariadicAny`
and reject invalid types within the function implementation. This does
not work with `NullHandling::Propagate`, because any null input would
skip the function implementation, which would skip the type validation.
So, if a function wanted to use `NullHandling::Propagate`, then they
would need to use a descriptive function signature. However, using a
descriptive function signature removes the usefulness of
`NullHandling::Propagate`. So as it turns out `NullHandling::Propagate`
is never useful.

For all the reasons stated above, this commit removes the
`NullHandling` enum.
@jkosh44 jkosh44 force-pushed the remove-null-handling branch from 383a618 to 2fd3514 Compare February 6, 2025 20:05
@jkosh44 jkosh44 marked this pull request as ready for review February 6, 2025 20:37
@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 6, 2025

@jayzhan211 I'm curious if you think this is a good idea or not.

@jkosh44
Copy link
Contributor Author

jkosh44 commented Feb 6, 2025

I also have a PoC for how we'd make the array signatures more expressive and more easily fix the null input errors: #14532

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
logical-expr Logical plan and expressions physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant