Skip to content

refactor!: consistent null handling in coercible signatures #15404

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

alan910127
Copy link
Contributor

@alan910127 alan910127 commented Mar 25, 2025

Which issue does this PR close?

Rationale for this change

Currently, there is a special case for null handling in TypeSignatureClass::matches_native_type, which causes inconsistencies and issues between TypeSignatureClass::matches_native_type and TypeSignatureClass::default_casted_type (it will hit the else match arm which returns an error) when self is not a TypeSignatureClass::Native.

What changes are included in this PR?

  1. Remove the special handling of null case in TypeSignatureClass::matches_native_type
  2. Add TypeSignatureClass::Native(logical_null()) where the argument is nullable and it was not originally supplied.

Are these changes tested?

These changes should be covered in existing test.

Are there any user-facing changes?

Yes, if the user rely on the special null case handling behavior of 'TypeSignatureClass::Native, they will need to transform the usage to Coercible::new_implicitand add aTypeSignatureClass::Native(logical_null())`.

@github-actions github-actions bot added logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt) functions Changes to functions implementation labels Mar 25, 2025
Comment on lines +739 to +743
repeat Int64 2 IN 3
repeat LargeUtf8 1 IN 1
repeat LargeUtf8 1 OUT 1
repeat Null 1 IN 3
repeat Null 1 OUT 3
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why this was changed, as there seems to already be a test case for SELECT repeat(NULL, 4) in the original version, so the information schema shouldn't need to be modified?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense? Since it was previously accepted by the matches_native_type without needing to be listed as one of the possible signatures. Now we have added to the possible cartesian products per here.

Comment on lines +739 to +743
repeat Int64 2 IN 3
repeat LargeUtf8 1 IN 1
repeat LargeUtf8 1 OUT 1
repeat Null 1 IN 3
repeat Null 1 OUT 3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes sense? Since it was previously accepted by the matches_native_type without needing to be listed as one of the possible signatures. Now we have added to the possible cartesian products per here.

@alan910127 alan910127 force-pushed the refactor/coercible-signature-consistent-null-handling branch from e2dba0b to 680383b Compare March 30, 2025 12:43
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alan910127 and @wiedld

I am worried about the potential subtle semantic issues in downstream projects that would result from this change (see comment). I worry about such changes especially when the benefits of the change are nicer internal datafusion code, but no benefit to downstream crates, although they will pay the cost

I know @shehabgamin was thinking about coercion recently -- maybe he would have some time to review this PR / comment about its potential implications and how it would fit into a broader story about coercion improvements

@@ -263,10 +263,6 @@ impl TypeSignatureClass {
self: &TypeSignatureClass,
logical_type: &NativeType,
) -> bool {
if logical_type == &NativeType::Null {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR description says

Are there any user-facing changes?
No.

However, doesn't this change means that existing TypeSignature::Coercable that handle Nulls will need to add Coercion::new_implicit to retain the same behavior?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's right, I forgot about that. PR description updated.

@alan910127 alan910127 changed the title refactor: consistent null handling in coercible signatures refactor!: consistent null handling in coercible signatures Apr 7, 2025
Copy link
Contributor

@shehabgamin shehabgamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late reply, busy week. LGTM!

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @alan910127 @wiedld and @shehabgamin

Can you please add a brief note in the upgrade guide for 47.0.0 (eta next week) in https://github.com/apache/datafusion/blob/main/docs/source/library-user-guide/upgrading.md

About what a user will need to do when upgrading to 47.0.0? Specifically,

  1. What kind of UDFs will need to be updated
  2. What changes need to be done? (you could use examples from this PR)

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Apr 14, 2025
@alan910127
Copy link
Contributor Author

Hi @alamb, I’ve updated upgrading.md. Since this is my first PR with an API change, I’m not 100% sure I did everything right. Would appreciate it if you could take a look and let me know if anything needs adjusting. Thanks!

@alamb
Copy link
Contributor

alamb commented Apr 16, 2025

Thanks @alan910127 -- I'll check it out in a few days

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
documentation Improvements or additions to documentation functions Changes to functions implementation logical-expr Logical plan and expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Handle Null type in TypeSignatureClass more consistently
4 participants