-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Add Upgrade Guide for DataFusion 46.0.0 #14891
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
|
||
### Changes to `invoke()` and `invoke_batch()` deprecated | ||
|
||
We are migrating away from `ScalarUDFImpl::invoke()` and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added this section based on the report from @shehabgamin in #14123 (comment)
@shehabgamin any chance you have an example of the kind of error you saw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alamb Yes, users may see an error similar to something like this (depending on which UDF
they are calling .invoke()
or .invoke_batch()
with):
This feature is not implemented: Function concat does not implement invoke but called
Here is an example of resolving the error:
Old Code:
impl ScalarUDFImpl for SparkConcat {
...
fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize) -> Result<ColumnarValue> {
if args
.iter()
.any(|arg| matches!(arg.data_type(), DataType::List(_)))
{
ArrayConcat::new().invoke_batch(args, number_rows)
} else {
ConcatFunc::new().invoke_batch(args, number_rows)
}
}
}
New Code:
impl ScalarUDFImpl for SparkConcat {
...
fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
if args
.args
.iter()
.any(|arg| matches!(arg.data_type(), DataType::List(_)))
{
ArrayConcat::new().invoke_with_args(args)
} else {
ConcatFunc::new().invoke_with_args(args)
}
}
}
Changes:
1.
From:
fn invoke_batch(&self, args: &[ColumnarValue], number_rows: usize)
OR
fn invoke(&self, args: &[ColumnarValue])
To:
fn invoke_with_args(&self, args: ScalarFunctionArgs)
2.
From:
.invoke_batch(args, number_rows)
OR
.invoke(args)
To:
.invoke_with_args(args)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much @shehabgamin -- I have incorporated this content in 9c658c7
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woohoo!
This will be definitely helpful for the user's upgrade, thank you @alamb |
Thanks @alamb for proceeding with this. Imho we still need to modify contribution guide to let contributors to maintain the upgration guide. Leaving this to one person is might be a huge burden |
Sounds good . Maybe once we have an example of what an upgrade guide looks like it will be easier to add / point people to an example In general trying to change people's behavior with documentation is a bit diecy in my opinion |
Yeah, example is awesome. The upgrade guide would be totally voluntarily but if anyone feels their changes are breaking and wanna give a downstream user a hint how to migrate they can refer to the example as you propose. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @alamb
After the PR is merged, I'll update changlog PR and make the branch-46. |
Thank you @xudong963 for the approval and @comphead for the review and @shehabgamin for the content assistance! Since this PR affects only https://datafusion.apache.org/ it doesn't need to be backported to the branch-46 - now that I have merged it it it will show up on the website without having to be part of the release tarball Maybe we could put a link to https://datafusion.apache.org/library-user-guide/upgrading.html in release notes / readme |
Which issue does this PR close?
46.0.0
#14123DataSourceExec
for provided datasources, removeParquetExec
,CsvExec
, etc #14224Rationale for this change
As we have discussed in tickets, it would be good to have help users upgrade to new DataFusion versions, especially when major / disruptive API changes like #14224 are included
What changes are included in this PR?
Are these changes tested?
I build the docs manually
Are there any user-facing changes?