Release DataFusion 47.0.0 (April 2025) #15072
Comments
@alamb, I'll also be in charge of this release. |
@XiangpengHao also offered to test with the parquet viewer prior to 47: #15102 (comment) |
That's great, added it to release steps |
I feel like this may be important enough to try to get into the release. Does anyone else have thoughts? |
Seems reasonable to me -- I have added it to the "good to get in" list |
Update: wrong ticket (I was looking for 46.0.0) |
The PR #15266 has significantly improved performance, so I added it to the blog section. |
@alamb I think we can start testing the 47.0.0 in the second week of April and begin the release process at the end of that week. What do you think? |
I think it sounds like a great idea -- thank you @xudong963. For your planning purposes, I will be away the week of April 21, so perhaps we can start testing a week earlier (the week of April 7, so we have time to complete / fix issues prior to April 14). |
Happy to test whenever! |
That sounds good! |
Would really appreciate it if you could add the following PR to the release as well: |
Sure, I added it. Given that there are two approvals, I think it'll be included in DF47. Thanks for your fix. |
Hey guys, happy new week, let's start testing the incoming DF47 this week! 🚀 |
Makes sense. Thanks @xudong963 |
We have started testing Comet with the latest DF from main. I added a link to the Comet PR in this PR's description. |
I have tested the Parquet viewer with the latest main and found no problems. But I hit a TPC-H panic when running on LiquidCache: XiangpengHao/liquid-cache#158. I'm digging into the root cause. Other breaking changes I observed:
|
Thank you @XiangpengHao , I added the breaking changes that you mentioned to the summary of the issue. |
I tested the latest DF against our tests - the Substrait consumer is broken when it comes to renaming Struct fields' insides, due to #15239 (comment). I'll try to get a fix up. Edit: fix here #15634 |
I think @andygrove filed a ticket for this one. I didn't fully follow the discussion, but it seems like that issue has been closed. |
I've read the discussion now and I think I'm in agreement that it's not an actual regression since the aggregation has no deterministic outcome without ordering assigned. |
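To illustrate the point above about results without an assigned ordering, here is a minimal sketch in plain Python (not DataFusion code). The helpers `first_value` and `first_value_ordered` are hypothetical stand-ins for a "first row seen" style aggregate, and the two arrival lists are made-up data representing the same rows reaching the aggregate in different orders.

```python
# Hypothetical stand-in for a "first value" style aggregate.
# Rows are (timestamp, value) tuples; this is an illustration only.

def first_value(rows):
    """Return the value of the first row seen; no ordering is applied."""
    return rows[0][1] if rows else None


def first_value_ordered(rows):
    """Return the value of the row with the smallest timestamp."""
    return min(rows, key=lambda row: row[0])[1] if rows else None


# The same three rows, arriving in two different orders
# (for example, because partitions were scheduled differently).
arrival_a = [(2, "b"), (1, "a"), (3, "c")]
arrival_b = [(3, "c"), (2, "b"), (1, "a")]

# Without an ordering, the answer depends on arrival order.
print(first_value(arrival_a), first_value(arrival_b))

# With an explicit ordering, both arrivals give the same answer.
print(first_value_ordered(arrival_a), first_value_ordered(arrival_b))
```

Under these assumptions, a change in the unordered result between versions reflects a different arrival order rather than a wrong answer, which is consistent with not treating it as a regression.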
My only remaining question is if we want to upgrade arrow in this release as well. The upgrade PR is here: Since it also upgrades object_store and pyo3, it is somewhat more disruptive. |
+1 for upgrading all the dependencies |
I am also +1 for upgrading the dependencies (for selfish reasons; we are waiting on an arrow feature to help with INT96 timestamps in Parquet) |
Thanks @jayzhan211 for the approval and for the discussion. I'll plan to merge #15466 tomorrow then unless we want to discuss it further. |
Ok, I just merged #15466 / upgrade to dependencies (arrow/object_store/parquet) 47.0.0. I don't know of anything else we are now waiting on for this release. I suggest we make the release notes PR and generate a release candidate. BTW, I will be offline for about a week starting this Friday April 18 or Saturday, so I likely won't be able to help with the release until I return. Hopefully another PMC member can do the final approval / release to crates.io if it isn't ready before I leave. |
|
Closes #1037

### Change list

- Bump `arrow` to `55` and `parquet` to `55`
- Temporarily deactivates the `datafusion` integration until datafusion publishes its version `47` (apache/datafusion#15072), so that we can progress with the `arrow` 55 upgrade now.
- Update JS and Python APIs for latest `parquet`.
- Means we no longer need an initial `HEAD` request for Parquet files before reading metadata.
Thanks for putting this together! If we could additionally get #14412 in, that would be awesome 🙏 |
@alamb It seems there are still one or two PRs that want to be included, so how about making the release notes tomorrow (UTC+8, after I get up)? |
Sorry @gabotechs -- I just merged that one!
Sounds like a good plan. I'll take a pass through the outstanding PRs again to see if there is anything else we can/should merge. Thank you all. |
I just merged the version + changelog PR from @xudong963 I also created a @xudong963, given I think it is late in your timezone, I'll plan to make an RC in a few hours unless you let me know otherwise. |
I have made a release candidate and started a voting thread: https://lists.apache.org/thread/zrq9x9gf51r8b6m9qokf2q75kh251rm6 |
Note that I will be away starting April 18, and so likely can not complete the vote / release process until April 26. @andygrove would it be possible for you to complete the voting process / publish the release to crates.io for this release? |
Yes, I would be happy to do that. I will be offline on Saturday, but can take care of it on Sunday or Monday. |
Awesome -- we have a plan! I hope to work on the upgrade guide and maybe even a blog post about this release, but we'll see if I have the time |
Here is a draft upgrade guide: |
I filed the following ticket for the next release: |
DataFusion 47 is on crates.io https://crates.io/crates/datafusion/47.0.0 So closing this one down |
Getting ready for the [datafusion 47 release](apache/datafusion#15072).

## Current issues

- [x] apache/datafusion#15072
- [x] duckdb-rs depends on arrow 54, opened a PR to fix - duckdb/duckdb-rs#496.
- [ ] object-store 0.12 has a regression on Azure, not sure what's the priority here but shouldn't be too hard to find the root cause if we care. apache/arrow-rs-object-store#320
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Previous release will be https://crates.io/crates/datafusion/45.0.0 (likely Feb 1, 2025) December 31, 2024 so next major release would be around March 1, 2025
Steps:
`48.0.0` (June 2025) #15771

Prior release tickets:
- `45.0.0`: Release DataFusion `45.0.0` #14008
- `46.0.0`: Release DataFusion `46.0.0` #14123

Changes to add to upgrade guide
These PRs made changes that deserve a mention in the upgrade guide
- `Int64` vs `UInt64`, etc) #15341
- `FileGroup` structure for `Vec<PartitionedFile>` #15379
- `downcast_to_source` method for `DataSourceExec` #15416
- `version <= 40` #15027

Features to mention in the blog (if they make it)
- `SQL EXPLAIN` Tree Rendering #14914
- `VARCHAR` from `Utf8` to `Utf8View` #15096
- `JoinSetTracer` trait for tracing context propagation in spawned tasks #14547
- `first_value` by implementing special `GroupsAccumulator` #15266

Bugs that would be good to fix
- `panic` when evaluating trivial WHERE with a CTE #15386
- `PartitionedFile` and `FileGroup` statistics should be inexact/recomputed #15539
- `last_value` functionality #15676

Community Wishlist
- `Date32` to string given timestamp specifiers #15361
- `date` to `timestamp` with tz #14638
- `statistics_by_partition` API to `ExecutionPlan` #15495

@alamb's wishlist
- `SQL EXPLAIN` Tree Rendering #14914
- `tree` explain by default #15343