Commit f580155

Upgrade Datafusion 40 (#771)
* chore: update datafusion deps
* feat: impl ExecutionPlan::static_name() for DatasetExec
  This required trait method was added upstream [0], which recommends simply forwarding to `static_name`.
  [0]: apache/datafusion#10266
* feat: update first_value and last_value wrappers.
  Upstream signatures were changed for the new `AggregateBuilder` API [0]. This simply gets the code to work. We should better incorporate that API into `datafusion-python`.
  [0]: apache/datafusion#10560
* migrate count to UDAF
  Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API.
  Ref: apache/datafusion#10893
* migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF
  Ref: approx_distinct apache/datafusion#10851
  Ref: approx_median apache/datafusion#10840
  Ref: approx_percentile_cont and _with_weight apache/datafusion#10917
* migrate avg to UDAF
  Ref: apache/datafusion#10964
* migrate corr to UDAF
  Ref: apache/datafusion#10884
* migrate grouping to UDAF
  Ref: apache/datafusion#10906
* add alias `mean` for UDAF `avg`
* migrate stddev to UDAF
  Ref: apache/datafusion#10827
* remove rust alias for stddev
  The python wrapper now provides stddev_samp alias.
* migrate var_pop to UDAF
  Ref: apache/datafusion#10836
* migrate regr_* functions to UDAF
  Ref: apache/datafusion#10898
* migrate bitwise functions to UDAF
  The functions now take a single expression instead of a Vec<_>.
  Ref: apache/datafusion#10930
* add missing variants for ScalarValue with todo
* fix typo in approx_percentile_cont
* add distinct arg to count
* comment out failing test
  `approx_percentile_cont` is now returning a DoubleArray instead of an IntArray. This may be a bug upstream; it requires further investigation.
* update tests to expect lowercase `sum` in query plans
  This was changed upstream.
  Ref: apache/datafusion#10831
* update ScalarType data_type map
* add docs dependency pickleshare
* re-implement count_star
* lint: ruff python lint
* lint: rust cargo fmt
* include name of window function in error for find_window_fn
* refactor `find_window_fn` for debug clarity
* search default aggregate functions by both name and aliases
  The alias list no longer includes the name of the function.
  Ref: apache/datafusion#10658
* fix markdown in find_window_fn docs
* parameterize test_window_functions
  `first_value` and `last_value` are currently failing and marked as xfail.
* add test ids to test_simple_select tests marked xfail
* update find_window_fn to search built-ins first
  The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior. This allowed me to remove `marks=pytest.xfail` from the window tests.
* improve first_call and last_call use of the builder API
* remove trailing todos
* fix examples/substrait.py
* chore: remove explicit aliases from functions.rs
  Ref: #779
* remove `array_fn!` aliases
* remove alias rules for `expr_fn_vec!`
* remove alias rules from `expr_fn!` macro
* remove unnecessary pyo3 var-arg signatures in functions.rs
* remove pyo3 signatures that provided defaults for first_value and last_value
* parametrize test_string_functions
* test regr_ function wrappers

Closes #778
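The commit message notes that default aggregate functions are now searched by both name and aliases, because the upstream alias list no longer includes the function's own name. The following is only a minimal sketch of that lookup idea in plain Python, not the code from this commit (which lives in the Rust sources). `AggregateFn`, `find_aggregate`, and the sample registry are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AggregateFn:
    """Hypothetical stand-in for a registered aggregate function."""

    name: str
    aliases: List[str] = field(default_factory=list)


def find_aggregate(registry: List[AggregateFn], wanted: str) -> Optional[AggregateFn]:
    """Return the first entry whose name or any alias equals `wanted`."""
    for fn in registry:
        # Assumption: the alias list does not repeat the canonical name,
        # so the name has to be compared explicitly as well.
        if fn.name == wanted or wanted in fn.aliases:
            return fn
    return None


# Illustrative registry only; the entries are not read from DataFusion.
registry = [
    AggregateFn("avg", ["mean"]),
    AggregateFn("stddev", ["stddev_samp"]),
    AggregateFn("count"),
]
```

With this sketch, `find_aggregate(registry, "mean")` and `find_aggregate(registry, "avg")` resolve to the same entry, while an unknown name yields `None`. Checking only the aliases would miss `"avg"` and `"count"` here, which is the failure mode the commit message describes.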
1 parent fd6b4df commit f580155

12 files changed: +635 -499 lines changed

Cargo.lock

+36 -33
Some generated files are not rendered by default.

Cargo.toml

+8 -8
@@ -17,7 +17,7 @@

 [package]
 name = "datafusion-python"
-version = "39.0.0"
+version = "40.0.0"
 homepage = "https://datafusion.apache.org/python"
 repository = "https://github.com/apache/datafusion-python"
 authors = ["Apache DataFusion <dev@datafusion.apache.org>"]
@@ -38,13 +38,13 @@ tokio = { version = "1.35", features = ["macros", "rt", "rt-multi-thread", "sync
 rand = "0.8"
 pyo3 = { version = "0.21", features = ["extension-module", "abi3", "abi3-py38"] }
 arrow = { version = "52", feature = ["pyarrow"] }
-datafusion = { version = "39.0.0", features = ["pyarrow", "avro", "unicode_expressions"] }
-datafusion-common = { version = "39.0.0", features = ["pyarrow"] }
-datafusion-expr = "39.0.0"
-datafusion-functions-array = "39.0.0"
-datafusion-optimizer = "39.0.0"
-datafusion-sql = "39.0.0"
-datafusion-substrait = { version = "39.0.0", optional = true }
+datafusion = { version = "40.0.0", features = ["pyarrow", "avro", "unicode_expressions"] }
+datafusion-common = { version = "40.0.0", features = ["pyarrow"] }
+datafusion-expr = "40.0.0"
+datafusion-functions-array = "40.0.0"
+datafusion-optimizer = "40.0.0"
+datafusion-sql = "40.0.0"
+datafusion-substrait = { version = "40.0.0", optional = true }
 prost = "0.12"
 prost-types = "0.12"
 uuid = { version = "1.9", features = ["v4"] }

docs/requirements.txt

+2 -1
@@ -21,4 +21,5 @@ myst-parser
 maturin
 jinja2
 ipython
-pandas
+pandas
+pickleshare

examples/substrait.py

+1 -1
@@ -46,4 +46,4 @@

 # Back to Substrait Plan just for demonstration purposes
 # type(substrait_plan) -> <class 'datafusion.substrait.plan'>
-substrait_plan = ss.Producer.to_substrait_plan(df_logical_plan)
+substrait_plan = ss.Producer.to_substrait_plan(df_logical_plan, ctx)
