-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Convert approx_distinct to UDAF #10851
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Lordworms
I think overall good but some suggestions.
|
||
pub struct ApproxDistinct { | ||
signature: Signature, | ||
alias: Vec<String>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ApproxDistinct has no any alias. I think we can just remove this field and use default AggregateUDFImpl::alias
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
@@ -115,6 +115,7 @@ where | |||
|
|||
/// Get the register histogram (each value in register index into | |||
/// the histogram; u32 is enough because we only have 2**14=16384 registers | |||
#[allow(dead_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After moving to function-aggregate
, I think we can remove the original file datafusion/physical-expr/src/aggregate/hyperloglog.rs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
@@ -64,6 +63,7 @@ pub use crate::window::ntile::Ntile; | |||
pub use crate::window::rank::{dense_rank, percent_rank, rank, Rank, RankType}; | |||
pub use crate::window::row_number::RowNumber; | |||
pub use crate::PhysicalSortExpr; | |||
pub use datafusion_functions_aggregate::approx_distinct::ApproxDistinct; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't need to public use it.
proto style fix prost fix clippy fix doc fix clippy
I have resolved all the reviews, plz merge this first to avoid an extra conflicts solve, Thanks for your review @goldmedal |
Thanks @Lordworms and @goldmedal |
* Convert approx_distinct to UDAF proto style fix prost fix clippy fix doc fix clippy * refactor code
… UDAF Ref: approx_distinct apache/datafusion#10851 Ref: approx_median apache/datafusion#10840 Ref: approx_percentile_cont and _with_weight apache/datafusion#10917
… UDAF Ref: approx_distinct apache/datafusion#10851 Ref: approx_median apache/datafusion#10840 Ref: approx_percentile_cont and _with_weight apache/datafusion#10917
… UDAF Ref: approx_distinct apache/datafusion#10851 Ref: approx_median apache/datafusion#10840 Ref: approx_percentile_cont and _with_weight apache/datafusion#10917
* chore: update datafusion deps * feat: impl ExecutionPlan::static_name() for DatasetExec This required trait method was added upstream [0] and recommends to simply forward to `static_name`. [0]: apache/datafusion#10266 * feat: update first_value and last_value wrappers. Upstream signatures were changed for the new new `AggregateBuilder` api [0]. This simply gets the code to work. We should better incorporate that API into `datafusion-python`. [0] apache/datafusion#10560 * migrate count to UDAF Builtin Count was removed upstream. TBD whether we want to re-implement `count_star` with new API. Ref: apache/datafusion#10893 * migrate approx_percentile_cont, approx_distinct, and approx_median to UDAF Ref: approx_distinct apache/datafusion#10851 Ref: approx_median apache/datafusion#10840 Ref: approx_percentile_cont and _with_weight apache/datafusion#10917 * migrate avg to UDAF Ref: apache/datafusion#10964 * migrage corr to UDAF Ref: apache/datafusion#10884 * migrate grouping to UDAF Ref: apache/datafusion#10906 * add alias `mean` for UDAF `avg` * migrate stddev to UDAF Ref: apache/datafusion#10827 * remove rust alias for stddev The python wrapper now provides stddev_samp alias. * migrage var_pop to UDAF Ref: apache/datafusion#10836 * migrate regr_* functions to UDAF Ref: apache/datafusion#10898 * migrate bitwise functions to UDAF The functions now take a single expression instead of a Vec<_>. Ref: apache/datafusion#10930 * add missing variants for ScalarValue with todo * fix typo in approx_percentile_cont * add distinct arg to count * comment out failing test `approx_percentile_cont` is now returning a DoubleArray instead of an IntArray. This may be a bug upstream; it requires further investigation. * update tests to expect lowercase `sum` in query plans This was changed upstream. Ref: apache/datafusion#10831 * update ScalarType data_type map * add docs dependency pickleshare * re-implement count_star * lint: ruff python lint * lint: rust cargo fmt * include name of window function in error for find_window_fn * refactor `find_window_fn` for debug clarity * search default aggregate functions by both name and aliases The alias list no longer includes the name of the function. Ref: apache/datafusion#10658 * fix markdown in find_window_fn docs * parameterize test_window_functions `first_value` and `last_value` are currently failing and marked as xfail. * add test ids to test_simple_select tests marked xfail * update find_window_fn to search built-ins first The behavior of `first_value` and `last_value` UDAFs currently does not match the built-in behavior. This allowed me to remove `marks=pytest.xfail` from the window tests. * improve first_call and last_call use of the builder API * remove trailing todos * fix examples/substrait.py * chore: remove explicit aliases from functions.rs Ref: #779 * remove `array_fn!` aliases * remove alias rules for `expr_fn_vec!` * remove alias rules from `expr_fn!` macro * remove unnecessary pyo3 var-arg signatures in functions.rs * remove pyo3 signatures that provided defaults for first_value and last_value * parametrize test_string_functions * test regr_ function wrappers Closes #778
Which issue does this PR close?
Closes #10837
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?