Introduce return_data_type for Aggregate function #7960

Closed
jayzhan211 opened this issue Oct 28, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@jayzhan211
Contributor

Is your feature request related to a problem or challenge?

For list inputs, return_data_type is not the same as data_type. For example, when data_type is List(Int64), return_data_type will be Int64.
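To make the distinction concrete, here is a minimal sketch of how the two types could relate. The `DataType` enum below is a simplified stand-in written for this example, not the real Arrow type, and `sum_return_data_type` is a hypothetical helper name, not an existing DataFusion function.

```rust
// Simplified stand-in for a type enum (assumption: NOT the real Arrow `DataType`).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
    List(Box<DataType>),
}

// Hypothetical helper: the type a Sum-like aggregate would return
// for a given input data_type.
fn sum_return_data_type(data_type: &DataType) -> DataType {
    match data_type {
        // For a list input, the result type is the element type,
        // e.g. List(Int64) -> Int64.
        DataType::List(inner) => (**inner).clone(),
        // For a non-list input, both types are the same.
        other => other.clone(),
    }
}
```

Under these assumptions, passing `DataType::List(Box::new(DataType::Int64))` yields `DataType::Int64`, while a plain `DataType::Int64` input is returned unchanged.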

Ref: https://github.com/apache/arrow-datafusion/pull/7242/files

Describe the solution you'd like

Introduce return_data_type for Sum and the other aggregate functions.

Describe alternatives you've considered

No response

Additional context

No response

@jayzhan211 jayzhan211 added the enhancement New feature or request label Oct 28, 2023
@alamb
Contributor

alamb commented Oct 29, 2023

I think we should think more carefully about how to handle multi-phase aggregates where the intermediate type is not the same as the output type. For example, SUM(x) for non-lists can use the same aggregate in both phases (you can sum partial sums), but COUNT(x) cannot: the first phase must actually COUNT, while the second phase needs to SUM the partial counts.
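The SUM versus COUNT difference can be sketched with plain slices standing in for partitions. This is only an illustration under that assumption; the function names are hypothetical and this is not how DataFusion's accumulators are actually structured.

```rust
// Phase 1 of COUNT: count the non-null values within one partition.
fn partial_count(partition: &[Option<i64>]) -> i64 {
    partition.iter().filter(|v| v.is_some()).count() as i64
}

// Phase 2 of COUNT: the partial counts must be SUMmed, not counted again.
fn final_count(partial_counts: &[i64]) -> i64 {
    partial_counts.iter().sum()
}

// SUM: the same operation works for both phases,
// because a sum of partial sums equals the total sum.
fn sum(values: &[i64]) -> i64 {
    values.iter().sum()
}
```

With partitions `[Some(1), None, Some(3)]` and `[Some(4)]`, the partial counts are 2 and 1, and summing them gives the correct total of 3. Applying a count in the second phase would instead return the number of partitions.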

This causes problems for #6937 where we might want to dynamically stop doing the first phase aggregation.

@jayzhan211
Contributor Author

jayzhan211 commented Nov 12, 2023

TODO

  • min_max (single-phase)
  • median (two-phase: 1. collect values, 2. compute median)
  • count (not needed, data type is always i64)
  • avg (already done)
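The two-phase median in the list above could look roughly like the sketch below: the first phase only collects values, so its intermediate state is list-like, and the second phase produces a single value. The function names are hypothetical, and averaging the two middle values into an `f64` for even-length input is an assumption of this example, not necessarily what DataFusion does.

```rust
// Phase 1: each partition only collects its values.
// The intermediate state is a list of values, not a single value.
fn collect_partial(partition: &[i64]) -> Vec<i64> {
    partition.to_vec()
}

// Phase 2: merge all collected values and compute the median.
// Assumption: an even-length input averages the two middle values.
fn final_median(partials: &[Vec<i64>]) -> Option<f64> {
    let mut all: Vec<i64> = partials.iter().flatten().copied().collect();
    if all.is_empty() {
        return None; // no input values, so there is no median
    }
    all.sort_unstable();
    let mid = all.len() / 2;
    if all.len() % 2 == 1 {
        Some(all[mid] as f64)
    } else {
        Some((all[mid - 1] as f64 + all[mid] as f64) / 2.0)
    }
}
```

This is the case where the intermediate type (a list of values) and the output type (a single value) diverge, which is the situation the issue asks return_data_type to describe.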

@jayzhan211
Contributor Author

Reopen this issue if there is a concrete problem to solve.
