Skip to content

Allow to filter null in array_agg #13742

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
rluvaton opened this issue Dec 12, 2024 · 5 comments · Fixed by #15544
Closed

Allow to filter null in array_agg #13742

rluvaton opened this issue Dec 12, 2024 · 5 comments · Fixed by #15544
Labels
enhancement New feature or request

Comments

@rluvaton
Copy link
Contributor

rluvaton commented Dec 12, 2024

Is your feature request related to a problem or challenge?

Yes, I want nulls to be filtered from array_agg when I specify with_ignore_nulls: true in AggregateExprBuilder to have behavior similar to Spark, this should also work when with_distinct is false or true

Describe the solution you'd like

Specifing with_ignore_nulls: true should ignore

Describe alternatives you've considered

  1. Adding a filter for removing nulls to the child before creating the array_agg
  2. Copy the implementation and added in myself as an UDAF

Additional context

This PR might be relevant:

I'm willing to open a pull request

@rluvaton rluvaton added the enhancement New feature or request label Dec 12, 2024
@findepi
Copy link
Member

findepi commented Dec 12, 2024

AFAICT, the with_ignore_nulls: true is currently... ignored?
Given it's part of the "logical plan", it should be obeyed. It's probably part of LP because the LP is half-way between parser's AST and real plan (see also #12604 and cc @alamb).

I'm OK for respecting with_ignore_nulls in the plan, but I would actually consider removing it completely. Simpler things are better. The same functionality is available using AggregateFunction.filter.
What calls for keeping with_ignore_nulls is that aggregate functions -- in general -- skip over null inputs. SQL standard array_agg is one of few exceptions (and some systems like Snowflake and apparently Spark skip nulls for array_agg). Ideally this could be decided by the function itself, so could be a parameter of the ArrayAgg struct.

@jayzhan211
Copy link
Contributor

jayzhan211 commented Dec 13, 2024

I like the idea that we leverage existing filter in AggregateExec for this usecase

/// FILTER (WHERE clause) expression for each aggregate expression
filter_expr: Vec<Option<Arc<dyn PhysicalExpr>>>,

I guess you can create AggregateExec with predicate that only non-nulls goes to array-agg, similar to the following syntax

select array_agg(column1) from table where column1 != NUll;

We could even introduce a general filter parameter to UDAF. Then we have something like array_agg(lit(1)).distinct().filter(col != null).build() 🤔

We have filter already 👍🏻

@alamb
Copy link
Contributor

alamb commented Dec 16, 2024

To be clear, I think the SQL parser supports predicates like ARRAY_AGG(x) IGNORE NULL and we have logical plan support for them.

However, some aggregates (silently) ignore this, see

I think filling out support (or at least making it clear that this syntax is ignored) would be a great idea

@findepi
Copy link
Member

findepi commented Dec 17, 2024

Thank you @alamb. do you have an example of an agg that doesn't ignore them?
I tried find one, but maybe i was searching the wrong way

@alamb
Copy link
Contributor

alamb commented Dec 17, 2024

Thank you @alamb. do you have an example of an agg that doesn't ignore them? I tried find one, but maybe i was searching the wrong way

I think first_value supports nulls / ignore nulls

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants