Skip to content

Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient #8841

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Merged
merged 32 commits into from
Jan 16, 2024
Merged

Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient #8841

merged 32 commits into from
Jan 16, 2024

Conversation

mustafasrepo
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

Currently, we have FIRST_VALUE, LAST_VALUE, ARRAY_AGG order sensitive aggregations functions. Similar to the window counter part one may need to index entries other than first and last. With NTH_VALUE aggregation function one can choose arbitrary indices. With this PR one can write following query, to get second row in the group.

SELECT a, b, NTH_VALUE(c, 2)
FROM multiple_ordered_table
GROUP BY a, b
ORDER BY a, b;

As a workaround following query can be used to produce desired output of the query above.

SELECT a, b, ARRAY_AGG(c)[2]
FROM multiple_ordered_table
GROUP BY a, b
ORDER BY a, b;

However, with dedicated aggregator, first query will use less memory.

What changes are included in this PR?

This PR adds NTH_VALUE aggregator support.

Are these changes tested?

Yes,

Are there any user-facing changes?

@github-actions github-actions bot added logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt) labels Jan 12, 2024
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR and it looks good except one issue about clones during merge_batch operations. There seems to be an inefficiency there which is not introduced by this PR, but I discovered it during review. I will consult with @mustafasrepo tomorrow to discuss possible fixes.

@ozankabak ozankabak changed the title ORDER sensitive NTH_VALUE Aggregation support Support for order sensitive NTH_VALUE aggregation, make reverse ARRAY_AGG more efficient Jan 16, 2024
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed this PR thoroughly again and the inefficiency I found in my first review is resolved -- we both have faster ARRAY_AGGs now, and also have an efficient NTH_VALUE.

@ozankabak ozankabak merged commit 8cf1abb into apache:main Jan 16, 2024
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
logical-expr Logical plan and expressions physical-expr Changes to the physical-expr crates sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants