-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Memory is coupled to group by
cardinality, even when the aggregate output is truncated by a limit
clause
#7191
New issue
Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? # to your account
Comments
Perhaps we could add a redact group API to the new row accumulators, this would allow using them for this as well as for window functions |
I struggled with this for a bit. Originally I rejected using I don't think we'd want to always evict groups, because we might not even need to add them in the first place if the value being aggregated is less/greater than the min/max of the priority queue - so it would be a no-op. |
Also, I think usually this optimization would be applied for single terms |
Yeah, I think you need both a priority queue to work out which groups to keep, along with a HashMap to work out which rows belong to which groups. I can't think of an obvious way to avoid this.
I was envisaging something like adding support to the Or something to that effect, just spitballing here. I really want to get Window functions using |
Interesting... I thought we were going the other way, due to this comment. |
By scalar, I am referring to the ones based around ScalarValue, i.e. https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.Accumulator.html |
Is your feature request related to a problem or challenge?
Currently, there is only one Aggregation:
GroupedHashAggregateStream
. It does a lovely job, but it allocates memory for every uniquegroup by
value.For large datasets, this can cause OOM errors, even if the very next operation is a
sort by max(x) limit y
.Describe the solution you'd like
I would like to add a
GroupedAggregateStream
based on aPriorityQueue
of grouped values that can be used instead ofGroupedHashAggregateStream
under the specific conditions above, so that Top K queries work even on datasets with cardinality larger than available memory.Describe alternatives you've considered
A more generalized implementation where we:
emit
ing rows in a stream as the aggregate for each group is computedTopKExec
node that is only responsible for doing the top K operationUnfortunately, despite being more general, I'm told that this approach will still OOM in our case.
Additional context
Please see the following similar (but not same) tickets for related top K issues:
ROW_NUMBER < 5
/ TopK #6899The text was updated successfully, but these errors were encountered: