Optimization: Parallelize finalize state of aggregation #4547

anuchak · 2024-11-19T00:25:59Z

Description

Currently, the finalizeInternal function of the HashAggregate operator runs on a single thread.
For aggregations over large tables, finalization becomes a significant bottleneck.

I'm running benchmarks on the MS MARCO dataset for FTS where we do aggregation for creating the index: https://trec-rag.github.io/annoucements/2024-corpus-finalization/

On a small segment partition (#00), the following query:

MATCH (b:ms_marco_test) WITH tokenize(b.segment) AS tk, OFFSET(ID(b)) AS id UNWIND tk AS t
RETURN STEM(t, 'porter'), id, count(*);

takes 134322.01 ms to run, of which the finalize step alone takes 84198 ms (about 63% of the total).
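To put those two timings in context, here is a rough back-of-the-envelope sketch. It computes the share of runtime spent in finalize and a best-case total if only the finalize step were spread across N threads. The perfect linear scaling is an assumption made purely for illustration, not a measurement, and the function names are hypothetical.

```python
# Timings reported above, in milliseconds.
TOTAL_MS = 134322.01
FINALIZE_MS = 84198.0


def finalize_share(total_ms=TOTAL_MS, finalize_ms=FINALIZE_MS):
    """Fraction of the total runtime spent in the finalize step."""
    return finalize_ms / total_ms


def estimated_total_ms(num_threads, total_ms=TOTAL_MS, finalize_ms=FINALIZE_MS):
    """Best-case total runtime if only finalize is parallelized.

    ASSUMPTION: finalize scales perfectly with the thread count and the
    rest of the query is unchanged. Real speedups will be lower.
    """
    other_ms = total_ms - finalize_ms
    return other_ms + finalize_ms / num_threads


if __name__ == "__main__":
    print(f"finalize share: {finalize_share():.1%}")
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {estimated_total_ms(n):.0f} ms (idealized)")
```

Even under this idealized assumption, the non-finalize part of the query puts a floor on the total runtime, so the estimate is an upper bound on the benefit.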


acquamarin · 2024-11-25T15:23:30Z

DuckDB implements the parallel aggregation algorithm described in Section 4.4 of this paper: https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf.
We could consider following the same idea.
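As I understand it, the scheme in that section has two phases: each thread first pre-aggregates its own input into hash partitions, and then each partition is merged by exactly one thread, so the final merge needs no locking and can run in parallel. Below is a minimal Python sketch of that idea for a count(*) aggregate. It is an illustration only, not Kuzu's or DuckDB's implementation: the function names and NUM_PARTITIONS are hypothetical, and the sketch leaves out the fixed-size pre-aggregation table and spilling that the paper describes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

NUM_PARTITIONS = 4  # hypothetical value; a real engine would tune this


def pre_aggregate(rows, num_partitions=NUM_PARTITIONS):
    """Phase 1: one worker counts its own chunk into per-partition tables."""
    partitions = [Counter() for _ in range(num_partitions)]
    for key in rows:
        # The same key always hashes to the same partition on every worker.
        partitions[hash(key) % num_partitions][key] += 1
    return partitions


def finalize_partition(partition_id, local_results):
    """Phase 2: merge one partition across all workers' local tables."""
    merged = Counter()
    for partitions in local_results:
        merged.update(partitions[partition_id])  # Counter.update adds counts
    return merged


def parallel_count(chunks, num_partitions=NUM_PARTITIONS):
    """Group-by count over chunks of keys, with a partition-parallel merge."""
    with ThreadPoolExecutor() as pool:
        local_results = list(
            pool.map(lambda chunk: pre_aggregate(chunk, num_partitions), chunks)
        )
        # Each partition is finalized by its own task, independently.
        finals = list(
            pool.map(
                lambda pid: finalize_partition(pid, local_results),
                range(num_partitions),
            )
        )
    result = {}
    for part in finals:
        result.update(part)  # key sets are disjoint across partitions
    return result


if __name__ == "__main__":
    # Keys mimic the (stem, id) groups from the query in the description.
    chunks = [[("run", 1), ("run", 1), ("walk", 2)], [("run", 1), ("walk", 3)]]
    print(parallel_count(chunks))
```

Because a given key lands in the same partition on every worker, no two finalize tasks ever touch the same group, which is what lets the finalize step scale with the number of partitions instead of running on a single thread. In CPython the threads here mostly show the structure rather than a real speedup.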

anuchak added the performance optimization label Nov 19, 2024

anuchak assigned benjaminwinger Nov 19, 2024

semihsalihoglu-uw added the high-priority label Nov 19, 2024

benjaminwinger mentioned this issue Dec 18, 2024

Hash aggregate finalization parallelization #4655

Merged
