Optimization: Parallelize finalize state of aggregation #4547

Open
anuchak opened this issue Nov 19, 2024 · 1 comment

Comments

Collaborator

anuchak commented Nov 19, 2024

Description

Currently, the finalizeInternal function of the HashAggregate operator runs on a single thread.
For aggregations over large tables, this finalize step becomes a significant bottleneck.

I'm running benchmarks on the MS MARCO dataset for FTS, where we run an aggregation to create the index: https://trec-rag.github.io/annoucements/2024-corpus-finalization/

On a small segment partition (#00), the following query:

MATCH (b:ms_marco_test) WITH tokenize(b.segment) AS tk, OFFSET(ID(b)) AS id UNWIND tk AS t
RETURN STEM(t, 'porter'), id, count(*);

takes 134322.01 ms to run, of which the finalize step alone takes 84198 ms (roughly 63% of the total).

Collaborator

acquamarin commented Nov 25, 2024

DuckDB implements the parallel aggregation algorithm described in Section 4.4 of this paper: https://15721.courses.cs.cmu.edu/spring2016/papers/p743-leis.pdf.
We could consider following the same approach.
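
To make the idea concrete, here is a minimal, simplified sketch of a two-phase partitioned aggregation for a count(*) grouped by a string key. It is not DuckDB's or our actual implementation: the function name parallelCount, the CountMap alias, and the use of std::unordered_map and std::thread are all assumptions chosen for illustration, and the paper's fixed-size pre-aggregation table with spilling is replaced here by plain thread-local maps. The point it shows is that once phase 1 partitions groups by hash, the finalize phase can process each partition on a separate thread without locks, because partitions hold disjoint key sets.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical alias: group key -> count(*).
using CountMap = std::unordered_map<std::string, std::size_t>;

// Hypothetical helper (not an existing API): counts occurrences of each key.
// Returns one map per hash partition; every key appears in exactly one partition.
std::vector<CountMap> parallelCount(const std::vector<std::string>& keys,
                                    std::size_t numThreads,
                                    std::size_t numPartitions) {
    assert(numThreads > 0 && numPartitions > 0);

    // Phase 1: each worker pre-aggregates its slice of the input into its own
    // set of hash-partitioned tables, so no synchronization is required.
    std::vector<std::vector<CountMap>> local(
        numThreads, std::vector<CountMap>(numPartitions));
    const std::size_t chunk = (keys.size() + numThreads - 1) / numThreads;
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t begin = std::min(t * chunk, keys.size());
            const std::size_t end = std::min(begin + chunk, keys.size());
            for (std::size_t i = begin; i < end; ++i) {
                const std::size_t p =
                    std::hash<std::string>{}(keys[i]) % numPartitions;
                ++local[t][p][keys[i]];
            }
        });
    }
    for (auto& w : workers) w.join();
    workers.clear();

    // Phase 2 (finalize): worker t merges partitions t, t + numThreads, ...
    // Each output partition is written by exactly one thread.
    std::vector<CountMap> result(numPartitions);
    for (std::size_t t = 0; t < numThreads; ++t) {
        workers.emplace_back([&, t] {
            for (std::size_t p = t; p < numPartitions; p += numThreads) {
                for (std::size_t src = 0; src < numThreads; ++src) {
                    for (const auto& entry : local[src][p]) {
                        result[p][entry.first] += entry.second;
                    }
                }
            }
        });
    }
    for (auto& w : workers) w.join();
    return result;
}
```

For example, calling parallelCount(keys, 4, 16) aggregates with 4 workers over 16 partitions; choosing more partitions than threads helps even out skew between partitions during finalize.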
