Write a blog post about fast vectorized grouping for high cardinality #6988
Comments
I have drafted a blog post about this with @tustvold and @Dandandan. It will be published on the InfluxData blog first, and then I will propose reposting it on the Arrow blog site. I expect to have a draft up later this week.
Here is the blog post we wrote about how to do high cardinality grouping really fast: https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/ I will propose a PR to cross-post the content to the Arrow blog as well in the coming days.
PR on arrow-site ready: apache/arrow-site#386
…ion 28.0.0 (#386)

Closes apache/datafusion#6988

**Note**: This describes work @tustvold, @Dandandan, and I did in DataFusion 28.0.0.

This content was originally published on the [InfluxData Blog](https://www.influxdata.com/blog/aggregating-millions-groups-fast-apache-arrow-datafusion/), but since it is generally applicable to Apache Arrow DataFusion I would like to syndicate it here because:

1. This is a forum where the community can comment on it and keep it up to date via PR
2. It is hosted on a platform with a different lifetime than a company blog

This is the same model we followed with https://arrow.apache.org/blog/2022/12/26/querying-parquet-with-millisecond-latency/, which was also republished on the Arrow blog after the InfluxData blog.

It also gives me a chance to use my original ASCII art diagrams :)
It is now republished at https://arrow.apache.org/blog/2023/08/05/datafusion_fast_grouping/ ✅
The idea here is to write a blog post explaining and motivating the improvement in DataFusion grouping made in #6904.