Blog post with DataFusion Jun - Sep 2023 #6780

Closed
Tracked by #8655
alamb opened this issue Jun 27, 2023 · 12 comments · Fixed by apache/arrow-site#457
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Jun 27, 2023

Is your feature request related to a problem or challenge?

We have had good luck writing up quarterly updates for DataFusion, most recently:
https://arrow.apache.org/blog/2023/06/24/datafusion-25.0.0/

(see #5812)

Describe the solution you'd like

It would be great to write another post about what has happened in DataFusion over the last few months.

Things I expect will be good to highlight (🤞 ):

  • Improved Struct/array support (@izveigor ❤️ )
  • better group by performance with many distinct groups
  • better insert performance

Others?

Describe alternatives you've considered

No response

Additional context

No response

@alamb
Contributor Author

alamb commented Jul 17, 2023

Ideas for major items to include in this post:

  1. User defined window functions: Blog post about user defined window functions #6781
  2. Faster aggregate performance -- Improve the performance of Aggregator, grouping, aggregation #4973
  3. Support for ARRAY / Lists -- General ticket for Array/List data type #6863 etc (thanks @izveigor and @jayzhan211 )

@Dandandan
Contributor

Improved join performance would maybe be another thing to highlight. Maybe we show a benchmark with improvements (TPC-H, ClickBench, ...) from version 25 -> 28.

@alamb
Contributor Author

alamb commented Aug 17, 2023

There has been major work on INSERT and COPY as well, thanks to @devinjdangelo : #6569

@alamb
Contributor Author

alamb commented Sep 15, 2023

Also #7400 spilling group by from @kazuyukitanimura

@alamb
Contributor Author

alamb commented Sep 18, 2023

Another topic: the new library user guide: https://arrow.apache.org/datafusion/library-user-guide/index.html

@alamb
Contributor Author

alamb commented Oct 14, 2023

FYI this is very much on my list, but I need to focus on the SIGMOD paper for a while. If someone else has the time and inclination to start a PR, I would be most appreciative.

@alamb
Contributor Author

alamb commented Nov 6, 2023

Realistically I am very tied up with #6782 and so won't have time to work on a blog post until after that is submitted (end of Nov). If someone else has time to work on this, it would be very much appreciated.

@alamb
Contributor Author

alamb commented Jan 1, 2024

This is going to have to be more like a 2023 retrospective 🤔

@alamb
Contributor Author

alamb commented Jan 4, 2024

I am starting to draft this now

@alamb
Contributor Author

alamb commented Jan 7, 2024

Here is a PR with a draft (still needs more work): apache/arrow-site#457

alamb added a commit to apache/arrow-site that referenced this issue Jan 19, 2024
Closes apache/datafusion#6780

This blog post describes DataFusion over the last 6 months, DataFusion
26 to 34.

If anyone has time to pitch in and look up links or help with the
language that would be most appreciated

---------

Co-authored-by: Bruce Ritchie <bruce.ritchie@veeva.com>
Co-authored-by: Andy Grove <andygrove73@gmail.com>
Co-authored-by: Mustafa Akur <106137913+mustafasrepo@users.noreply.github.com>
@alamb
Contributor Author

alamb commented Jan 19, 2024

The blog post is now published! https://arrow.apache.org/blog/2024/01/19/datafusion-34.0.0/

@alamb
Contributor Author

alamb commented Mar 13, 2024

Let's capture other items to highlight here #9602
