Library Guide: Extending DataFusion's operators: custom LogicalPlan and ExecutionPlans #7308

Open
alamb opened this issue Aug 16, 2023 · 4 comments
Labels
documentation (Improvements or additions to documentation), enhancement (New feature or request)

Comments

@alamb
Contributor

alamb commented Aug 16, 2023

Is your feature request related to a problem or challenge?

Part of #7014

If we want to have DataFusion used as the core of many new systems, we need it to be as easy as possible for someone to get their idea working on top of DataFusion.

Thanks to @tshauck, we now have a basic Library User Guide ❤️, and this ticket describes how to expand it.

Describe the solution you'd like

Fill in the content of https://arrow.apache.org/datafusion/library-user-guide/extending-operators.html

We can draw inspiration from https://github.com/apache/arrow-datafusion/blob/main/datafusion/core/tests/user_defined/user_defined_plan.rs

Example Outline

  1. Introduce an example plan node that cannot be expressed with existing relational operators (maybe pivot rows to columns, like here)
  2. Show how to define the logical extension (user-defined) node
  3. Show how to use an extension physical planner to plan such a node (example here)
  4. Show how to create a simplified execution plan / stream
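
To make step 1 concrete, here is a minimal, standard-library-only sketch of what "pivot rows to columns" means. It does not use any DataFusion API: the `LongRow` type and the `pivot` function are hypothetical names chosen for illustration, and summing duplicate cells is an assumed aggregation. The point it illustrates is that the output columns are discovered from the data, which is why such an operation is a plausible candidate for a custom plan node.

```rust
use std::collections::BTreeMap;

/// One input row in "long" form: (row key, pivot column value, measure).
/// Hypothetical type for illustration only; not part of DataFusion.
type LongRow = (String, String, i64);

/// Pivot long rows into wide rows: one output row per key and one output
/// column per distinct pivot value. Missing combinations become `None`.
/// Returns the discovered column names and the wide rows keyed by row key.
fn pivot(rows: &[LongRow]) -> (Vec<String>, BTreeMap<String, Vec<Option<i64>>>) {
    // First pass: discover the distinct pivot values; these become columns.
    let mut columns: Vec<String> = rows.iter().map(|(_, c, _)| c.clone()).collect();
    columns.sort();
    columns.dedup();

    // Second pass: place each measure in its (key, column) cell.
    let mut out: BTreeMap<String, Vec<Option<i64>>> = BTreeMap::new();
    for (key, col, value) in rows {
        let idx = columns
            .binary_search(col)
            .expect("column was collected in the first pass");
        let cells = out
            .entry(key.clone())
            .or_insert_with(|| vec![None; columns.len()]);
        // Assumption: repeated (key, column) pairs are aggregated by summing.
        cells[idx] = Some(cells[idx].unwrap_or(0) + value);
    }
    (columns, out)
}
```

Calling `pivot` on rows keyed by year with pivot values `a` and `b` yields the columns `["a", "b"]` and one wide row per year. A real extension node would additionally need the logical node, planner, and execution plan described in steps 2 to 4.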

The examples directory holds many more examples: https://github.com/apache/arrow-datafusion/tree/main/datafusion-examples

Describe alternatives you've considered

No response

Additional context

No response

alamb added the documentation, enhancement, and devrel labels Aug 16, 2023
@brayanjuls
Contributor

I was investigating pivoting in the DataFrame API and found that some of the links in this issue are broken. I am leaving the replacements here for anyone who works on this in the future:

  1. pivot rows to columns, link
  2. how to use extension physical planner, link

@alamb
Contributor Author

alamb commented Aug 19, 2024

Thanks @brayanjuls

alamb removed the devrel label Oct 21, 2024
@Tangeroooo

Can I take this issue?

@alamb
Contributor Author

alamb commented Feb 19, 2025

> Can I take this issue?

Yes, of course, please do! Please ping me on the PR if you need a reviewer, as improving the DataFusion documentation is high on my list of things to do.

@Tangeroooo, now that we are running the examples in the user guide, we can probably move the entire example into the docs (it will still be tested there).
