Skip to content

Reorganize the GroupColumn implementations into more coherent modules #13262

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
Tracked by #12680
alamb opened this issue Nov 5, 2024 · 6 comments · Fixed by #13352
Closed
Tracked by #12680

Reorganize the GroupColumn implementations into more coherent modules #13262

alamb opened this issue Nov 5, 2024 · 6 comments · Fixed by #13352
Assignees
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Nov 5, 2024

Is your feature request related to a problem or challenge?

While reviewing #12996 I noticed that all the implementations of GroupColumn for different types were in datafusion/physical-plan/src/aggregates/group_values/group_column.rs which is:

  1. a single large file
  2. Not obvious that it is like an implementation detail of GroupValuesColumn: https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs#L94-L93

Describe the solution you'd like

I would like to propose rearrainging the code:

  1. Rename https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/column.rs to https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column/mod.rs
  2. Move the implementations in datafusion/physical-plan/src/aggregates/group_values/group_column.rs to different modules within https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/aggregates/group_values/multi_column (like primitive.rs, bytes.rs, etc)

Describe alternatives you've considered

No response

Additional context

No response

@alamb alamb added the enhancement New feature or request label Nov 5, 2024
@alamb alamb added the good first issue Good for newcomers label Nov 5, 2024
@alamb
Copy link
Contributor Author

alamb commented Nov 5, 2024

Once #12996 is merged, this would be a good first issue I think as it is just code movement and somewhat mechanical It would be a good way to get introduced to the code

@Chen-Yuan-Lai
Copy link
Contributor

take

@alamb
Copy link
Contributor Author

alamb commented Nov 6, 2024

#12996 has now merged so this issue should now be unblocked

@jayzhan211
Copy link
Contributor

@Chen-Yuan-Lai Have you started on this?

@Chen-Yuan-Lai
Copy link
Contributor

Chen-Yuan-Lai commented Nov 11, 2024

Yes, I'm working on this, but have some questions:

  1. I trying to rename /column.rs to /multi_column/mod.rs, and move /group_column.rs to different module. simultaneously, I've also moved some shared trait and function to /multi_column/mod.rs. However, /multi_column/mod.rs itself also defines and implements some data structures (e.g.,GroupValuesColumn, GroupIndexView). It feels too complex as a mod.rs file. Should I split it into other files as well?

  2. In the original /group_column.rs, there were some test cases. After splitting /group_column.rs into three modules (/multi_column/byte.rs, /multi_column/byte_view.rs, and /multi_column/primitive.rs), I move the corresponding tests to their respective files. I'm not sure if this is reasonable, or if it would be better to keep all tests together in multi_column/test.rs?"

@jayzhan211
Copy link
Contributor

Or you can name it as multi_column.rs and add the submodule in this this file. Structure is similar to datafusion/expr-common/src/type_coercion.rs and you can place trait in multi_column.rs.

I think it is fine to place test in separated file.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement New feature or request good first issue Good for newcomers
Projects
None yet
3 participants