Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

REF: Make concat not stateful. #59141

Merged
merged 5 commits into from
Jul 5, 2024
Merged

Conversation

mroeschke
Copy link
Member

All the logic used to be held in a _Concatenator class. The statefullness wasn't really needed since concat is a one-shot method so broke it out into its functions

@mroeschke mroeschke added Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Jun 28, 2024
@mroeschke mroeschke added this to the 3.0 milestone Jun 28, 2024
@mroeschke
Copy link
Member Author

| Change   | Before [039edee0] <main>   | After [0becd77e] <ref/concat>   |   Ratio | Benchmark (Parameter)                                                                |
|----------|----------------------------|---------------------------------|---------|--------------------------------------------------------------------------------------|
| -        | 46.8±4μs                   | 41.8±0.1μs                      |    0.89 | join_merge.ConcatIndexDtype.time_concat_series('string[pyarrow]', 'has_na', 0, True) |

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.

@mroeschke
Copy link
Member Author

Since test are passing, going to merge. Happy to follow up if needed

@mroeschke mroeschke merged commit f3ad4d5 into pandas-dev:main Jul 5, 2024
45 checks passed
@mroeschke mroeschke deleted the ref/concat branch July 5, 2024 17:54
@martinfleis
Copy link
Contributor

Hey, just a note that this has broken concat override in geopandas, where we're now hitting 'object' object has no attribute 'objs'. Just wanted to clarify if that is expected or just an unwanted consequence of refactoring (i.e. whether I should fix it on geopandas side).

@mroeschke
Copy link
Member Author

It was an intended consequence of the refactor given that concat isn't stateful anymore. I think I can come up with a fix that will make this non-breaking for geopandas

To confirm it appears that geopandas overrides __finalize__? Just curious the reason for that

@martinfleis
Copy link
Contributor

To confirm it appears that geopandas overrides finalize? Just curious the reason for that

To ensure that we can correctly propagate custom metadata. That used to be an information about projection and an active geometry column, right now it is only active geometry column. Without that, the resulting GeoDataFrame would lose the track of active geometry and require manual re-assignment of active geometry.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
Refactor Internal refactoring of code Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants