Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Maintain row order after cross join #463

Merged
merged 4 commits into from
Jan 24, 2025
Merged

Maintain row order after cross join #463

merged 4 commits into from
Jan 24, 2025

Conversation

AdrianSosic
Copy link
Collaborator

@AdrianSosic AdrianSosic commented Jan 16, 2025

Fixes the failing polars tests.

By default, polars gives no guarantees on the resulting row order of a join (see here), meaning that our tests used to pass just by luck. This has changed since polars==0.19.0, which apparently included changes that affect the row order of our test dataframes. The PR fixes these tests by ignoring the row order during the equality check.

The current version of the polars cross join computing the Cartesian product currently makes no guarantees on the resulting row order (see here). While strictly not a bug, this makes the behavior inconsistent with the corresponding pandas implementation and makes comparison more difficult. Accordingly, our tests started to fail since polars 0.19.0 which apparently included changes that affect the row order of our test dataframes.

Enforcing the pandas-equivalent row order seems like the best option for now, while potentially not exploiting the maximum possible speed of the polars join. In case speed really becomes a limitation, we can remove the restriction but then need to check that we don't rely on the order anywhere.

@AdrianSosic AdrianSosic added enhancement Expand / change existing functionality on hold PR progress is awaiting for something else to continue labels Jan 16, 2025
@AdrianSosic AdrianSosic self-assigned this Jan 16, 2025
Copy link
Collaborator

@Scienfitz Scienfitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comment about performance
if performance is impacted by the fix I prefer not to include it and soften tests instead

@AdrianSosic AdrianSosic requested a review from Scienfitz January 24, 2025 08:02
@AdrianSosic AdrianSosic removed the on hold PR progress is awaiting for something else to continue label Jan 24, 2025
@AdrianSosic AdrianSosic merged commit e2678ec into main Jan 24, 2025
9 of 11 checks passed
@AdrianSosic AdrianSosic deleted the fix/polars_order branch January 24, 2025 09:02
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
enhancement Expand / change existing functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants