Maintain row order after cross join #463
Merged
Fixes the failing polars tests.
By default, polars gives no guarantees on the resulting row order of a join (see here), meaning that our tests used to pass just by luck. This has changed since polars==0.19.0, which apparently included changes that affect the row order of our test dataframes. The PR fixes these tests by ignoring the row order during the equality check.

The polars cross join, which computes the Cartesian product, currently makes no guarantees on the resulting row order (see here). While strictly not a bug, this makes the behavior inconsistent with the corresponding pandas implementation and makes comparison more difficult. Accordingly, our tests started to fail with polars 0.19.0, which apparently included changes that affect the row order of our test dataframes.

Enforcing the pandas-equivalent row order seems like the best option for now, even though it potentially does not exploit the maximum possible speed of the polars join. In case speed really becomes a limitation, we can remove the restriction but then need to check that we don't rely on the order anywhere.
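
For illustration, here is a minimal plain-Python sketch of the two strategies discussed above: comparing results while ignoring row order, and restoring a deterministic left-major order (the order a pandas cross merge is expected to produce) after an unordered join. It deliberately avoids the polars API; all function names are hypothetical, and `unordered_cross_join` is only a stand-in for a join without ordering guarantees, not the code changed in this PR.

```python
from collections import Counter
from itertools import product
import random


def cross_join(left, right):
    """Cartesian product of two lists of row tuples, in left-major order."""
    return [l + r for l, r in product(left, right)]


def unordered_cross_join(left, right, seed=0):
    """Hypothetical stand-in for a join with no row-order guarantee:
    produces the same rows as cross_join, but shuffled."""
    rows = cross_join(left, right)
    random.Random(seed).shuffle(rows)
    return rows


def same_rows_any_order(a, b):
    """Strategy 1: equality check that ignores row order (multiset comparison)."""
    return Counter(a) == Counter(b)


def restore_order(left, right, join):
    """Strategy 2: tag each input row with its position, join, sort by
    (left position, right position), then drop the tags again."""
    n_left = len(left[0]) if left else 0
    tagged_left = [(i, *row) for i, row in enumerate(left)]
    tagged_right = [(j, *row) for j, row in enumerate(right)]
    joined = join(tagged_left, tagged_right)
    j_pos = n_left + 1  # index of the right-hand tag inside a joined row
    joined.sort(key=lambda row: (row[0], row[j_pos]))
    return [row[1:j_pos] + row[j_pos + 1:] for row in joined]


left = [("a",), ("b",)]
right = [(1,), (2,), (3,)]
expected = cross_join(left, right)
shuffled = unordered_cross_join(left, right)
restored = restore_order(left, right, unordered_cross_join)
```

The second strategy pays for an extra sort, which mirrors the speed trade-off mentioned above; the first leaves the join untouched but only helps in tests.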