Implemented MultiIndex.equal_levels #1789
base: master
Codecov Report
@@            Coverage Diff             @@
##           master    #1789      +/-   ##
==========================================
- Coverage   94.64%   94.61%   -0.04%
==========================================
  Files          49       49
  Lines       10818    10724      -94
==========================================
- Hits        10239    10146      -93
+ Misses        579      578       -1
Could someone double-check this once more? It seems fine to me.
I'll take a look later.
Sure, thanks :)
I can see some issues:
>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False
or
>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> pmidx1.equal_levels(pmidx2)
True
>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False
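To make the expected semantics concrete, here is a minimal pure-Python sketch of what `equal_levels` appears to check in the pandas examples above: the set of distinct values at each level, not the tuples themselves. The helper names `levels_of` and `equal_levels_sketch` are hypothetical, and this is not the pandas or Koalas implementation; it assumes levels are the sorted unique values per position, which holds for indexes built with `from_tuples`.

```python
from typing import Hashable, List, Sequence, Tuple


def levels_of(tuples: Sequence[Tuple[Hashable, ...]]) -> List[list]:
    """Return the sorted distinct values found at each tuple position."""
    if not tuples:
        return []
    # zip(*tuples) transposes rows into one column per level.
    return [sorted(set(column)) for column in zip(*tuples)]


def equal_levels_sketch(left, right) -> bool:
    """Compare two lists of tuples level by level, ignoring row pairing."""
    return levels_of(left) == levels_of(right)


# The two cases from the review comment above.
first = [("a", "x"), ("b", "y"), ("c", "z")]
second = [("a", "y"), ("b", "x"), ("c", "z")]
print(equal_levels_sketch(first, second))  # True

third = [("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")]
fourth = [("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")]
print(equal_levels_sketch(third, fourth))  # True
```

Both pairs share the level values `a, b, c` and `x, y, z`, which is why pandas reports `True` even though the tuples differ.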
return False
self_frame = self.sort_values().to_frame()
other_frame = other.sort_values().to_frame()
with option_context("compute.ops_on_diff_frames", True):
We might be able to avoid force-enabling compute.ops_on_diff_frames. Let's see.
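As a rough illustration of why sorting both indexes and comparing them row by row would not match pandas, here is a pure-Python sketch. It assumes the sorted frames in the hunk above are compared row-wise, which the visible lines suggest but do not confirm; `rowwise_equal_after_sort` is a hypothetical stand-in, not the Koalas code.

```python
def rowwise_equal_after_sort(left, right) -> bool:
    """Sort both lists of tuples, then require identical rows."""
    return sorted(left) == sorted(right)


pairs_a = [("a", "x"), ("b", "y"), ("c", "z")]
pairs_b = [("a", "y"), ("b", "x"), ("c", "z")]

# Sorting does not help here: ("a", "x") never lines up with ("a", "y").
print(rowwise_equal_after_sort(pairs_a, pairs_b))  # False

# The level values are nevertheless identical at both positions.
print(sorted({t[0] for t in pairs_a}) == sorted({t[0] for t in pairs_b}))  # True
print(sorted({t[1] for t in pairs_a}) == sorted({t[1] for t in pairs_b}))  # True
```

Comparing the distinct values of each level independently would also remove the need to combine two different frames, though whether that suits the Spark execution model is a separate question.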
Okay, thanks for the review! Let me address the comments.
### What changes were proposed in this pull request?

This PR proposes implementing `MultiIndex.equal_levels`.

```python
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
```

This was originally proposed in databricks/koalas#1789, and all review comments in the original PR have been resolved.

### Why are the changes needed?

We should support the pandas API as much as possible in the pandas-on-Spark module.

### Does this PR introduce _any_ user-facing change?

Yes, the `MultiIndex.equal_levels` API is now available.

### How was this patch tested?

Unit tests.

Closes #34113 from itholic/SPARK-36435.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>