Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Implemented MultiIndex.equal_levels #1789

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

itholic
Copy link
Contributor

@itholic itholic commented Sep 23, 2020

This PR proposes MultiIndex.equal_levels.

>>> kmidx1 = ks.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> kmidx2 = ks.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> kmidx1.equal_levels(kmidx2)
True

@codecov-io
Copy link

codecov-io commented Nov 11, 2020

Codecov Report

Merging #1789 (d679dc2) into master (138c7b8) will decrease coverage by 0.03%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1789      +/-   ##
==========================================
- Coverage   94.64%   94.61%   -0.04%     
==========================================
  Files          49       49              
  Lines       10818    10724      -94     
==========================================
- Hits        10239    10146      -93     
+ Misses        579      578       -1     
Impacted Files Coverage Δ
databricks/koalas/missing/indexes.py 100.00% <ø> (ø)
databricks/koalas/indexes.py 96.98% <100.00%> (+0.02%) ⬆️
databricks/koalas/__init__.py 85.93% <0.00%> (-4.69%) ⬇️
databricks/koalas/testing/utils.py 79.12% <0.00%> (-1.40%) ⬇️
...ricks/koalas/tests/plot/test_series_plot_plotly.py 95.83% <0.00%> (-0.47%) ⬇️
databricks/koalas/generic.py 92.79% <0.00%> (-0.13%) ⬇️
databricks/koalas/plot/matplotlib.py 93.23% <0.00%> (-0.12%) ⬇️
databricks/koalas/mlflow.py 95.12% <0.00%> (-0.12%) ⬇️
databricks/koalas/indexing.py 92.53% <0.00%> (-0.10%) ⬇️
databricks/koalas/plot/core.py 92.68% <0.00%> (-0.09%) ⬇️
... and 16 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 138c7b8...d679dc2. Read the comment docs.

@itholic
Copy link
Contributor Author

itholic commented Dec 3, 2020

Could someone double check this just once more? Seems fine to me.

@ueshin
Copy link
Collaborator

ueshin commented Dec 3, 2020

I'll take a look later.
@itholic Could you push an empty or a merge commit to rerun tests as it's been a while since the last build.

@ueshin ueshin self-requested a review December 3, 2020 01:42
@itholic
Copy link
Contributor Author

itholic commented Dec 3, 2020

Sure, thanks :)

Copy link
Collaborator

@ueshin ueshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see some issues:

>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z")])
>>> pmidx1.equal_levels(pmidx2)
True

>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

or

>>> pmidx1 = pd.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> pmidx2 = pd.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> pmidx1.equal_levels(pmidx2)
True

>>> kmidx1 = ks.from_pandas(pmidx1)
>>> kmidx2 = ks.from_pandas(pmidx2)
>>> kmidx1.equal_levels(kmidx2)
False

return False
self_frame = self.sort_values().to_frame()
other_frame = other.sort_values().to_frame()
with option_context("compute.ops_on_diff_frames", True):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might avoid force enabling compute.ops_on_diff_frames. let's see.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for the review! Let me resolve the comments

@xinrong-meng
Copy link
Contributor

HyukjinKwon pushed a commit to apache/spark that referenced this pull request Oct 1, 2021
### What changes were proposed in this pull request?
This PR proposes implementing `MultiIndex.equal_levels`.

```python
>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("b", "y"), ("a", "x"), ("c", "z")])
>>> psmidx1.equal_levels(psmidx2)
True

>>> psmidx1 = ps.MultiIndex.from_tuples([("a", "x"), ("b", "y"), ("c", "z"), ("a", "y")])
>>> psmidx2 = ps.MultiIndex.from_tuples([("a", "y"), ("b", "x"), ("c", "z"), ("c", "x")])
>>> psmidx1.equal_levels(psmidx2)
True
```

This was originally proposed in databricks/koalas#1789, and all reviews in origin PR has been resolved.

### Why are the changes needed?

We should support the pandas API as much as possible for pandas-on-Spark module.

### Does this PR introduce _any_ user-facing change?

Yes, the `MultiIndex.equal_levels` API is available.

### How was this patch tested?

Unittests

Closes #34113 from itholic/SPARK-36435.

Lead-authored-by: itholic <haejoon.lee@databricks.com>
Co-authored-by: Haejoon Lee <44108233+itholic@users.noreply.github.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants