BUG: multi-index joining returns wrong multiindex #16182

Closed
OXPHOS opened this issue May 1, 2017 · 1 comment

Comments

@OXPHOS
Contributor

OXPHOS commented May 1, 2017

Code Sample, a copy-pastable example if possible

Modified from TestMergeMulti.test_join_multi_levels:

import pandas as pd

household = (
    pd.DataFrame(
        dict(A=[1, 2, 3],
             B=[0, 1, 0],
             C=[19.3, 31.7, 29]),
        columns=['A', 'B', 'C'])
        .set_index('A'))
portfolio = (
    pd.DataFrame(
        dict(A=[1, 2, 2, 3, 3, 3, 4],
             d=["nl0", "nl3", "gb0",
                "gb0", "lu4", "nl5", 'EMPTY'],
             e=["ABN", "Robeco", "Royal", "Royal",
                "AAB", "Postbank", 'EMPTY'],
             f=[1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0]),
        columns=['A', 'd', 'e', 'f'])
        .set_index(['A', 'd']))
result = household.join(portfolio, how='inner')

print household 
     B     C
  A         
  1  0  19.3
  2  1  31.7
  3  0  29.0

print portfolio
                  e     f
  A d                    
  1 nl0         ABN  1.00
  2 nl3      Robeco  0.40
    gb0       Royal  0.60
  3 gb0       Royal  0.15
    lu4         AAB  0.60
    nl5    Postbank  0.25
  4 EMPTY     EMPTY  1.00

print result
         B     C         e     f
  A d                           
  1 nl0  0  19.3       ABN  1.00
  2 nl3  1  31.7    Robeco  0.40
    gb0  1  31.7     Royal  0.60
  3 gb0  0  29.0     Royal  0.15
    lu4  0  29.0       AAB  0.60
    nl5  0  29.0  Postbank  0.25

print result.index
  MultiIndex(levels=[[1, 2, 3], [u'EMPTY', u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [3, 4, 1, 1, 2, 5]],
             names=[u'A', u'd'])

Problem description

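The joined rows look correct, but 'EMPTY' is still listed in the levels of the result's MultiIndex even though no row uses it (A=4 was dropped by the inner join). I think 'EMPTY' should be dropped from the MultiIndex levels.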

Expected Output

  MultiIndex(levels=[[1, 2, 3], [u'gb0', u'lu4', u'nl0', u'nl3', u'nl5']],
             labels=[[0, 1, 1, 2, 2, 2], [2, 3, 0, 0, 1, 4]],
             names=[u'A', u'd'])

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.13.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@TomAugspurger
Contributor

See #2770. This is a detail of how multiindexes (currently) work.

You can use MultiIndex.remove_unused_levels (http://pandas-docs.github.io/pandas-docs-travis/generated/pandas.MultiIndex.remove_unused_levels.html?highlight=remove_unused#pandas.MultiIndex.remove_unused_levels) to remove the unused levels afterwards.
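
A minimal sketch of that suggestion applied to the example from the report. It assumes Python 3 syntax and a pandas version that provides MultiIndex.remove_unused_levels. The variable name `cleaned` is only illustrative. Whether the joined index still carries 'EMPTY' in its levels may depend on the pandas version, so the sketch prints the levels before and after rather than asserting the "before" state.

```python
import pandas as pd

# Same frames as in the report, rebuilt with dict literals.
household = pd.DataFrame(
    {"A": [1, 2, 3], "B": [0, 1, 0], "C": [19.3, 31.7, 29]}
).set_index("A")

portfolio = pd.DataFrame(
    {
        "A": [1, 2, 2, 3, 3, 3, 4],
        "d": ["nl0", "nl3", "gb0", "gb0", "lu4", "nl5", "EMPTY"],
        "e": ["ABN", "Robeco", "Royal", "Royal", "AAB", "Postbank", "EMPTY"],
        "f": [1.0, 0.4, 0.6, 0.15, 0.6, 0.25, 1.0],
    }
).set_index(["A", "d"])

result = household.join(portfolio, how="inner")

# remove_unused_levels returns a new MultiIndex whose levels contain only
# values that actually occur; the index values themselves are unchanged.
cleaned = result.index.remove_unused_levels()

print(list(result.index.levels[1]))  # levels of 'd' before cleaning
print(list(cleaned.levels[1]))       # levels of 'd' after cleaning

# Assign the cleaned index back to keep it on the frame.
result.index = cleaned
```

The call does not change which rows are present. It only rebuilds the level arrays, so it is safe to apply after any join or filter whose leftover levels get in the way.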
