Skip to content

KeyError for crosstab on Series with same name - Still an issue in v0.23.1 #21765

New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Closed
kasuteru opened this issue Jul 6, 2018 · 1 comment
Closed

Comments

@kasuteru
Copy link
Contributor

kasuteru commented Jul 6, 2018

Code Sample

s1 = pd.Series([1,1,2,2,3,3], name='s')
s2 = pd.Series([1,1,1,2,2,2], name='s')

pd.crosstab(s1, s2)

Return:

Short version:

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Long version:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-66-b5484aa990df> in <module>()
      2 s2 = pd.Series([1,1,1,2,2,2], name='s')
      3 
----> 4 pd.crosstab(s1, s2)

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in crosstab(index, columns, values, rownames, colnames, aggfunc, margins, margins_name, dropna, normalize)
    490     table = df.pivot_table('__dummy__', index=rownames, columns=colnames,
    491                            margins=margins, margins_name=margins_name,
--> 492                            dropna=dropna, **kwargs)
    493 
    494     # Post-process

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\frame.py in pivot_table(self, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
   5301                            aggfunc=aggfunc, fill_value=fill_value,
   5302                            margins=margins, dropna=dropna,
-> 5303                            margins_name=margins_name)
   5304 
   5305     def stack(self, level=-1, dropna=True):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\reshape\pivot.py in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna, margins_name)
     85     # if we have a categorical
     86     grouped = data.groupby(keys, observed=False)
---> 87     agged = grouped.agg(aggfunc)
     88     if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
     89         agged = agged.dropna(how='all')

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4656         axis=''))
   4657     def aggregate(self, arg, *args, **kwargs):
-> 4658         return super(DataFrameGroupBy, self).aggregate(arg, *args, **kwargs)
   4659 
   4660     agg = aggregate

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in aggregate(self, arg, *args, **kwargs)
   4095             # grouper specific aggregations
   4096             if self.grouper.nkeys > 1:
-> 4097                 return self._python_agg_general(arg, *args, **kwargs)
   4098             else:
   4099 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _python_agg_general(self, func, *args, **kwargs)
   1086                 output[name] = self._try_cast(values[mask], result)
   1087 
-> 1088         return self._wrap_aggregated_output(output)
   1089 
   1090     def _wrap_applied_output(self, *args, **kwargs):

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in _wrap_aggregated_output(self, output, names)
   4728             result = result._consolidate()
   4729         else:
-> 4730             index = self.grouper.result_index
   4731             result = DataFrame(output, index=index, columns=output_keys)
   4732 

pandas/_libs/properties.pyx in pandas._libs.properties.CachedProperty.__get__()

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\groupby\groupby.py in result_index(self)
   2379                             labels=labels,
   2380                             verify_integrity=False,
-> 2381                             names=self.names)
   2382         return result
   2383 

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in __new__(cls, levels, labels, sortorder, names, dtype, copy, name, verify_integrity, _set_identity)
    230         if names is not None:
    231             # handles name validation
--> 232             result._set_names(names)
    233 
    234         if sortorder is not None:

C:\A_PROGRAMS\anaconda3\lib\site-packages\pandas\core\indexes\multi.py in _set_names(self, names, level, validate)
    693                         'Duplicated level name: "{}", assigned to '
    694                         'level {}, is already used for level '
--> 695                         '{}.'.format(name, l, used[name]))
    696 
    697             self.levels[l].rename(name, inplace=True)

ValueError: Duplicated level name: "s", assigned to level 1, is already used for level 0.

Problem description

#6319 supposedly fixed this issue, yet it still persists in my configuration.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.5.1
pip: 10.0.1
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: 0.9.0
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.5
feather: 0.4.0
matplotlib: 2.1.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.4
lxml: 4.1.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@jschendel
Copy link
Member

I think this was reintroduced when we disallowed duplicate index level names, and has since been re-allowed by #21423.

This fix is included in 0.23.2, which was just released:

In [2]: pd.__version__
Out[2]: '0.23.2'

In [3]: s1 = pd.Series([1,1,2,2,3,3], name='s')

In [4]: s2 = pd.Series([1,1,1,2,2,2], name='s')

In [5]: pd.crosstab(s1, s2)
Out[5]: 
s  1  2
s      
1  3  0
2  0  3

@jschendel jschendel added this to the No action milestone Jul 6, 2018
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants