BUG: ValueError with Series.isin and tuples #16394


Closed
wmp3 opened this issue May 20, 2017 · 3 comments · Fixed by #16434
Labels: Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

wmp3 commented May 20, 2017

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
df['C'] = list(zip(df['A'], df['B']))
df['C'].isin([(1, 'a')])

Problem description

Raises a ValueError:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 421, in isin
    return f(comps, values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 399, in <lambda>
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas/_libs/hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas/_libs/hashtable.c:29677)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

pandas 0.19.2 returns:

0     True
1    False
2    False
Name: C, dtype: bool
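
For reference, the expected result is plain element-wise membership: each tuple in column C is checked against the set of lookup tuples. A minimal pure-Python sketch of that semantics (this is an illustration only, not how pandas implements isin; the variable names are made up for the example):

```python
# Rebuild column C from the report as a list of tuples.
a = [1, 2, 3]
b = ['a', 'b', 'f']
c = list(zip(a, b))  # [(1, 'a'), (2, 'b'), (3, 'f')]

# Tuples are hashable, so the lookup values can be kept in a set.
lookup = {(1, 'a')}

# Element-wise membership, one boolean per row.
expected = [item in lookup for item in c]
print(expected)  # [True, False, False]
```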

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.0rc2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None

jreback (Contributor) commented May 20, 2017

This code was refactored to be more general, so this was a missing case. Easy fix, I think. np.array treats the nested tuples as inner sequences (as if they were lists) and builds a 2-D array, which is not nice, so use the change below.

If you'd like to submit a PR with this change plus an added test (and make sure nothing else breaks), that would be great.

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index a745ec6..77d79c9 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -388,7 +388,7 @@ def isin(comps, values):
                         "[{0}]".format(type(values).__name__))
 
     if not isinstance(values, (ABCIndex, ABCSeries, np.ndarray)):
-        values = np.array(list(values), dtype='object')
+        values = lib.list_to_object_array(list(values))
 
     comps, dtype, _ = _ensure_data(comps)
     values, _, _ = _ensure_data(values, dtype=dtype)
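
To see why the original line produces the "expected 1, got 2" error, here is a small NumPy-only sketch. It assumes nothing about pandas internals: `to_1d_object_array` is a hypothetical stand-in written for this example, meant only to mimic the kind of 1-D object array that a helper like lib.list_to_object_array is expected to return.

```python
import numpy as np

values = [(1, 'a')]

# np.array treats each tuple as a nested sequence, so the result is 2-D.
arr_2d = np.array(values, dtype=object)
print(arr_2d.shape)  # (1, 2)


def to_1d_object_array(items):
    """Hypothetical helper: build a 1-D object array, one slot per item."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item  # the tuple is stored whole, not unpacked
    return out


arr_1d = to_1d_object_array(values)
print(arr_1d.shape)  # (1,)
print(arr_1d[0])     # (1, 'a')
```

The 1-D form is what a routine expecting a flat buffer of Python objects can consume; the 2-D form is what triggers the dimension mismatch in the traceback.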

jreback added the Bug, Difficulty Novice, and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels May 20, 2017
jreback added this to the Next Major Release milestone May 20, 2017
jreback changed the title from "ValueError with Series.isin and tuples" to "BUG: ValueError with Series.isin and tuples" May 20, 2017
jorisvandenbossche modified the milestones: 0.20.2, Next Major Release May 20, 2017
jaredsnyder (Contributor) commented

I'm taking a crack at this. Is the solution to just add lib.list_to_object_array back in along with a test for the tuple case, or should we check if comps contains tuples and use lib.list_to_object_array only if it does?

jorisvandenbossche (Member) commented

@jaredsnyder I think you can try the exact change that @jreback showed above. When the values are not tuples, both approaches should normally do the same thing, so I don't think it is necessary to check whether they contain tuples. And for sure add a test!
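
The claim that both approaches agree for non-tuple values can be checked with a small NumPy-only sketch. As an assumption for illustration, `to_1d_object_array` below is a hypothetical helper standing in for lib.list_to_object_array; it is not the pandas function itself.

```python
import numpy as np


def to_1d_object_array(items):
    """Hypothetical helper: build a 1-D object array, one slot per item."""
    out = np.empty(len(items), dtype=object)
    for i, item in enumerate(items):
        out[i] = item
    return out


scalars = [1, 'a', 2.5]

via_np = np.array(scalars, dtype=object)   # the replaced approach
via_helper = to_1d_object_array(scalars)   # the suggested style of approach

# For flat scalar input both produce the same 1-D object array.
same_shape = via_np.shape == via_helper.shape
same_items = list(via_np) == list(via_helper)
print(same_shape, same_items)  # True True
```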

jorisvandenbossche pushed a commit that referenced this issue May 23, 2017
* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py
Added test for comparing to a list of tuples
TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this issue May 29, 2017
(cherry picked from commit e053ee3)
TomAugspurger pushed a commit that referenced this issue May 30, 2017
(cherry picked from commit e053ee3)
stangirala pushed a commit to stangirala/pandas that referenced this issue Jun 11, 2017