BUG: ValueError with Series.isin and tuples #16394

wmp3 · 2017-05-20T00:18:17Z

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'f']})
df['C'] = list(zip(df['A'], df['B']))
df['C'].isin([(1, 'a')])

Problem description

Returns ValueError:
Traceback (most recent call last):
  File "", line 1, in
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/series.py", line 2555, in isin
    result = algorithms.isin(_values_from_object(self), values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 421, in isin
    return f(comps, values)
  File "/anaconda/envs/pandas_dev/lib/python3.6/site-packages/pandas/core/algorithms.py", line 399, in
    f = lambda x, y: htable.ismember_object(x, values)
  File "pandas/_libs/hashtable_func_helper.pxi", line 428, in pandas._libs.hashtable.ismember_object (pandas/_libs/hashtable.c:29677)
ValueError: Buffer has wrong number of dimensions (expected 1, got 2)

Expected Output

In pandas 0.19.2 returns:
0     True
1    False
2    False
Name: C, dtype: bool

Output of `pd.show_versions()`

# Paste the output here pd.show_versions() here
INSTALLED VERSIONS
------------------
commit: None
python: 3.6.1.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.20.0rc2
pytest: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.1
scipy: None
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2017.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
pandas_gbq: None
pandas_datareader: None


jreback · 2017-05-20T14:58:01Z

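This code was refactored to be more general, and this case was missed. Easy fix, I think: np.array treats the tuples as a nested sequence and builds a 2-D array instead of a 1-D array of tuples, which is not nice, so use lib.list_to_object_array instead, as in the diff below.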

If you'd like to submit a PR with this change and an added test (and make sure nothing else breaks), that would be great.

diff --git a/pandas/core/algorithms.py b/pandas/core/algorithms.py
index a745ec6..77d79c9 100644
--- a/pandas/core/algorithms.py
+++ b/pandas/core/algorithms.py
@@ -388,7 +388,7 @@ def isin(comps, values):
                         "[{0}]".format(type(values).__name__))
 
     if not isinstance(values, (ABCIndex, ABCSeries, np.ndarray)):
-        values = np.array(list(values), dtype='object')
+        values = lib.list_to_object_array(list(values))
 
     comps, dtype, _ = _ensure_data(comps)
     values, _, _ = _ensure_data(values, dtype=dtype)
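To see why the one-line change matters, here is a minimal sketch using only NumPy. It does not call the pandas-internal lib.list_to_object_array; the explicit loop below is just an assumed stand-in for "build a 1-D object array whose elements are the tuples themselves".

```python
import numpy as np

values = [(1, 'a'), (2, 'b')]

# np.array walks into the tuples and treats them as a second dimension.
nested = np.array(values, dtype='object')
print(nested.shape)  # (2, 2)

# Stand-in for a 1-D conversion: allocate an object array of the right
# length and store each tuple as a single element.
flat = np.empty(len(values), dtype='object')
for i, v in enumerate(values):
    flat[i] = v
print(flat.shape)  # (2,)
print(flat[0])     # (1, 'a')
```

The 2-D result is what trips the hashtable routine in the traceback above, which expects a 1-D buffer.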

jaredsnyder · 2017-05-22T19:10:06Z

I'm taking a crack at this. Is the solution to just add lib.list_to_object_array back in along with a test for the tuple case, or should we check if comps contains tuples and use lib.list_to_object_array only if it does?

jorisvandenbossche · 2017-05-22T19:45:11Z

@jaredsnyder I think you can try the exact change that @jreback showed above. When the values are not tuples, both approaches should normally do the same thing, so I don't think it is necessary to check whether comps contains tuples. And definitely add a test!
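A regression test for the tuple case could look roughly like the sketch below. The function name test_isin_with_tuples is hypothetical, and this is not necessarily the test that was added in the PR; it only uses the public API and the data from the original report, and it assumes a pandas version that includes the fix.

```python
import pandas as pd


def test_isin_with_tuples():
    # Series of tuples, as in the original report (column 'C').
    s = pd.Series([(1, 'a'), (2, 'b'), (3, 'f')])

    result = s.isin([(1, 'a')])

    # Only the first element matches the tuple (1, 'a').
    expected = pd.Series([True, False, False])
    pd.testing.assert_series_equal(result, expected)


test_isin_with_tuples()
```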

* Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py. Added test for comparing to a list of tuples

…-dev#16434) * Switched out "values = np.array(list(values), dtype='object')" for "values = lib.list_to_object_array(list(values))" in the isin() method found in core/algorithms.py. Added test for comparing to a list of tuples (cherry picked from commit e053ee3)


jreback added Bug Difficulty Novice Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels May 20, 2017

jreback added this to the Next Major Release milestone May 20, 2017

jreback changed the title ~~ValueError with Series.isin and tuples~~ BUG: ValueError with Series.isin and tuples May 20, 2017

jorisvandenbossche modified the milestones: 0.20.2, Next Major Release May 20, 2017

jaredsnyder mentioned this issue May 22, 2017

BUG: fix isin with Series of tuples values (#16394) #16434

Merged


jorisvandenbossche closed this as completed in #16434 May 23, 2017

jreback mentioned this issue May 30, 2017

Regression from 0.19.2 to 0.20.1 in pandas.unique() when applied to list of tuples #16519

Closed

wikiped mentioned this issue Jul 17, 2017

ValueError on df.columns.isin(pd.Series()) #16991

Closed
