
IndexError when indexing numpy array with boolean Series #6168

Closed
stefsmeets opened this issue Jan 29, 2014 · 7 comments
Labels
API Design, Duplicate Report (duplicate issue or pull request), Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@stefsmeets

Previously (in version 0.11) it was possible to create a boolean Series and use it to index a numpy array. Version 0.13.0 breaks this behaviour and raises "IndexError: unsupported iterator index":

    import numpy as np
    import pandas as pd

    rng = np.arange(5)

    rng[rng > 2]            # works as expected
    # array([3, 4])

    b = pd.Series(rng > 2)
    rng[b]                  # no longer works
    # IndexError: unsupported iterator index
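A workaround that should behave the same on old and new versions (an assumption, not something confirmed in this thread) is to convert the boolean Series to a plain ndarray explicitly before using it as a mask:

```python
import numpy as np
import pandas as pd

rng = np.arange(5)
b = pd.Series(rng > 2)

# Convert the boolean Series to a plain ndarray first, so numpy receives an
# ndarray mask regardless of how a given version treats Series objects.
mask = np.asarray(b)
print(mask.dtype)   # bool
print(rng[mask])    # [3 4]
```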
@dsm054
Contributor

dsm054 commented Jan 29, 2014

Same for me with pd 0.13.0-321-gaf73a6f, np 1.9.0.dev-631655e:

    >>> rng[b]
    *** Reference count error detected:
    an attempt was made to deallocate 5 (i) ***
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    IndexError: unsupported iterator index

@jreback
Contributor

jreback commented Jan 29, 2014

This is a numpy bug exposed by the changes to Series in 0.13 (Series is no longer an ndarray subclass); see here: http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#whatsnew-0130-refactoring

numpy doesn't follow its own protocol; there is a bug report somewhere.
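The refactoring referenced above is easy to observe: a Series no longer passes an isinstance check against ndarray, although it still converts to one through np.asarray. A short check, assuming pandas 0.13 or later:

```python
import numpy as np
import pandas as pd

b = pd.Series(np.arange(5) > 2)

# Since 0.13 a Series holds an ndarray rather than subclassing it, so this
# is-a check fails even though the object converts cleanly to an array.
print(isinstance(b, np.ndarray))        # False
arr = np.asarray(b)
print(type(arr).__name__, arr.dtype)    # ndarray bool
```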

But to be honest, if you are using pandas there is no need to do this at all; just wrap the array in a Series:

    >>> pd.Series(rng)[b]
    3    3
    4    4
    dtype: int64
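Indexing the Series instead of the raw array keeps the original index labels alongside the selected values, and .values recovers a plain ndarray if one is needed afterwards. A small sketch of that workflow:

```python
import numpy as np
import pandas as pd

rng = np.arange(5)
b = pd.Series(rng > 2)

# Boolean-indexing a Series with a boolean Series aligns on the index and
# keeps the original labels (3 and 4) in the result.
selected = pd.Series(rng)[b]
print(selected.index.tolist())   # [3, 4]

# .values returns the underlying ndarray when a plain array is needed.
print(selected.values)           # [3 4]
```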

jreback closed this as completed Jan 29, 2014
@jreback
Contributor

jreback commented Jan 29, 2014

@dsm054
If I could somehow make the C API return True for PyArray_Check(obj), then this would work.

In essence, numpy only accepts subclasses of ndarray, not duck-typed objects that actually behave correctly (as Series does), maybe for performance reasons.

I thought there was a bug report, but maybe this is an enhancement request: treat a duck-typed object that has-a ndarray the same way as one that is-a ndarray.
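The has-a versus is-a distinction can be sketched in pure numpy. BoolMask below is a hypothetical wrapper (not a pandas or numpy class) that holds a boolean ndarray and exposes it through the __array__ conversion hook; isinstance(obj, np.ndarray) serves as the Python-level counterpart of PyArray_Check, which accepts ndarray and its subclasses only. Whether numpy accepts such an object directly as an index depends on the numpy version, so the sketch converts it explicitly:

```python
import numpy as np


class BoolMask:
    """Hypothetical "has-a" wrapper: holds a boolean ndarray instead of subclassing one."""

    def __init__(self, values):
        self._values = np.asarray(values, dtype=bool)

    def __array__(self, dtype=None, copy=None):
        # numpy's array-conversion hook, used by np.asarray(); the copy keyword
        # is accepted because newer numpy versions may pass it.
        arr = self._values if dtype is None else self._values.astype(dtype)
        return arr.copy() if copy else arr


rng = np.arange(5)
mask = BoolMask(rng > 2)

# isinstance(obj, np.ndarray) mirrors PyArray_Check: ndarray and subclasses
# pass, duck-typed wrappers do not.
print(isinstance(mask, np.ndarray))               # False
print(isinstance(np.asarray(mask), np.ndarray))   # True

# Converting explicitly before indexing avoids depending on numpy's type checks.
print(rng[np.asarray(mask)])                      # [3 4]
```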

@dsm054
Contributor

dsm054 commented Jan 29, 2014

@jreback: yeah, numpy doesn't play well with others. This is a problem in Sage too, where we wrap integer literals typed in at the console with Integer. Unfortunately, because of how numpy.isscalar works, this breaks array indexing.
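A hypothetical illustration of the isscalar problem, using a made-up WrappedInt class as a stand-in (it is not Sage's actual Integer). In recent numpy versions, isscalar recognises numpy scalars, builtin scalar types and numbers.Number instances, so a plain wrapper fails the check even though it defines __index__. Unwrapping with operator.index before indexing is one way around it:

```python
import operator

import numpy as np


class WrappedInt:
    """Hypothetical stand-in for a console-wrapped integer literal (not Sage's Integer)."""

    def __init__(self, value):
        self.value = int(value)

    def __index__(self):
        # Lets operator.index() treat the wrapper as a plain int.
        return self.value


n = WrappedInt(3)
print(np.isscalar(3))   # True
print(np.isscalar(n))   # False: not a numpy scalar, builtin scalar, or numbers.Number

rng = np.arange(5)
print(rng[operator.index(n)])  # 3, after explicitly unwrapping to a plain int
```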

@jreback
Contributor

jreback commented Jan 29, 2014

@dsm054 I created an issue (see above). I think that if they relaxed the type checking (and provided a more duck-typed model), it would work; I'm not sure how much work that is, though.

I have tried to hack around this to get the type checks to pass, but they are in the C API, so there is no easy way. Did I miss anything?

@jreback
Contributor

jreback commented Feb 17, 2014

@Tarlitz @dsm054

Good news!

numpy 1.9 will handle this correctly; you can install numpy master and check it out for yourself.
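A quick way to confirm this on an installed version, assuming numpy 1.9 or later as stated above, is to check that indexing directly with the boolean Series gives the same result as indexing with an explicit ndarray:

```python
import numpy as np
import pandas as pd

rng = np.arange(5)
b = pd.Series(rng > 2)

# On numpy >= 1.9 the Series should be converted to a boolean mask, so direct
# indexing is expected to match indexing with an explicit ndarray.
direct = rng[b]
explicit = rng[np.asarray(b)]
print(direct)                               # [3 4]
print(np.array_equal(direct, explicit))     # True
```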

@stefsmeets
Author

@jreback Cheers mate, I appreciate your efforts :)
