
Pandas Series.rank(ascending=False) gives rise to SIGSEGV error #13445


Closed
HaoXJ opened this issue Jun 15, 2016 · 5 comments
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Bug

Comments

@HaoXJ

HaoXJ commented Jun 15, 2016

I read a DataFrame from an HDF5 file and rank one of its columns, but it raises a SIGSEGV.
Maybe it's a numpy or PyTables bug.

Code Sample, a copy-pastable example if possible

import pandas as pd

def test_df_ranks(f):
    df = pd.read_hdf(f, key="t")
    print(df.shape)
    print(type(df))
    print(df)
    s = df.non_current_asset_to_total_asset
    # s.rank()  # rank() works properly
    s.rank(ascending=False)  # rank(ascending=False) crashes

Expected Output

I expected it to work, but it raised a SIGSEGV error.
The stack trace is as follows:
#7 OBJECT_compare (ip1=0x47a3ef4b2e420, ip2=0x7f5c5413f128, __NPY_UNUSED_TAGGEDap=0x7f5cd0100760) at numpy/core/src/multiarray/arraytypes.c.src:2753
#8 0x00007f5d0142c50e in npy_aquicksort (vv=vv@entry=0x7f5c5413f060, tosort=tosort@entry=0x7f5c5413cc80, num=num@entry=52, varr=varr@entry=0x7f5cd0100760) at numpy/core/src/npysort/quicksort.c.src:480
#9 0x00007f5d0139a78a in _new_argsortlike (op=op@entry=0x7f5cd0100760, axis=0, argsort=argsort@entry=0x7f5d0142c310 <npy_aquicksort>, argpart=argpart@entry=0x0, kth=kth@entry=0x0, nkth=nkth@entry=0) at numpy/core/src/multiarray/item_selection.c:1035
#10 0x00007f5d0139dd7b in PyArray_ArgSort (op=op@entry=0x7f5cd0100760, axis=0, which=) at numpy/core/src/multiarray/item_selection.c:1309
#11 0x00007f5d013dd012 in array_argsort (self=0x7f5cd0100760, args=, kwds=) at numpy/core/src/multiarray/methods.c:1278
#12 0x00007f5cf4eef28f in __Pyx_PyObject_Call (func=0x7f5cd1a1acc8, arg=0x7f5d0f900048, kw=0x0) at pandas/algos.c:201388
#13 0x00007f5cf504e006 in __pyx_pf_6pandas_5algos_8rank_1d_generic (__pyx_v_in_arr=__pyx_v_in_arr@entry=0x7f5cd0100620, __pyx_v_retry=1, __pyx_v_ties_method=0x7f5cf6999768, __pyx_v_ascending=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_v_na_option=, __pyx_v_pct=0x7f5d0f6bd700 <_Py_FalseStruct>, __pyx_self=) at pandas/algos.c:14942
#14 0x00007f5cf5050481 in __pyx_pw_6pandas_5algos_9rank_1d_generic (__pyx_self=, __pyx_args=, __pyx_kwds=0x7f5cd8659488) at pandas/algos.c:14439
#15 0x00007f5d0f3b9477 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#16 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#17 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#18 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#19 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#20 0x00007f5d0f3b8e40 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#21 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#22 0x00007f5d0f3b7a12 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0
#23 0x00007f5d0f3b9f3e in PyEval_EvalCodeEx () from /lib64/libpython3.4m.so.1.0
#24 0x00007f5d0f32a4b3 in function_call () from /lib64/libpython3.4m.so.1.0
#25 0x00007f5d0f301dcc in PyObject_Call () from /lib64/libpython3.4m.so.1.0
#26 0x00007f5d0f3b57c9 in PyEval_EvalFrameEx () from /lib64/libpython3.4m.so.1.0

output of pd.show_versions()

pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Darwin
OS-release: 14.4.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: None
pip: 8.1.2
setuptools: 23.0.0
Cython: None
numpy: 1.10.4
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.4
blosc: None
bottleneck: None
tables: 3.2.2
numexpr: 2.6.0
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: 0.6.7.None
psycopg2: None
Jinja2: None

The HDF5 data is in my GitHub repository:
https://github.com/HaoXJ/codefail/blob/master/data/test.h5

@jreback
Contributor

jreback commented Jun 15, 2016

Please show a minimal example that can be copy-pasted, whether it's the HDF5 reading or something else.

@jreback
Contributor

jreback commented Jun 16, 2016

Closing as not reproducible. @HaoXJ, if you post an example, please comment.

@jreback jreback closed this as completed Jun 16, 2016
@HaoXJ
Author

HaoXJ commented Jun 22, 2016

import sys
import pandas as pd
def series_rank_test(f):
    df = pd.read_hdf(f, key="bad_series")
    #df.rank()
    df.rank(ascending=False)
if __name__=="__main__":
    series_rank_test(sys.argv[1])

The data is in https://github.com/HaoXJ/codefail/blob/master/data/test.h5

@jorisvandenbossche
Member

I can reproduce this.

However, I can't trim it down to a simple example. If I recreate the Series from a dict (the result of to_dict on the Series read from the h5 file):

from math import nan

import pandas as pd

s = pd.Series({'000022.XSHE': nan,
 '000089.XSHE': nan,
 '000099.XSHE': nan,
 '000429.XSHE': nan,
 '000507.XSHE': nan,
 '000520.XSHE': nan,
 '000548.XSHE': nan,
 '000828.XSHE': nan,
 '000900.XSHE': nan,
 '000905.XSHE': nan,
 '000916.XSHE': nan,
 '002040.XSHE': nan,
 '600004.XSHG': nan,
 '600009.XSHG': nan,
 '600012.XSHG': nan,
 '600020.XSHG': nan,
 '600026.XSHG': nan,
 '600033.XSHG': nan,
 '600035.XSHG': nan,
 '600057.XSHG': nan,
 '600077.XSHG': nan,
 '600106.XSHG': nan,
 '600115.XSHG': nan,
 '600119.XSHG': nan,
 '600125.XSHG': nan,
 '600153.XSHG': nan,
 '600180.XSHG': nan,
 '600190.XSHG': nan,
 '600221.XSHG': nan,
 '600269.XSHG': nan,
 '600270.XSHG': nan,
 '600317.XSHG': nan,
 '600350.XSHG': nan,
 '600368.XSHG': nan,
 '600377.XSHG': nan,
 '600428.XSHG': nan,
 '600548.XSHG': nan,
 '600561.XSHG': nan,
 '600575.XSHG': nan,
 '600611.XSHG': nan,
 '600650.XSHG': nan,
 '600662.XSHG': nan,
 '600676.XSHG': nan,
 '600692.XSHG': nan,
 '600717.XSHG': nan,
 '600751.XSHG': nan,
 '600787.XSHG': nan,
 '600794.XSHG': nan,
 '600798.XSHG': nan,
 '600834.XSHG': nan,
 '600896.XSHG': nan,
 '600897.XSHG': nan}, name="non_current_asset_to_total_asset")

s.astype(object).rank(ascending=False)

This also crashes Python for me. In any case, it has something to do with the Series being object dtype rather than float (read_hdf returns an object-dtype Series when all values are NaN).
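Based on that diagnosis, one possible workaround (an assumption, not a fix confirmed in this thread) is to coerce the object-dtype column to a numeric dtype before ranking, so pandas does not take the generic object ranking path seen in the stack trace (`rank_1d_generic`). The helper name `rank_as_float` is hypothetical; `pd.to_numeric` is available from pandas 0.17.0 onward.

```python
import numpy as np
import pandas as pd


def rank_as_float(s, **kwargs):
    """Hypothetical helper: coerce an object-dtype Series to numeric before
    ranking, so the float ranking path is used instead of the object one."""
    if s.dtype == object:
        # errors="coerce" turns anything non-numeric into NaN
        # (assumption: the column is meant to hold numbers).
        s = pd.to_numeric(s, errors="coerce")
    return s.rank(**kwargs)


# An all-NaN object Series, like the one read_hdf returned in this report.
s = pd.Series([np.nan] * 5, dtype=object, name="non_current_asset_to_total_asset")
ranked = rank_as_float(s, ascending=False)
print(ranked.dtype)             # float64
print(ranked.isnull().all())    # True
```

Coercing with `errors="coerce"` silently drops non-numeric entries to NaN, so it only suits columns that are numeric by intent.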

@jorisvandenbossche jorisvandenbossche added Bug Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff labels Jun 23, 2016
@HaoXJ
Author

HaoXJ commented Jun 29, 2016

So the bug is in Series.rank(); I expected it to return NaN.
Thanks for the answer.
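For comparison, ranking float64 data behaves as expected here: with the default `na_option="keep"`, NaN entries stay NaN in the result, so an all-NaN Series ranks to all NaN. A small sketch with made-up values:

```python
import numpy as np
import pandas as pd

# float64 Series: NaN positions are kept as NaN (default na_option="keep")
s = pd.Series([1.0, np.nan, 3.0, 2.0])
desc = s.rank(ascending=False)
print(desc.tolist())  # [3.0, nan, 1.0, 2.0]

# All-NaN float64 Series: every rank is NaN, which is the result expected above
all_nan = pd.Series([np.nan, np.nan, np.nan])
print(all_nan.rank(ascending=False).tolist())  # [nan, nan, nan]
```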

@HaoXJ HaoXJ closed this as completed Jun 29, 2016
@HaoXJ HaoXJ reopened this Jun 29, 2016
@jreback jreback added this to the Next Major Release milestone Jun 30, 2016
@jreback jreback modified the milestones: 0.19.0, Next Major Release Aug 16, 2016

3 participants