Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

vmprof causes process to hang #116

Closed
jogo opened this issue Feb 16, 2017 · 9 comments
Closed

vmprof causes process to hang #116

jogo opened this issue Feb 16, 2017 · 9 comments

Comments

@jogo
Copy link
Contributor

jogo commented Feb 16, 2017

When enabling vmprof in a large gevent based server process, it outputs data for a few minutes then causes everything to hang.

when attaching gdb to the process and looking at the backtrace I get the following (confirmed I saw the same trace twice):

#1  0x000000000054a8f6 in find_key.47505 ()
#2  0x000000000054ad3a in PyGILState_GetThisThreadState ()
#3  0x00002b268fa19574 in sigprof_handler (sig_nr=<optimized out>, info=<optimized out>, ucontext=<optimized out>) at src/vmprof_main.h:170
#4  <signal handler called>
#5  0x000000000054a95d in find_key.47505 ()
#6  0x00000000004f3f30 in PyGILState_Release ()
#7  0x00002b26913f0f97 in ev_invoke_pending (loop=0x2b2691644620) at libev/ev.c:3155
#8  0x00002b26914206f8 in ev_run (loop=0x2b2691644620, flags=flags@entry=0) at libev/ev.c:3555
#9  0x00002b2691420fa4 in __pyx_pf_6gevent_8corecext_4loop_14run (__pyx_v_self=0x3abcc20, __pyx_v_once=False, __pyx_v_nowait=<optimized out>)
    at gevent/gevent.corecext.c:7017
#10 __pyx_pw_6gevent_8corecext_4loop_15run (__pyx_v_self=<gevent.corecext.loop at remote 0x3abcc20>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>)
    at gevent/gevent.corecext.c:6884
#11 0x0000000000536f94 in PyEval_EvalFrameEx ()
#12 0x00002b268fa189da in vmprof_eval (f=<optimized out>, throwflag=<optimized out>) at src/_vmprof.c:55
#13 0x000000000053e2b0 in PyEval_EvalCodeEx ()
#14 0x000000000051d0a0 in function_call ()
#15 0x00000000004c2646 in PyObject_Call ()
#16 0x0000000000531dea in instancemethod_call.8548 ()
#17 0x00000000004c2646 in PyObject_Call ()
#18 0x00000000004c2aa6 in PyEval_CallObjectWithKeywords ()
#19 0x00002b26911e0b2d in g_initialstub (mark=<optimized out>) at greenlet.c:798
#20 0x00002b26911e0480 in g_switch (target=0x3629370, args=(), kwargs=0x0) at greenlet.c:570
#21 0x00000000004b39f7 in PyObject_CallFunctionObjArgs ()
#22 0x00007ffdde724f61 in ?? ()
#23 0x0000000000000008 in ?? ()
#24 0x000000000050b5a7 in PyString_FromString ()
#25 0x00000000004ef3b4 in import_submodule.39103 ()
#26 0x000000001212c3b0 in ?? ()
#27 0x0000000012132990 in ?? ()
#28 0x000000000051d030 in ?? ()
#29 0x000000001214bb18 in ?? ()
#30 0x0000000012197ae8 in ?? ()
#31 0x0000000000000001 in ?? ()
#32 0x0000000012197b50 in ?? ()
#33 0x00007ffdde726540 in ?? ()
#34 0x000000000051d0a0 in function_call ()
#35 0x00000000004c2646 in PyObject_Call ()
#36 0x0000000000531e06 in instancemethod_call.8548 ()
#37 0x00007ffdde725360 in ?? ()
#38 0x00007ffdde725290 in ?? ()
#39 0x0000000000553507 in tupledealloc.24200 ()
#40 0x000000000089f3e0 in ?? ()
#41 0x00007ffdde7275d0 in ?? ()
#42 0x000000000089f3e0 in ?? ()
#43 0x000000001d0652b0 in ?? ()
#44 0x0000000001fe17f0 in ?? ()
#45 0x0000000000000001 in ?? ()
#46 0x0000000000561ed2 in PyErr_Format ()
#47 0x0000000000484356 in find_module.38892 ()
#48 0x00000000004ef3b4 in import_submodule.39103 ()
#49 0x000000001d1565cf in ?? ()
#50 0x00007ffdde7286b8 in ?? ()
#51 0x000000000050ed81 in PyString_FromFormatV ()
#52 0x0000000000561ed2 in PyErr_Format ()
#53 0x00000000008a9860 in ?? ()
#54 0x00000000ffffffff in ?? ()
#55 0x00000000ffffffff in ?? ()
#56 0x0000000002d6fc36 in ?? ()
#57 0x000000001d2613c0 in ?? ()
#58 0x00000000008a9860 in ?? ()
#59 0x00000000ffffffff in ?? ()
#60 0x00007ffdde728810 in ?? ()
#61 0x00000000008a9860 in ?? ()
#62 0x000000001d2f48dc in ?? ()
#63 0x00000000004efd54 in import_module_level.isra.3.39129 ()
#64 0x0000000000000000 in ?? ()

I am not sure what other information would be useful to debug this issue.

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

Besides this issue, I have found vmprof to be very useful and look forward to using it more in the future, thanks for such as great tool!

@planrich
Copy link
Contributor

Can you post some more information about your system? If have seen this issue once, but could not reproduce it. PyGILState_GetThisThreadState seems not to be able to acquire the lock and hangs forever.

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

Python version: python 2.7.3
OS: Ubuntu 12.04 with 4.2.0 kernel
gevent version: 1.1.2
server: based on http://www.gevent.org/gevent.pywsgi.html

I am trying to get our system running under python-dbg so I can hopefully get more information about about what is running when it hangs forever. After that I can try to create a self contained code base that can reliably cause the issue.

@planrich is there any other information that you want?

@jogo
Copy link
Contributor Author

jogo commented Feb 16, 2017

got python with debug information working:

(gdb) bt
#0  0x00002ba98b6e8fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00000000004f3238 in PyThread_acquire_lock (lock=0x17a8460, waitflag=1) at Python/thread_pthread.h:321
#2  0x00000000004f340c in find_key (key=1, value=0x0) at Python/thread.c:268
#3  0x00000000004f363e in PyThread_get_key_value (key=1) at Python/thread.c:360
#4  0x00000000004dca78 in PyGILState_GetThisThreadState () at Python/pystate.c:581
#5  0x00002ba9b3bf8574 in sigprof_handler (sig_nr=<optimized out>, info=<optimized out>, ucontext=<optimized out>) at src/vmprof_main.h:170
#6  <signal handler called>
#7  PyThread_release_lock (lock=0x17a8460) at Python/thread_pthread.h:342
#8  0x00000000004f3516 in find_key (key=1, value=0x0) at Python/thread.c:296
#9  0x00000000004f363e in PyThread_get_key_value (key=1) at Python/thread.c:360
#10 0x00000000004dcb56 in PyGILState_Release (oldstate=PyGILState_UNLOCKED) at Python/pystate.c:622
#11 0x00002ba98e59af97 in ev_invoke_pending (loop=0x2ba98e7ee620) at libev/ev.c:3155
#12 0x00002ba98e5ca6f8 in ev_run (loop=0x2ba98e7ee620, flags=flags@entry=0) at libev/ev.c:3555
#13 0x00002ba98e5cafa4 in __pyx_pf_6gevent_8corecext_4loop_14run (__pyx_v_self=0x2c6af80, __pyx_v_once=False, __pyx_v_nowait=<optimized out>) at gevent/gevent.corecext.c:7017
#14 __pyx_pw_6gevent_8corecext_4loop_15run (__pyx_v_self=<gevent.corecext.loop at remote 0x2c6af80>, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at gevent/gevent.corecext.c:6884
#15 0x000000000053241b in PyCFunction_Call (func=<built-in method run of gevent.corecext.loop object at remote 0x2c6af80>, arg=(), kw=0x0) at Objects/methodobject.c:85
#16 0x00000000004b491c in call_function (pp_stack=0x7ffc3fd584e8, oparg=0) at Python/ceval.c:4021
#17 0x00000000004b0d6c in PyEval_EvalFrameEx (f=
    Frame 0x2e298b0, for file /usr/local/lib/python2.7/site-packages/gevent/hub.py, line 667, in run (self=<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method relea...(truncated), throwflag=0) at Python/ceval.c:2666
#18 0x00000000004b29f9 in PyEval_EvalCodeEx (co=0x2ba98b671c30, globals=
    {'get_hub_class': <function at remote 0x2ba98b67d7d0>, '_NONE': <_NONE at remote 0x2ba98b52c0b0>, '_DEFAULT_LOOP_CLASS': 'gevent.core.loop', 'xrange': <type at remote 0x7b5420>, 'PY2': True, 'PY3': False, 'MAIN_THREAD': 48007186633408, 'absolute_import': <_Feature(mandatory=(2, 7, 0, 'alpha', 0), optional=(2, 5, 0, 'alpha', 1), compiler_flag=16384) at remote 0x2ba98b6643f8>, 'resolver_config': <function at remote 0x2ba98b67daa0>, 'threadlocal': <type at remote 0x7d27e0>, 'InvalidSwitchError': <type at remote 0x18d2380>, 'sleep': <function at remote 0x2ba98b67d320>, 'reraise': <function at remote 0x2ba98b67d140>, 'set_hub': <function at remote 0x2ba98b67d938>, 'LoopExit': <type at remote 0x18d1640>, '_MultipleWaiter': <type at remote 0x18ddf90>, 'GreenletExit': <type at remote 0x18cf570>, '__all__': ['getcurrent', 'GreenletExit', 'spawn_raw', 'sleep', 'kill', 'signal', 'reinit', 'get_hub', 'Hub', 'Waiter'], '_get_hub': <function at remote 0x2ba98b67d8c0>, '__package__': None, 'Waiter': <type at remote 0x18dce7...(truncated), locals=0x0, args=0x272a568, argcount=1, kws=0x0, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3253
#19 0x000000000052ff29 in function_call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock o
---Type <return> to continue, or q <return> to quit---
bject at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/funcobject.c:526
#20 0x000000000041f1d3 in PyObject_Call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/abstract.c:2529
#21 0x0000000000427b82 in instancemethod_call (func=<function at remote 0x2ba98b6811b8>, arg=
    (<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acq...(truncated), kw=0x0) at Objects/classobject.c:2578
#22 0x000000000041f1d3 in PyObject_Call (func=<instancemethod at remote 0x2ccc1e0>, arg=(), kw=0x0) at Objects/abstract.c:2529
#23 0x00000000004b41d9 in PyEval_CallObjectWithKeywords (func=<instancemethod at remote 0x2ccc1e0>, arg=(), kw=0x0) at Python/ceval.c:3890
#24 0x00002ba98e38a9fd in g_initialstub (mark=<optimized out>) at greenlet.c:798
#25 0x00002ba98e38a360 in g_switch (target=0x2cb87d0, args=(), kwargs=0x0) at greenlet.c:570
#26 0x000000001bf2dc70 in ?? ()
#27 0x00000000007dd910 in ?? ()
#28 0x0000000000464461 in string_item (a=0x1bfb37d8, i=24806496) at Objects/stringobject.c:1184
#29 0x000000000052887f in classmethod_get (descr=0x1bf2dc70, obj=<unknown at remote 0x1bfb37d8>, type=<unknown at remote 0x2007dd910>) at Objects/descrobject.c:116
#30 0x0000000000000001 in ?? ()
#31 0x00007ffc3fd58ca0 in ?? ()
#32 0x00000000004547a9 in PyObject_RichCompareBool (v=<unknown at remote 0x48ce894800000000>, w=<unknown at remote 0x-4527b274b7dfbf75>, op=1223181707) at Objects/object.c:999
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

(gdb) py-bt
#17 Frame 0x2e298b0, for file /usr/local/lib/python2.7/site-packages/gevent/hub.py, line 667, in run (self=<PatchedHub(_resolver=<Resolver(pool=<ThreadPool(hub=<...>, _lock=<thread.lock at remote 0x2c5c590>, pid=15388, fork_watcher=<gevent.corecext.fork at remote 0x2cc6b48>, manager=None, task_queue=<Queue(unfinished_tasks=0, queue=<collections.deque at remote 0x2d2c910>, maxsize=0, all_tasks_done=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39e50>, mutex=<thread.lock at remote 0x2c5ccb0>, not_full=<Condition(release=<built-in method release of thread.lock object at remote 0x2c5ccb0>, _Condition__lock=<thread.lock at remote 0x2c5ccb0>, acquire=<built-in method acquire of thread.lock object at remote 0x2c5ccb0>, _Condition__waiters=[]) at remote 0x2d39d50>, not_empty=<Condition(release=<built-in method relea...(truncated)
    loop.run()
(gdb) py-list
 662            assert self is getcurrent(), 'Do not call Hub.run() directly'
 663            while True:
 664                loop = self.loop
 665                loop.error_handler = self
 666                try:
>667                    loop.run()
 668                finally:
 669                    loop.error_handler = None  # break the refcount cycle
 670                self.parent.throw(LoopExit('This operation would block forever', self))
 671            # this function must never return, as it will cause switch() in the parent greenlet
 672            # to return an unexpected value
0x00002ba98b6e8fd0 in sem_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
(gdb) list
14               * Python requires non-stop mode.  Alas, some platforms enable FP
15               * exceptions by default.  Here we disable them.
16               */
17      #ifdef __FreeBSD__
18              fp_except_t m;
19
20              m = fpgetmask();
21              fpsetmask(m & ~FP_X_OFL);
22      #endif
23              return Py_Main(argc, argv);

Also our process is single threaded.

@planrich
Copy link
Contributor

Thanks, as I see things now the reason for this is the following:

PyGILState_GetThisThreadState looks up the thread local state which uses a mutex (keymutex in find_key, Python/thread.c). Assuming the following (as shown in the gdb stack trace in your first post):

PyGILState_Release also calls find_key and the signal occurs just after the keymutex has been acuqired. The signal in PyGILState_GetThisThreadState (which also tries to acquire keymutex) will hang, forever.

This could have happened most of the previous releases already. Thinking about a fix for that.

@planrich
Copy link
Contributor

so good news, I have worked around the issue described above and uploaded a pre release (version 0.4.2.dev1). Could you install that version an try again? pip install vmprof --pre should do

@jogo
Copy link
Contributor Author

jogo commented Feb 17, 2017

@planrich thanks, I tested out 0.4.2.dev1 and its now working!

@jogo
Copy link
Contributor Author

jogo commented Feb 23, 2017

Fix has been released in 0.4.2, closing

@jogo jogo closed this as completed Feb 23, 2017
@planrich
Copy link
Contributor

Released a fix, closing this issue.

@planrich planrich reopened this Feb 24, 2017
# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants