mitigate child processes with 100% CPU and/or zombies #76

Open
wants to merge 2 commits into base: master

mmitch
Contributor

@mmitch mmitch commented Apr 30, 2014

This should work around the 100% CPU and/or zombie issues (for details see the commit log).
It does not fix the root cause of the endless loop in the child processes, but it kills them, so they don't live very long.

This is by no means good code, but a works-for-me patch with some ugly hacks.
It should probably be refined or tested by more people than just me before a merge :-)
(Is there an experimental or -dev branch?)

While this does not fix the cause for child processes looping
endlessly in read()/EAGAIN, it will detect these loops and kill the
runaway processes automatically.  It also prevents zombies after
killing these child processes (which happened before when you killed
them manually).

- Fix the assumption that an unresponsive child process has died.
  Until now these processes were just 'forgotten', but now it is
  checked whether the process is unresponsive but still alive (e.g. in
  an endless loop) and if alive, the process is killed before it is
  'forgotten'.

- Fix the reaping of dead child processes.  Until now, only a single
  waitpid() call was issued, which only reaped one process, even if
  there were multiple processes waiting to be reaped.  Now a loop is
  used and all potential zombies should be reaped properly.
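
The two fixes in the list above can be sketched roughly as follows. This is an illustration in Python, not the Perl code from the commits; the function names `is_alive`, `kill_if_alive` and `reap_all` are hypothetical, and the use of SIGKILL is an assumption, since the commit message does not say which signal is sent.

```python
import os
import signal


def is_alive(pid):
    """Return True if a process with this PID still exists.

    Signal 0 performs the existence and permission checks only;
    no signal is actually delivered.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process: it really has died
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def kill_if_alive(pid, sig=signal.SIGKILL):
    """Kill an unresponsive child before 'forgetting' it.

    Returns True if a signal was sent, False if the process was already gone.
    SIGKILL is an assumed default, not taken from the patch.
    """
    if not is_alive(pid):
        return False
    os.kill(pid, sig)
    return True


def reap_all():
    """Reap every exited child without blocking; return the reaped PIDs.

    A single waitpid() call collects at most one child, so loop until
    nothing more is ready.
    """
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children left
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped
```

A caller would run `kill_if_alive(pid)` when a child stops responding and then call `reap_all()` periodically, so that killed children do not linger as zombies.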

Switch to proposed 'variant 3':
Irssi::pidwait_add() already does a waitpid() for any child.
Manual calls to waitpid() and Irssi::pidwait_remove() should not be
necessary, so remove them altogether.

(I've grep(1)ed the irssi scripts directory on my Debian stable and
 none of the scripts (except twirssi :-) that call pidwait_add()
 call either pidwait_remove() or waitpid(), so this should work.)
@yarikoptic

FWIW, I'm very much looking forward to seeing the fix. @mmitch, if you could share the variant 3 patch, I would be glad to try it out as well.

@mmitch
Contributor Author

mmitch commented May 4, 2014

@yarikoptic: with commit mmitch@7e976ff, variant 3 is active.
It's working fine for me so far: 8 runaway processes successfully killed in the last 36 hours.

@sandebert

I'm also affected by this bug. I'm trying variant 3 by @mmitch now to see if that solves it for me.

@sandebert

Reporting back: 16 runaway processes killed in the last 24 hours.

@jikuja

jikuja commented Jul 19, 2018

Is twirssi now stable without this PR, or is there something wrong with this PR? I have been running mmitch/twirssi@7e976ff since May 2015 without problems.

To get longer tweets I should now update the script, but is it still creating runaway processes?

5 participants