Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Irssi uses 100% CPU time #75

Open
Crazyguy opened this issue Apr 28, 2014 · 9 comments
Open

Irssi uses 100% CPU time #75

Crazyguy opened this issue Apr 28, 2014 · 9 comments

Comments

@Crazyguy
Copy link

Hello,

I have this issue where Irssi starts using 100% CPU. The time seems to correlate with the following messages from the Twirssi debug log:

03:21:08 'timeline' => '460572445892571136'03:21:08 }
03:21:08 };
03:26:03 get_updates starting upd=1
03:26:03 bg_setup starting upd=1
03:26:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 0
03:26:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:26:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:13 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:18 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:23 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:28 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:33 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:38 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:43 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:48 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:53 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:27:58 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:03 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 checking child log at /tmp/tw_9366_Kbf6.done mtime=1398644763 v 1398644763
03:28:08 Giving up on polling /tmp/tw_9366_Kbf6.done
03:31:03 get_updates starting upd=1
03:31:03 bg_setup starting upd=1
03:31:08 checking child log at /tmp/tw_9366_c3jr.done mtime=1398645063 v 0
03:31:13 checking child log at /tmp/tw_9366_c3jr.done no /tmp/tw_9366_c3jr.19390 v 1398645063
03:31:13 +063 teromiettinen@Twitter Loading home_timeline page 1
03:31:13 +064 page: $VAR1 = {};

@gedge
Copy link
Collaborator

gedge commented Apr 28, 2014

How long have you been running twirssi, and how long has this problem been evident?

(heard a few, recent reports from others, on this subject)

@gedge
Copy link
Collaborator

gedge commented Apr 28, 2014

are you on debian?

@Crazyguy
Copy link
Author

The problem has occurred for maybe a year (I have been using Twirssi for about 3 years). The issue starts after anywhere from a day to a month of Irssi uptime. I'm not sure if computer uptime makes a difference.

Yes, I've been using Debian stable.

@mmitch
Copy link
Contributor

mmitch commented Apr 28, 2014

Same here:

  • first occurence of the problem could be a year ago
  • problem occurs randomly - sometimes after weeks of irssi uptime, sometimes after a day. Lately, more a day than a month that's why I'm starting to look into the issue :-)
  • Debian stable

I'm currently checking two things (at once :-)

  • put the waitpid( -1, WNOHANG) into a loop like seen on perldoc -f waitpid – if there are multiple dead childs, only one will be reaped at a time
  • remove the calls to Irssi::pidwait_remove() because the irssi documentation makes it sound like without this call children will be reaped automatically (quote from http://www.irssi.org/documentation/perl : it is usually not necessary to call this function

@mmitch
Copy link
Contributor

mmitch commented Apr 28, 2014

Additional info: The child process seems to be stuck in an endless loop talking to twitter:

% strace -p 26734 2>&1 | head -10

Process 26734 attached - interrupt to quit
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(13, 0xeeee83b, 5) = -1 EAGAIN (Resource temporarily unavailable)

% lsof -n -p 26734
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
[...]
irssi 26734 mitch 11u REG 0,18 0 676486626 /tmp/tw_8171_lX8K.26734
irssi 26734 mitch 12u IPv4 676487523 0t0 TCP 87.175.121.166:34436->199.16.156.199:https (CLOSE_WAIT)
irssi 26734 mitch 13u IPv4 676488332 0t0 TCP 87.175.121.166:34465->199.16.156.199:https (ESTABLISHED)
[...]

% dig -x 199.16.156.199 +nocomment

; <<>> DiG 9.8.4-rpz2+rl005.12-P1 <<>> -x 199.16.156.199 +nocomment
;; global options: +cmd
;199.156.16.199.in-addr.arpa. IN PTR
156.16.199.in-addr.arpa. 12 IN SOA ns1.p34.dynect.net. root.twitter.com. 7 3600 600 604800 60
;; Query time: 0 msec
;; SERVER: 192.168.0.105#53(192.168.0.105)
;; WHEN: Fri Apr 25 19:37:25 2014
;; MSG SIZE rcvd: 115

@Crazyguy
Copy link
Author

Is it bad if I just kill the process as a workaround?

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

You can safely kill the processes, but you might accumulate zombies that way. At least the 100% CPU should be gone. A workaround is incoming :-)

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

For my workaround see #76
The patch should be verified and tested - on my box, it has already successfully prevented two runaway child processes today. This looks something like this:

10:42 ERROR: Giving up on polling=/tmp/tw_19010_7yvj.done child_pid=28336 parent_pid=19010
10:42 ERROR: remove_child: child was still running, killing it now! child_pid=28336 parent_pid=19010
10:42 ERROR: Haven't been able to get updated tweets since 10:35
10:47 UPDATE: Update succeeded.

@mmitch
Copy link
Contributor

mmitch commented Apr 30, 2014

The root cause of the 100% CPU load seems to be a bug somwhere in the LWP modules, see https://rt.cpan.org/Public/Bug/Display.html?id=78920

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants