Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Bluepill crashing when checking memory usage #37

Open
c-oreills opened this issue Sep 3, 2015 · 5 comments
Open

Bluepill crashing when checking memory usage #37

c-oreills opened this issue Sep 3, 2015 · 5 comments

Comments

@c-oreills
Copy link

About once a day, one of our webservers will have a failed deploy because its Bluepill daemon has crashed. Inspecting the latest logs, this is the error:

E, [2015-09-03T06:00:31.951188 #24114] ERROR -- : [web:web:web_0] ["/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/system.rb:227:in ``'", "/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/system.rb:227:in `ps_axu'", "/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/system.rb:42:in `memory_usage'", "/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/process_conditions/mem_usage.rb:15:in `run'", "/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/condition_watch.rb:28:in `run'", "/var/lib/gems/1.9.1/gems/bluepill-0.1.1/lib/bluepill/process.rb:221:in `block (2 levels) in run_watches'"]

Any idea what could be causing this? The memory usage is set to a fairly sane value:

process.checks :mem_usage, :every => 30.seconds, :below => 512.megabytes, :times => [3, 5]
@sferik
Copy link
Member

sferik commented Sep 3, 2015

Would you be willing to share the output of ps axo pid,ppid,pcpu,rss,etime,command on the system in which this is crashing? If it contains information that you’d rather not post publicly, you can send it to me via email: sferik@gmail.com.

@jameslegg
Copy link

Hi,

@c-oreills isn't around at the moment, but I've managed to capture output you where after on a failed box. However bluepill crashed about 10:30 last night and this was captured at about at 9:00 UTC.

A bluepill load of the original pill seems to work OK and bluepill finds the PID files and seem to regain control without any issues.

E, [2015-09-21T20:22:36.748604 #29769] ERROR -- : [web:web:web_0] ["/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/system.rb:227:in ``'", "/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/system.rb:227:in `ps_axu'", "/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/system.rb:42:in `memory_usage'", "/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/process_conditions/mem_usage.rb:15:in `run'", "/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/condition_watch.rb:29:in `run'", "/var/lib/gems/1.9.1/gems/bluepill-0.0.69/lib/bluepill/process.rb:221:in `block (2 levels) in run_watches'"]
 PID  PPID %CPU   RSS     ELAPSED COMMAND
    1     0  0.0  1720 46-15:43:58 /sbin/init
    2     0  0.0     0 46-15:43:58 [kthreadd]
    3     2  0.0     0 46-15:43:58 [ksoftirqd/0]
    5     2  0.0     0 46-15:43:58 [kworker/0:0H]
    7     2  0.0     0 46-15:43:58 [rcu_sched]
    8     2  0.1     0 46-15:43:58 [rcuos/0]
    9     2  0.0     0 46-15:43:58 [rcuos/1]
   10     2  0.0     0 46-15:43:58 [rcuos/2]
   11     2  0.0     0 46-15:43:58 [rcuos/3]
   12     2  0.0     0 46-15:43:58 [rcuos/4]
   13     2  0.0     0 46-15:43:58 [rcuos/5]
   14     2  0.0     0 46-15:43:58 [rcuos/6]
   15     2  0.0     0 46-15:43:58 [rcuos/7]
   16     2  0.0     0 46-15:43:58 [rcuos/8]
   17     2  0.0     0 46-15:43:58 [rcuos/9]
   18     2  0.0     0 46-15:43:58 [rcuos/10]
   19     2  0.0     0 46-15:43:58 [rcuos/11]
   20     2  0.0     0 46-15:43:58 [rcuos/12]
   21     2  0.0     0 46-15:43:58 [rcuos/13]
   22     2  0.0     0 46-15:43:58 [rcuos/14]
   23     2  0.0     0 46-15:43:58 [rcu_bh]
   24     2  0.0     0 46-15:43:58 [rcuob/0]
   25     2  0.0     0 46-15:43:58 [rcuob/1]
   26     2  0.0     0 46-15:43:58 [rcuob/2]
   27     2  0.0     0 46-15:43:58 [rcuob/3]
   28     2  0.0     0 46-15:43:58 [rcuob/4]
   29     2  0.0     0 46-15:43:58 [rcuob/5]
   30     2  0.0     0 46-15:43:58 [rcuob/6]
   31     2  0.0     0 46-15:43:58 [rcuob/7]
   32     2  0.0     0 46-15:43:58 [rcuob/8]
   33     2  0.0     0 46-15:43:58 [rcuob/9]
   34     2  0.0     0 46-15:43:58 [rcuob/10]
   35     2  0.0     0 46-15:43:58 [rcuob/11]
   36     2  0.0     0 46-15:43:58 [rcuob/12]
   37     2  0.0     0 46-15:43:58 [rcuob/13]
   38     2  0.0     0 46-15:43:58 [rcuob/14]
   39     2  0.0     0 46-15:43:58 [migration/0]
   40     2  0.0     0 46-15:43:58 [watchdog/0]
   41     2  0.0     0 46-15:43:58 [watchdog/1]
   42     2  0.0     0 46-15:43:58 [migration/1]
   43     2  0.0     0 46-15:43:58 [ksoftirqd/1]
   45     2  0.0     0 46-15:43:58 [kworker/1:0H]
   46     2  0.0     0 46-15:43:58 [khelper]
   47     2  0.0     0 46-15:43:58 [kdevtmpfs]
   48     2  0.0     0 46-15:43:58 [netns]
   49     2  0.0     0 46-15:43:58 [xenwatch]
   50     2  0.0     0 46-15:43:58 [xenbus]
   52     2  0.0     0 46-15:43:58 [writeback]
   53     2  0.0     0 46-15:43:58 [kintegrityd]
   54     2  0.0     0 46-15:43:58 [bioset]
   55     2  0.0     0 46-15:43:58 [kworker/u31:0]
   56     2  0.0     0 46-15:43:58 [kblockd]
   58     2  0.0     0 46-15:43:58 [ata_sff]
   59     2  0.0     0 46-15:43:58 [khubd]
   60     2  0.0     0 46-15:43:58 [md]
   61     2  0.0     0 46-15:43:58 [devfreq_wq]
   63     2  0.0     0 46-15:43:58 [khungtaskd]
   64     2  0.0     0 46-15:43:58 [kswapd0]
   65     2  0.0     0 46-15:43:58 [ksmd]
   66     2  0.0     0 46-15:43:58 [khugepaged]
   67     2  0.0     0 46-15:43:58 [fsnotify_mark]
   68     2  0.0     0 46-15:43:58 [ecryptfs-kthrea]
   69     2  0.0     0 46-15:43:58 [crypto]
   81     2  0.0     0 46-15:43:58 [kthrotld]
   83     2  0.0     0 46-15:43:58 [scsi_eh_0]
   84     2  0.0     0 46-15:43:58 [scsi_eh_1]
  105     2  0.0     0 46-15:43:57 [deferwq]
  106     2  0.0     0 46-15:43:57 [charger_manager]
  148     2  0.0     0 46-15:43:57 [kpsmoused]
  210     2  0.0     0 46-15:43:55 [jbd2/xvda1-8]
  211     2  0.0     0 46-15:43:55 [ext4-rsv-conver]
  339     2  0.0     0 46-15:43:48 [jbd2/xvdb-8]
  340     2  0.0     0 46-15:43:48 [ext4-rsv-conver]
  427     1  0.0   372 46-15:43:41 upstart-udev-bridge --daemon
  430     1  0.0   892 46-15:43:41 /lib/systemd/systemd-udevd --daemon
  598     1  0.0   480 46-15:43:39 upstart-socket-bridge --daemon
  664     1  0.0  2568 46-15:43:39 dhclient -1 -v -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
  842     1  0.0 44192 46-15:43:37 /opt/chef/embedded/bin/ruby /usr/bin/chef-client -i 3600 -L /var/log/chef/client.log
  924     1  0.0   448 46-15:43:37 upstart-file-bridge --daemon
  929     1  0.0   716 46-15:43:37 dbus-daemon --system --fork
  940     1  0.0  1136 46-15:43:37 /lib/systemd/systemd-logind
  942     2  0.0     0 46-15:43:37 [kworker/u31:1]
  965     1  0.0 13228 46-15:43:37 rsyslogd
 1073     1  0.0   164 46-15:43:36 /sbin/getty -8 38400 tty4
 1076     1  0.0   168 46-15:43:36 /sbin/getty -8 38400 tty5
 1085     1  0.0   160 46-15:43:36 /sbin/getty -8 38400 tty2
 1086     1  0.0   164 46-15:43:36 /sbin/getty -8 38400 tty3
 1088     1  0.0   164 46-15:43:36 /sbin/getty -8 38400 tty6
 1120     1  0.0   896 46-15:43:36 /usr/sbin/sshd -D
 1141     1  0.0   408 46-15:43:36 cron
 1142     1  0.0   172 46-15:43:36 atd
 1150     1  0.0   140 46-15:43:36 acpid -c /etc/acpi/events -s /var/run/acpid.socket
 1379     1  0.0   160 46-15:43:25 /sbin/getty -8 38400 tty1
 1380     1  0.0   152 46-15:43:25 /sbin/getty -8 38400 ttyS0
 1708     2  0.0     0  3-23:43:58 [kworker/0:1]
 1809     2  0.0     0    23:24:31 [kworker/1:2]
 2083     2  0.0     0 46-15:42:25 [kauditd]
 5774     1  0.0  1276 46-15:38:27 nginx: master process /usr/sbin/nginx
 6065     1  0.0  1056 46-15:38:00 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 106:111
 6113  5774  0.1  3096 46-15:37:58 nginx: worker process
 6114  5774  0.1  2844 46-15:37:58 nginx: worker process
 6242     1  0.1 47700 46-15:37:54 /opt/sensu/embedded/bin/ruby /opt/sensu/bin/sensu-client -b -c /etc/sensu/config.json -d /etc/sens
 9013 15955  2.2 340204   10:29:52 gunicorn: worker [gunicorn]
 9133 15955  1.4 303088   07:47:27 gunicorn: worker [gunicorn]
 9199 15955  1.6 322568   06:28:25 gunicorn: worker [gunicorn]
 9224 15955  1.5 275880   06:27:48 gunicorn: worker [gunicorn]
 9237 15955  1.4 274144   06:17:15 gunicorn: worker [gunicorn]
 9276 15955  1.1 193520   06:06:20 gunicorn: worker [gunicorn]
 9285 15955  1.7 273680   06:00:48 gunicorn: worker [gunicorn]
 9297 15955  2.1 315720   05:59:28 gunicorn: worker [gunicorn]
 9369 15955  1.7 212272   03:57:15 gunicorn: worker [gunicorn]
 9659     2  0.0     0    01:15:08 [kworker/u30:1]
 9687     2  0.0     0       38:34 [kworker/u30:2]
10102  1120  0.0  4240       03:49 sshd: jameslegg [priv]
10121     2  0.0     0       03:48 [kworker/1:0]
10156 10102  0.0  1876       03:48 sshd: jameslegg@pts/0
10157 10156  0.0  4992       03:48 -bash
10195  1120  0.0  4232       01:08 sshd: jameslegg [priv]
10248 10195  0.0  1876       01:07 sshd: jameslegg@pts/4
10249 10248  0.0  4852       01:07 -bash
10269 10157  0.0   888       00:00 ps axo pid,ppid,pcpu,rss,etime,command
13119 15955  2.3 360008   20:31:40 gunicorn: worker [gunicorn]
13120 15955  2.6 334120   20:31:40 gunicorn: worker [gunicorn]
13806     2  0.0     0  4-22:33:55 [kworker/1:1]
15955     1  0.0 10540 46-00:19:45 gunicorn: master [gunicorn]
17613     1  0.1   708  7-17:24:50 SCREEN
17614 17613  0.0  2944  7-17:24:50 /bin/bash
26686     2  0.0     0 18-21:23:43 [kworker/0:0]

Let me know if there is anything else I can provide or do to debug this.

Thanks
James

@akzhan
Copy link
Contributor

akzhan commented Jun 19, 2016

it looks like ps axo pid,ppid,pcpu,rss,etime,command cannot be executed under bluepill user. Check your OS settings.

@dansajner
Copy link

Hi. We started seeing this issue this past week following upgrade from ruby2.0 to ruby2.2.1. The failures seem to line up with our daily bluepill.log logrotate. We're only seeing this on 1 or 2 servers per night out of many so its not 100% repeatable. The exception is the same as originally reported.

@idyll
Copy link

idyll commented Jun 16, 2017

I experienced this issue today. For me it was path. I have to specify the FULL PATH to ps aka /bin/ps for it to succeed.

This makes pretty much no sense to me.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants