lx: alpine zone can't shutdown ("poweroff") due to "Operation not permitted" error #1119

Open
peterkelm opened this issue Jan 13, 2025 · 19 comments


@peterkelm

peterkelm commented Jan 13, 2025

We have been experimenting with Alpine Linux LX images for quite some time. Since Alpine Linux 3.13 we've seen issues on shutdown: the "poweroff" command fails with an "Operation not permitted" error.

alpine:~# poweroff 
poweroff: Operation not permitted
alpine:~# strace -s 65535 -f poweroff
execve("/sbin/poweroff", ["poweroff"], 0x7fffffeffc58 /* 13 vars */) = 0
arch_prctl(ARCH_SET_FS, 0x7fffef391b28) = 0
set_tid_address(0x7fffef391f90)         = 18335
brk(NULL)                               = 0x8000
brk(0xa000)                             = 0xa000
mmap(0x8000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x8000
mprotect(0x7fffef38e000, 4096, PROT_READ) = 0
mprotect(0x7fffef456000, 16384, PROT_READ) = 0
getuid()                                = 0
nanosleep({tv_sec=0, tv_nsec=0}, 0x7fffffeffb50) = 0
uname({sysname="Linux", nodename="alpine", ...}) = 0
socket(AF_UNIX, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_UNIX, sun_path="/run/utmps/.wtmpd-socket"}, 110) = -1 ENOENT (No such file or directory)
close(3)                                = 0
sync()                                  = 0
kill(1, SIGUSR2)                        = -1 EPERM (Operation not permitted)
access("/proc/meminfo", F_OK)           = 0
write(2, "poweroff: Operation not permitted\n", 34poweroff: Operation not permitted
) = 34
exit_group(1 <unfinished ...>
+++ exited with 1 +++
alpine:~# 
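
For longer traces, the failing calls can be pulled out mechanically. What follows is a minimal Python sketch, not part of the original report. The function name `failed_syscalls` and the regular expression are illustrative assumptions, and they only cover the simple `name(args) = -1 ERRNO (text)` line shape seen in the trace above.

```python
import re

# Matches strace lines of the form: name(args...) = -1 ERRNO (description)
FAILED_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+)?(\w+)\(.*\)\s+= -1 (E[A-Z0-9]+) \((.*)\)\s*$"
)

def failed_syscalls(strace_lines):
    """Return (syscall, errno_name, description) for every call that returned -1."""
    failures = []
    for line in strace_lines:
        match = FAILED_CALL.match(line.strip())
        if match:
            failures.append(match.groups())
    return failures

# Three lines copied from the trace above.
sample = [
    'sync()                                  = 0',
    'connect(3, {sa_family=AF_UNIX, sun_path="/run/utmps/.wtmpd-socket"}, 110) = -1 ENOENT (No such file or directory)',
    'kill(1, SIGUSR2)                        = -1 EPERM (Operation not permitted)',
]
for name, err, text in failed_syscalls(sample):
    print(f"{name}: {err} ({text})")
```

Applied to the trace above, this leaves two entries: the `connect` to the utmps socket (ENOENT) and `kill(1, SIGUSR2)` (EPERM). poweroff carries on past the first one and reports the second.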

#969 is most likely a related issue.

We are running SmartOS 20250109T000743Z on that machine.

I am not sure where to look to get this fixed… any pointers?

@bahamat
Contributor

bahamat commented Jan 13, 2025

Which alpine image are you using?

@peterkelm
Author

@bahamat: The latest ones offered by imgadm. The strace output in my initial report is from a machine that uses image "632a25ad-15dc-42f0-a23b-743b37f62cbb".

@danmcd
Contributor

danmcd commented Jan 13, 2025

This is where people should look, I think?

kill(1, SIGUSR2) = -1 EPERM (Operation not permitted)

A DTrace in-kernel to figure out why SIGUSR2 fails would be illuminating (assuming, of course, this call makes it into the kernel and isn't stuffed somewhere in whatever libc Alpine uses... uggh).

@bahamat
Contributor

bahamat commented Jan 13, 2025

@peterkelm Can you show me the output of this command?

/native/usr/bin/openssl sha1 /sbin/shutdown

@peterkelm
Author

@bahamat:

SHA1(/sbin/shutdown)= c37aa000a6fdc0f71af50a5b4c4a407fe34d911e

@peterkelm
Author

Just in case: "native" Alpine Linux only has "reboot", "halt" and "poweroff" but no "shutdown". For the LX zones:

  • "reboot" works as expected
  • "halt" doesn't report an error but doesn't stop the zone either… not sure whether it should.
  • "poweroff" fails with the error I mentioned above.

@danmcd
Contributor

danmcd commented Jan 13, 2025

I've reproduced this. And I have a lead.

Does this misbehavior happen with 3.12, or only 3.13 and later?

@danmcd
Contributor

danmcd commented Jan 13, 2025

I ask because something changed where the init(8) process in alpine no longer receives signals for SIGUSR[12]. I want to know which versions of Alpine work and don't; it'll help determine what's wrong and where.
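
One way to check from inside the zone which signals init currently catches is to decode the `SigCgt` hex mask from `/proc/1/status`. This assumes the LX /proc emulation fills in that field the way Linux does, which has not been verified here. The sketch below is in Python, its helper names `decode_sigmask` and `caught_signals` are made up for illustration, and the status excerpt is a hypothetical example rather than output from an affected zone. The signal numbers assume x86 Linux, where SIGUSR1 is 10 and SIGUSR2 is 12.

```python
def decode_sigmask(hex_mask):
    """Turn a /proc/<pid>/status mask such as SigCgt into a sorted list of signal numbers."""
    bits = int(hex_mask, 16)
    # Bit (n - 1) of the mask corresponds to signal number n.
    return [n for n in range(1, 65) if bits & (1 << (n - 1))]

def caught_signals(status_text):
    """Extract the SigCgt line from the text of /proc/<pid>/status and decode it."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            return decode_sigmask(line.split(":", 1)[1].strip())
    return []

# Hypothetical status excerpt: a mask in which only signals 10 and 12 are caught.
example = "Name:\tinit\nSigIgn:\t0000000000000000\nSigCgt:\t0000000000000a00\n"
print(caught_signals(example))
```

Comparing the decoded list for a working and a failing busybox version would show whether signals 10 and 12 are missing from init's handlers.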

@danmcd
Contributor

danmcd commented Jan 13, 2025

ALSO, "poweroff" is a native tool now. The reboot and halt commands are overwritten to use /native/usr/bin/pkill as appropriate (part of our image construction). Still... why busybox's init doesn't signal-handle what it's supposed to anymore is puzzling.

@peterkelm
Author

@danmcd, thanks for the follow-up. From what I recall, 3.13 was the last version that worked. The next one we tried was 3.18, though… I'll reconfirm that tomorrow.

Also, there haven't been many commits to this part of busybox in a long time, but this one might be "the one":
https://git.busybox.net/busybox/commit/init/init.c?id=f5e8b4278822f2413bf7e47466f55cc1a0fcca9a

Lastly, I've tried to look into DTrace, but I'm pretty much a novice with that tool, so I haven't gotten very far with it yet. :-(

The "poweroff" code seems to essentially call "kill(1, SIGUSR2)" - and that gives the same "Operation not permitted" result. Well, at least that's consistent with what "poweroff" does...
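
For reference, the same call can be issued without poweroff. This is a minimal Python sketch, assuming Python 3 is installed in the zone (it may not be on a stock Alpine image). The wrapper name `try_signal` is made up for illustration, and the block only runs a harmless self-check with signal 0.

```python
import errno
import os
import signal

def try_signal(pid, sig):
    """Send sig to pid; return None on success or the errno name on failure."""
    try:
        os.kill(pid, sig)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None

# Safe self-check: signal 0 delivers nothing and only tests whether the target can be signalled.
print(try_signal(os.getpid(), 0))

# What poweroff does according to the strace output. Only run this as root
# inside a throwaway zone, because it powers the zone off wherever it works:
# try_signal(1, signal.SIGUSR2)
```

Going by the strace output, the commented-out call would be expected to return "EPERM" on the affected images.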

@danmcd
Contributor

danmcd commented Jan 13, 2025

Also, there haven't been many commits to this part of busybox in a long time, but this one might be "the one": https://git.busybox.net/busybox/commit/init/init.c?id=f5e8b4278822f2413bf7e47466f55cc1a0fcca9a

That is interesting... it's almost as if the restoration block never gets reached in LX launches of this. I wonder why?

@peterkelm
Author

Quick update before I fall asleep:

  • image 19aa3328-0025-11e7-a19a-c39077bfd4cf is Alpine Linux 3.5.2 and "poweroff" works as expected
  • image b5d89e30-128d-4ea4-824f-8e36fd7e3703 is Alpine Linux 3.18.2 and "poweroff" fails with that permission error

Unfortunately there aren't any other "official" alpine LX images in-between...

@peterkelm
Author

To narrow this down further, I replaced busybox in my Alpine Linux 3.18.2 "LX test zone" with versions from older Alpine Linux releases (downloaded from their respective repos, force-stopped the LX zone, replaced busybox, restarted the zone):

  • Busybox 1.32.1-r0 (as shipped with Alpine Linux 3.13) and newer versions report "Operation not permitted" on "poweroff"
  • Busybox 1.31.1-r19 (from Alpine Linux 3.12) does not report that error - not a surprise because the newer version introduced those signal changes. However, although the LX zone reports "Successfully completed stop for VM 37fbbd20-a490-42de-ac9b-cb46edf1b132" after a "poweroff", the zone remains in "running" state (per vmadm list)…

@danmcd
Contributor

danmcd commented Jan 14, 2025

Busybox 1.31.1-r19 (from Alpine Linux 3.12) does not report that error - not a surprise because the newer version
introduced those signal changes. However, although the LX zone reports "Successfully completed stop for VM
37fbbd20-a490-42de-ac9b-cb46edf1b132" after a "poweroff", the zone remains in "running" state (per vmadm list)…

Interesting about the zone state. Does pgrep -z 37fbbd20-a490-42de-ac9b-cb46edf1b132 show any running processes? If so I'd be interested in:

pargs `pgrep -z 37fbbd20-a490-42de-ac9b-cb46edf1b132`

@peterkelm
Author

My bad, that "zone state still running" comment wasn't as clear as it should have been: it refers to the zone state after a "halt" command, not after "poweroff" (the latter seems to just exit with that error message). Sorry for not mentioning that in the first place.

The pargs output after that "halt" is:

[root@smartos ~]# pargs `pgrep -z 37fbbd20-a490-42de-ac9b-cb46edf1b132`
23069:	/usr/sbin/sshd
argv[0]: sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups

23035:	/usr/sbin/crond -c /etc/crontabs -f
argv[0]: /usr/sbin/crond
argv[1]: -c
argv[2]: /etc/crontabs
argv[3]: -f

22345: zsched

23010:	/sbin/syslogd -t -n
argv[0]: /sbin/syslogd
argv[1]: -t
argv[2]: -n

22644:	init
argv[0]: init

23140:	/sbin/getty 38400 console
argv[0]: /sbin/getty
argv[1]: 38400
argv[2]: console

22650:	ipmgmtd
argv[0]: ipmgmtd
[root@smartos ~]# 
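
To compare such listings before and after a "halt", the pargs output can be reduced to a PID-to-command map. This is a minimal Python sketch; the parser name `parse_pargs` and its regular expressions are illustrative assumptions based only on the output format shown above, and the sample is a shortened copy of that listing.

```python
import re

# A process header looks like "23069:\t/usr/sbin/sshd"; argument lines look like "argv[0]: sshd".
HEADER = re.compile(r"^(\d+):\s+(.*)$")
ARGV = re.compile(r"^argv\[(\d+)\]: (.*)$")

def parse_pargs(text):
    """Map each PID in pargs output to its command line and argv list."""
    procs = {}
    current = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            current = int(header.group(1))
            procs[current] = {"cmd": header.group(2), "argv": []}
            continue
        arg = ARGV.match(line)
        if arg and current is not None:
            procs[current]["argv"].append(arg.group(2))
    return procs

sample = (
    "22345: zsched\n\n"
    "22644:\tinit\nargv[0]: init\n\n"
    "23035:\t/usr/sbin/crond -c /etc/crontabs -f\n"
    "argv[0]: /usr/sbin/crond\nargv[1]: -c\nargv[2]: /etc/crontabs\nargv[3]: -f\n"
)
for pid, info in sorted(parse_pargs(sample).items()):
    print(pid, info["cmd"])
```

In the listing above, init (PID 22644) and the services it started are all still present after the "halt".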

@danmcd
Contributor

danmcd commented Jan 14, 2025

So "halt" isn't killing all of the processes. I'll bet if you did pkill -z 37f... your zone would go to shutdown. :(

@peterkelm
Author

Tried that: I get kicked out of the LX zone, but that's it. The zone is still "running" and I can even zlogin ... back in again.

Your pgrep -z command also gives the same output as above.

@danmcd
Contributor

danmcd commented Jan 15, 2025

Hmmm, pkill -9 -z 37f... ?

@peterkelm
Author

peterkelm commented Jan 15, 2025

Same result as before. The "-9" option doesn't seem to have an effect here.
