Skip to content

Change systemd-analyze security from "UNSAFE 😨" to "OK πŸ™‚"!

Notifications You must be signed in to change notification settings

cyberitsolutions/prisonpc-systemd-lockdown

Repository files navigation

Overview

Gotchas

  • Systemd confinement applies to the entire cgroup. If msmtpd.service fork+exec's /bin/cat, there's no way to say "run cat unconfined", like apparmor does here: https://salsa.debian.org/kolter/msmtp/-/blob/debian/1.8.23-1/debian/apparmor/usr.bin.msmtp#L81

  • Lots of things settings implicitly turn on NoNewPrivileges=yes, which breaks a lot of things (e.g. no setgid, so maildrop(8postfix) breaks).

  • If a daemon calls /usr/sbin/sendmail, it's hard to harden it unless you know whether that's provided by msmtp-mta, postfix, exim, sendmail, or what. Because all of those need different privileges. See https://github.com/cyberitsolutions/prisonpc-systemd-lockdown/tree/main/systemd/system/0-EXAMPLES/ for discussion.

  • If a daemon runs arbitrary hooks (e.g. smartd and zfs-zed), those hooks could theoretically do anything, so it's hard to know what can reasonably be hardened.

  • If a daemon uses libnss (i.e. basically all of them), the user might have installed a libnss-foo package that needs any arbitrary thing.

    Question: could libnss-nis provide a systemd generator that automatically adds IPAddressAllow=<the NIS server(s)> to every unit? (Thanks to Mithrandir for the idea.)

    Question: could NIS / NIS+ users be told "sssd supports NIS/NIS+, you must use that in Debian 14+"?

  • If a daemon uses libpam, the user might have installed a libpam-foo package that needs any arbitrary thing.

    Question: is there a way to know which /etc/pam.d/* files a given frobozzd.service unit will use, and therefore could libpam-foo.so add its "I need X, Y, and Z" rules to only those .service units?

    Pathological example: libvirtd.service runs a VM in qemu, and qemu runs smbd because of the following in my-cool-vm.xml. How would the /etc/pam.d/samba package know to add "allow X" rules to libvirtd.service:

    <qemu:commandline>
      <qemu:arg value="-net" />
      <qemu:arg value="-user,smb=/opt/share" />
    </qemu:commandline>
    
  • Any dlopen()-based plugin framework (e.g. gstreamer's) is the same class of problem as nss/pam, though the scope is narrower. e.g. the number of packages providing gstreamer plugins is pretty small.

  • If a daemon manages a bunch of different worker processes (like postfix and dovecot), you can't write separate confinment policies for each worker. You have to write a policy that allows everything every worker needs to do. For example postfix includes a postgres client https://www.postfix.org/pgsql_table.5.html which might need to talk to an arbitrary IP address, even if the actual SMTP only needs IPAddressAllow=mail.example.com.

    nfs-utils 1.4+ is an example where the individual workers are managed by systemd as separate units, liberally using .target and PartOf= and Alias= to help keep the end user sane.

  • https://www.freedesktop.org/software/systemd/man/systemd.exec.html#SystemCallArchitectures= and https://manpages.debian.org/bookworm/dpkg/dpkg.1.en.html#add are inherently at odds.

    If nginx.service has SystemCallArchitectures=native and you apt install nginx:i386 on an amd64 system, it won't start.

    In testing, I found that savelog from binutils:i386 worked fine, though, so maybe at least some simple programs can get away with this?

    UPDATE: that was because savelog is a shell script, and /bin/sh was still amd64.

    roehling observes:

    If your threat model allows access to qemu-user-static for an attacker, they can run pretty much any binary is if it were native, and the whole SystemCallArchitectures hardening becomes meaningless.

    mjg59 observes:

    My understanding of the threat is that compatibility syscalls (eg, x32 on amd64) are less well-tested than the local architecture syscalls, and so allowing apps to call them increases the risk - a compromised app that can make compatibility syscalls stands a higher probability of being able to elevate privileges, either in userland or to the kernel itself. Allowing qemu to translate syscalls from other architectures to the local syscall ABI doesn't increase that risk, so isn't a concern. The goal isn't to prevent code form other architectures from running, it's to reduce the attack surface by preventing calls to the compatbility syscalls.

  • Question: how do we make it as obvious as possible when a daemon crashes due to a deny foo rule? For example, when I tested nginx:i386 with SystemCallArchitectures=native, there was NO indication in journalctl or coredumpctl that it failed because it was i386. If I had just installed nginx:i386 and the Debian maintainer had put SystemCallArchitectures=native there, how the fuck would I have known that was the cause of the problem?

  • Question: if we harden frobozzd.service by default in a way that breaks things for people who do X, how do we get a feel for how many people do X? e.g. in the SystemCallArchitectures=native case, how many people do dpkg --add-architecture? Versus how many people will be protected by that, since it prevents i386 rootkits from executing (i.e. how many people were hit by i386 malware?).

History

My original plan (circa 2018) was that this would be a simple Architecture: all package, and you could simply do apt install more-security to lock down any packages you happened to have installed.

This was modelled after apparmor-profiles and the idea was it'd let me lock down a LOT of stuff without having to have a separate argument "is this really necessary?" with each daemon's developer and/or package maintainer.

Since then (2020-2023), I

  1. moved all my in-production systemd hardening rules into https://git.cyber.com.au/cyber-ansible -- which is private, sorry. I also have a small amount buried in https://github.com/cyberitsolutions/bootstrap2020/
  2. started hassling individual upstreams:

So this repo kinda stopped getting updates.

Since then (2023), Russell Coker independently proposed a general hardening cleanup:

Adding more hardening

What to harden (prioritization)

How to harden

Once you've done 2-5 daemons, you get a "feel" for the trouble spots. Total time to harden a unit from EXPOSURE=10 to EXPOSURE=3 usually takes me 1-4 hours. If I've used the daemon before & know its config format & source code, usually 1 hour.

I typically start with a "deny all" ruleset. Either I copy-paste from another daemon I did earlier, or I copy-paste from systemd-analyze security. A slightly out-of-date one is https://github.com/cyberitsolutions/prisonpc-systemd-lockdown/blob/main/systemd/system/0-EXAMPLES/20-default-deny.conf

Usually the daemon segfaults immediately. In coredumpctl I see what the last syscall was. Typically it is setuid so per I know to allowlist these:

SystemCallFilter=@setuid
CapabilityBoundingSet=CAP_SETUID CAP_SETGID

This is because the daemon does a no-op setuid(123) even if it's ALREADY 123 (due to User=%p in frobozzd.service). This could be patched away, but so far my policy has been "focus on stuff that doesn't require patching", so instead I just allow that syscall.

It is very common to need both AF_UNIX and AF_NETLINK, so I don't even try to block those. Things that need network (e.g. postfix, nginx) would also need AF_INET, AF_INET6, IPAddressAllow=all, &c.

The next most common failure is being unable to write to somewhere due to ProtectSystem=strict, so I look for things like /run/frobozzd.pid or /var/lib/frobozzd/state.db in the error logs (journalctl -u frobozzd). If systemd's existing things like RuntimeDirectory=%p aren't enough to cover it, I add ReadWritePaths=, or downgrade ProtectSystem=strict to ProtectSystem=yes.

If it's still crashing, I remove SystemCallFilter=~@privileged @resources and CapabilityBoundingSet= entirely. If that works, I strace or bisect to find which syscalls must be allowlisted.

If it's STILL crashing, I bisect over the entire hardening denylist. (Comment out half. Does it work now? If so, it's mad about the commented-out half. Repeat.)

The hardest part is the rare case where a daemon will automatically detect that an action failed, then silently switch to a less-secure mode. It is very hard to spot this is happening until after the hardened unit has been in production for a month or two.

PS: I typically have a dev loop like:

journalctl -fu frobozzd &

while ! systemctl restart frobozzd;
do systemctl edit frobozzd; done

Or if it's on another host:

M-! <hardening.conf ssh root@test '
    cat >/etc/systemd/system/frobozzd.service.d/hardening.conf;
    systemctl daemon-reload;
    systemctl restart frobozzd;
    systemctl status frobozzd'

PPS: so far I've been talking about system units, but user units can also have hardening!

For example, I bet this only needs write access to /sys/blah/rfkill, and could have it's TCP privileges revoked:

org.gnome.SettingsDaemon.Rfkill.service 9.8 UNSAFE 😨

Also by default systemd-analyze security doesn't mention timer/path-fired units like e2scrub or fsck. If you want to see those you have to do something like systemctl list-units --all --type=service.

Adding a lintian hook

I worked out to invoke it in offline mode (for lintian) you do this:

systemd-analyze --offline=yes ./path/to/foo.service

I didn't understand (from the manpage) that I could pass a file instead of a unit name, so I wasted a lot of time trying to make a minimal --root=tmpdir work. Also it won't accept "./debian/service", nor a symlink to same.

Suggestions for upstreams

About

Change systemd-analyze security from "UNSAFE 😨" to "OK πŸ™‚"!

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages