Systemd Services are frustratingly hard to make safe
Today's story starts with me wanting to reduce the permissions of some of the systemd services I run. The two services I set my sights on were opensnitch, a system service, and protonmail-bridge, a user service.
Opensnitch
Conveniently opensnitch provides a working systemd unit:
[Unit]
Description=OpenSnitch is a GNU/Linux port of the Little Snitch application firewall.
Documentation=https://github.com/gustavo-iniguez-goya/opensnitch/wiki
Wants=network.target
After=network.target
[Service]
Type=simple
PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /etc/opensnitchd/rules
ExecStart=/usr/local/bin/opensnitchd -rules-path /etc/opensnitchd/rules
Restart=always
RestartSec=30
[Install]
WantedBy=multi-user.target
Let's break down what this unit does:
- Description and Documentation are self-explanatory and don't affect the runtime of the unit.
- Wants=network.target declares a weak dependency: starting this service also pulls in network.target, but the service is still started if that target fails.
- After=network.target only affects ordering: when both units are being started, this service is started after network.target has been reached. It does not by itself cause the service to be started.
- Type=simple specifies how the unit launches. simple is often the right choice, but a process that daemonizes itself, for example, needs forking.
- PermissionsStartOnly=true is a deprecated attribute that applies permission-related settings such as User= only to ExecStart, but notably not to ExecStartPre and the other helper commands, which keep running with full privileges.
- ExecStartPre=... is run before the main process starts. In this case it ensures that a required directory exists.
- ExecStart=... is the command that runs the service itself.
- Restart=always causes the process in ExecStart to be launched again whenever it exits.
- RestartSec=30 specifies how long to wait, in seconds, before relaunching.
- Finally, WantedBy=multi-user.target controls how the unit is installed: systemctl enable hooks the service into multi-user.target so that it is started at boot.
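Since PermissionsStartOnly is deprecated, the same effect can be had per command: prefixing an executable path with "+" runs that one command with full privileges, regardless of User= and similar settings. A minimal sketch, assuming the service were given an unprivileged user (the user name opensnitch is hypothetical and not part of the upstream unit, and whether opensnitchd works as that user depends on the capabilities it is granted):

```ini
[Service]
# Hypothetical unprivileged user; the upstream unit runs as root.
User=opensnitch
# The "+" prefix runs only this command with full privileges, ignoring User=.
ExecStartPre=+/bin/mkdir -p /etc/opensnitchd/rules
# The main process is subject to User= and all other restrictions.
ExecStart=/usr/local/bin/opensnitchd -rules-path /etc/opensnitchd/rules
```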
As far as I know, the opensnitch unit above is a very typical systemd unit. However, especially for a unit that handles network activity, we want to limit the damage it could cause if it is compromised in any way. For example, a buffer overflow within opensnitch may allow an attacker to execute arbitrary commands in the process of the systemd unit.
My first attempt to tackle this was to look at the official documentation. The density of the documentation makes it quite challenging to understand best practices. Instead I turned to systemd-analyze security. This command spits out a long list of things that can be improved:
NAME DESCRIPTION EXPOSURE
✗ PrivateNetwork= Service has access to the host's network 0.5
✗ User=/DynamicUser= Service runs as root user 0.4
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP) Service may change UID/GID identities/capabilities 0.3
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN Service has administrator privileges 0.3
✗ CapabilityBoundingSet=~CAP_SYS_PTRACE Service has ptrace() debugging abilities 0.3
✗ RestrictAddressFamilies=~AF_(INET|INET6) Service may allocate Internet sockets 0.3
✗ RestrictNamespaces=~CLONE_NEWUSER Service may create user namespaces 0.3
✗ RestrictAddressFamilies=~… Service may allocate exotic sockets 0.3
✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP) Service may change file ownership/access mode/capabilities unrestricted 0.2
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER) Service may override UNIX file/IPC permission checks 0.2
✗ CapabilityBoundingSet=~CAP_NET_ADMIN Service has network configuration privileges 0.2
✗ CapabilityBoundingSet=~CAP_RAWIO Service has raw I/O access 0.2
✗ CapabilityBoundingSet=~CAP_SYS_MODULE Service may load kernel modules 0.2
✗ CapabilityBoundingSet=~CAP_SYS_TIME Service processes may change the system clock 0.2
✗ DeviceAllow= Service has no device ACL 0.2
✗ IPAddressDeny= Service does not define an IP address whitelist 0.2
✓ KeyringMode= Service doesn't share key material with other services
✗ NoNewPrivileges= Service processes may acquire new privileges 0.2
✓ NotifyAccess= Service child processes cannot alter service state
✗ PrivateDevices= Service potentially has access to hardware devices 0.2
✗ PrivateMounts= Service may install system mounts 0.2
✗ PrivateTmp= Service has access to other software's temporary files 0.2
✗ PrivateUsers= Service has access to other users 0.2
✗ ProtectControlGroups= Service may modify to the control group file system 0.2
✗ ProtectHome= Service has full access to home directories 0.2
✗ ProtectKernelModules= Service may load or read kernel modules 0.2
✗ ProtectKernelTunables= Service may alter kernel tunables 0.2
✗ ProtectSystem= Service has full access to the OS file hierarchy 0.2
✗ RestrictAddressFamilies=~AF_PACKET Service may allocate packet sockets 0.2
✗ SystemCallArchitectures= Service may execute system calls with all ABIs 0.2
✗ SystemCallFilter=~@clock Service does not filter system calls 0.2
✗ SystemCallFilter=~@debug Service does not filter system calls 0.2
✗ SystemCallFilter=~@module Service does not filter system calls 0.2
✗ SystemCallFilter=~@mount Service does not filter system calls 0.2
✗ SystemCallFilter=~@raw-io Service does not filter system calls 0.2
✗ SystemCallFilter=~@reboot Service does not filter system calls 0.2
✗ SystemCallFilter=~@swap Service does not filter system calls 0.2
✗ SystemCallFilter=~@privileged Service does not filter system calls 0.2
✗ SystemCallFilter=~@resources Service does not filter system calls 0.2
✓ AmbientCapabilities= Service process does not receive ambient capabilities
✗ CapabilityBoundingSet=~CAP_AUDIT_* Service has audit subsystem access 0.1
✗ CapabilityBoundingSet=~CAP_KILL Service may send UNIX signals to arbitrary processes 0.1
✗ CapabilityBoundingSet=~CAP_MKNOD Service may create device nodes 0.1
✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has elevated networking privileges 0.1
✗ CapabilityBoundingSet=~CAP_SYSLOG Service has access to kernel logging 0.1
✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE) Service has privileges to change resource use parameters 0.1
✗ RestrictNamespaces=~CLONE_NEWCGROUP Service may create cgroup namespaces 0.1
✗ RestrictNamespaces=~CLONE_NEWIPC Service may create IPC namespaces 0.1
✗ RestrictNamespaces=~CLONE_NEWNET Service may create network namespaces 0.1
✗ RestrictNamespaces=~CLONE_NEWNS Service may create file system namespaces 0.1
✗ RestrictNamespaces=~CLONE_NEWPID Service may create process namespaces 0.1
✗ RestrictRealtime= Service may acquire realtime scheduling 0.1
✗ SystemCallFilter=~@cpu-emulation Service does not filter system calls 0.1
✗ SystemCallFilter=~@obsolete Service does not filter system calls 0.1
✗ RestrictAddressFamilies=~AF_NETLINK Service may allocate netlink sockets 0.1
✗ RootDirectory=/RootImage= Service runs within the host's root directory 0.1
SupplementaryGroups= Service runs as root, option does not matter
✗ CapabilityBoundingSet=~CAP_MAC_* Service may adjust SMACK MAC 0.1
✗ CapabilityBoundingSet=~CAP_SYS_BOOT Service may issue reboot() 0.1
✓ Delegate= Service does not maintain its own delegated control group subtree
✗ LockPersonality= Service may change ABI personality 0.1
✗ MemoryDenyWriteExecute= Service may create writable executable memory mappings 0.1
RemoveIPC= Service runs as root, option does not apply
✗ RestrictNamespaces=~CLONE_NEWUTS Service may create hostname namespaces 0.1
✗ UMask= Files created by service are world-readable by default 0.1
✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE Service may mark files immutable 0.1
✗ CapabilityBoundingSet=~CAP_IPC_LOCK Service may lock memory into RAM 0.1
✗ CapabilityBoundingSet=~CAP_SYS_CHROOT Service may issue chroot() 0.1
✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND Service may establish wake locks 0.1
✗ CapabilityBoundingSet=~CAP_LEASE Service may create file leases 0.1
✗ CapabilityBoundingSet=~CAP_SYS_PACCT Service may use acct() 0.1
✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG Service may issue vhangup() 0.1
✗ CapabilityBoundingSet=~CAP_WAKE_ALARM Service may program timers that wake up the system 0.1
✗ RestrictAddressFamilies=~AF_UNIX Service may allocate local sockets 0.1
→ Overall exposure level for opensnitch.service: 9.5 UNSAFE 😨
This list provides some indication of settings that can be improved to lower the risk the service poses to the rest of the system. Of course, we still need to evaluate every rule qualitatively to understand the impact it will have on our application. opensnitch in particular proved finicky, since (1) it is an application firewall, and (2) it does not always fail hard (with some systemd settings, requests just hang).
The official documentation does a better job of explaining the individual options, but I would still like to highlight some options that I consider particularly important:
- DynamicUser or User allows you to pick a user other than root to run the systemd unit. In my personal opinion, this option is the most important one for reducing access to the host system (when used in conjunction with NoNewPrivileges).
- PrivateNetwork, IPAddressDeny and IPAddressAllow allow you to reduce the network access of a systemd unit. In many cases, this is the best way to prevent a compromised systemd unit from uploading data to an external system.
- CapabilityBoundingSet combined with AmbientCapabilities allows you to give the systemd unit a limited set of superuser capabilities. This came in handy for opensnitch, since it leverages quite a bit of kernel functionality.
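To show how these options fit together, here is a minimal sketch of a drop-in for a generic network service. The unit name example.service and the single capability are placeholders for illustration. This is not the configuration I ended up with for opensnitch, which as a firewall cannot be cut off from the network like this:

```ini
# /etc/systemd/system/example.service.d/hardening.conf (hypothetical unit)
[Service]
# Run as a transient, unprivileged user instead of root.
DynamicUser=yes
# Prevent the process and its children from gaining privileges, e.g. via setuid binaries.
NoNewPrivileges=yes
# Deny all IP traffic, then allow only loopback.
IPAddressDeny=any
IPAddressAllow=localhost
# Placeholder: keep exactly one capability and hand it to the non-root process.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
```

CapabilityBoundingSet limits what the service can ever hold, while AmbientCapabilities is what actually grants a capability to a process that no longer runs as root.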
Unfortunately, the last point was a significant source of frustration for opensnitch, since it was hard to determine which capabilities are needed. The tools I found online did not solve my problem: getpcaps printed the capabilities that a process had, not the ones that it was using; I was unable to run capable successfully; and strace did not log the capabilities used. In the end I removed capabilities one at a time to determine which ones were required.
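One round of that elimination can be sketched as a drop-in. The "~" prefix inverts the list, so everything stays in the bounding set except the capabilities named. The drop-in file name is hypothetical, and the two capabilities are simply candidates taken from the systemd-analyze output above, not a statement about what opensnitch needs:

```ini
# /etc/systemd/system/opensnitch.service.d/capabilities.conf (hypothetical file name)
[Service]
# "~" inverts the list: keep every capability EXCEPT the ones listed here.
# Candidates under test in this round: CAP_SYS_TIME and CAP_SYS_BOOT.
CapabilityBoundingSet=~CAP_SYS_TIME CAP_SYS_BOOT
```

After each change, run systemctl daemon-reload, restart the service, and check that it still behaves correctly. If it does, add the next candidate to the list. If it breaks or requests start hanging, take the last addition out again.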
Protonmail
For protonmail I went back to systemd-analyze security --user to find which settings could be improved. Unfortunately, almost every single setting caused the systemd service to fail to start with an error message:
Failed to drop capabilities: Operation not permitted
Failed at step CAPABILITIES spawning /you-binary-path: Operation not permitted
Even limiting the set of possible capabilities, for example with CapabilityBoundingSet, did not work. After a lot of head scratching, this github issue explained that the user does not have the ability to reduce its own permissions: the per-user systemd instance runs without privileges, so it is not permitted to drop capabilities for the services it spawns. Frustratingly, options such as ProtectProc cannot be applied either, since they implicitly affect the capabilities.
Conclusion
While I outline weaknesses of systemd and opensnitch, I am eternally grateful to the maintainers for building those tools. I hope that there will be more investment in this area in the future. Unfortunately, even on a new Debian install the systemd services are mostly considered unsafe:
$ systemd-analyze security
UNIT EXPOSURE PREDICATE HAPPY
cron.service 9.5 UNSAFE 😨
dbus.service 9.5 UNSAFE 😨
emergency.service 9.5 UNSAFE 😨
getty@tty1.service 9.5 UNSAFE 😨
ifup@eth0.service 9.5 UNSAFE 😨
opensnitch.service 9.5 UNSAFE 😨
rc-local.service 9.5 UNSAFE 😨
rescue.service 9.5 UNSAFE 😨
rsync.service 9.5 UNSAFE 😨
rsyslog.service 9.5 UNSAFE 😨
ssh.service 9.5 UNSAFE 😨
systemd-ask-password-console.service 9.3 UNSAFE 😨
systemd-ask-password-wall.service 9.3 UNSAFE 😨
systemd-fsckd.service 9.5 UNSAFE 😨
systemd-initctl.service 9.3 UNSAFE 😨
systemd-journald.service 4.3 OK 🙂
systemd-logind.service 4.1 OK 🙂
systemd-networkd.service 2.8 OK 🙂
systemd-timesyncd.service 2.0 OK 🙂
systemd-udevd.service 8.3 EXPOSED 🙁
user@1000.service 9.1 UNSAFE 😨
With that in mind, I don't see this situation changing anytime soon. Instead, the best hope may be that security-conscious developers contribute their systemd unit files to upstream repositories. I'll update this post with a PR to improve the opensnitch service unit soon.
Published on 2021-12-26.