Yet another SWE Blog.

Systemd Services are frustratingly hard to make safe

Nigel Schuster

Today's story starts with me wanting to reduce the permissions of some of the systemd services I run. The two services I set my sights on were opensnitch, a system service, and protonmail-bridge, a user service.

Opensnitch

Conveniently opensnitch provides a working systemd unit:

[Unit]
Description=OpenSnitch is a GNU/Linux port of the Little Snitch application firewall.
Documentation=https://github.com/gustavo-iniguez-goya/opensnitch/wiki
Wants=network.target
After=network.target

[Service]
Type=simple
PermissionsStartOnly=true
ExecStartPre=/bin/mkdir -p /etc/opensnitchd/rules
ExecStart=/usr/local/bin/opensnitchd -rules-path /etc/opensnitchd/rules
Restart=always
RestartSec=30

[Install]
WantedBy=multi-user.target

Let's break down what this unit does:

  • Description and Documentation are self-explanatory and don't affect the runtime behavior of the unit.
  • Wants=network.target declares a weak dependency: starting this service also pulls in network.target, but the service is still started if that target fails.
  • After=network.target only controls ordering: when both units are started, this service is started after network.target has been reached.
  • Type=simple specifies how the unit launches. simple is often the right choice, but a process that daemonizes itself, for example, needs forking instead.
  • PermissionsStartOnly=true is a deprecated attribute. It applies permission-related settings such as User= only to ExecStart, so ExecStartPre notably still runs with full privileges.
  • ExecStartPre=... is run before the main process of the unit starts. In this case it ensures that a required directory exists.
  • ExecStart=... is the command that runs the service itself.
  • Restart=always will cause the process in ExecStart to be launched again if it ever exits. RestartSec=30 specifies how many seconds to wait before relaunching.
  • Finally, WantedBy=multi-user.target controls how the unit is installed: enabling the unit makes multi-user.target pull it in at boot.

As far as I know, this is a very typical systemd unit. However, especially for a unit that handles network activity, we want to limit the damage it could cause if it is compromised in any way. For example, a buffer overflow within opensnitch may allow an attacker to execute arbitrary commands in the process of the systemd unit.

My first attempt to tackle this was to look at the official documentation. The density of the documentation makes it quite challenging to understand best practices. Instead I turned to systemd-analyze security. This command spits out a long list of things that can be improved:

  NAME                                                        DESCRIPTION                                                             EXPOSURE
✗ PrivateNetwork=                                             Service has access to the host's network                                     0.5
✗ User=/DynamicUser=                                          Service runs as root user                                                    0.4
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service may change UID/GID identities/capabilities                           0.3
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has administrator privileges                                         0.3
✗ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has ptrace() debugging abilities                                     0.3
✗ RestrictAddressFamilies=~AF_(INET|INET6)                    Service may allocate Internet sockets                                        0.3
✗ RestrictNamespaces=~CLONE_NEWUSER                           Service may create user namespaces                                           0.3
✗ RestrictAddressFamilies=~…                                  Service may allocate exotic sockets                                          0.3
✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service may change file ownership/access mode/capabilities unrestricted      0.2
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service may override UNIX file/IPC permission checks                         0.2
✗ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has network configuration privileges                                 0.2
✗ CapabilityBoundingSet=~CAP_RAWIO                            Service has raw I/O access                                                   0.2
✗ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service may load kernel modules                                              0.2
✗ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes may change the system clock                                0.2
✗ DeviceAllow=                                                Service has no device ACL                                                    0.2
✗ IPAddressDeny=                                              Service does not define an IP address whitelist                              0.2
✓ KeyringMode=                                                Service doesn't share key material with other services
✗ NoNewPrivileges=                                            Service processes may acquire new privileges                                 0.2
✓ NotifyAccess=                                               Service child processes cannot alter service state
✗ PrivateDevices=                                             Service potentially has access to hardware devices                           0.2
✗ PrivateMounts=                                              Service may install system mounts                                            0.2
✗ PrivateTmp=                                                 Service has access to other software's temporary files                       0.2
✗ PrivateUsers=                                               Service has access to other users                                            0.2
✗ ProtectControlGroups=                                       Service may modify to the control group file system                          0.2
✗ ProtectHome=                                                Service has full access to home directories                                  0.2
✗ ProtectKernelModules=                                       Service may load or read kernel modules                                      0.2
✗ ProtectKernelTunables=                                      Service may alter kernel tunables                                            0.2
✗ ProtectSystem=                                              Service has full access to the OS file hierarchy                             0.2
✗ RestrictAddressFamilies=~AF_PACKET                          Service may allocate packet sockets                                          0.2
✗ SystemCallArchitectures=                                    Service may execute system calls with all ABIs                               0.2
✗ SystemCallFilter=~@clock                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@debug                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@module                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@mount                                    Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@raw-io                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@reboot                                   Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@swap                                     Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@privileged                               Service does not filter system calls                                         0.2
✗ SystemCallFilter=~@resources                                Service does not filter system calls                                         0.2
✓ AmbientCapabilities=                                        Service process does not receive ambient capabilities
✗ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has audit subsystem access                                           0.1
✗ CapabilityBoundingSet=~CAP_KILL                             Service may send UNIX signals to arbitrary processes                         0.1
✗ CapabilityBoundingSet=~CAP_MKNOD                            Service may create device nodes                                              0.1
✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has elevated networking privileges                                   0.1
✗ CapabilityBoundingSet=~CAP_SYSLOG                           Service has access to kernel logging                                         0.1
✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has privileges to change resource use parameters                     0.1
✗ RestrictNamespaces=~CLONE_NEWCGROUP                         Service may create cgroup namespaces                                         0.1
✗ RestrictNamespaces=~CLONE_NEWIPC                            Service may create IPC namespaces                                            0.1
✗ RestrictNamespaces=~CLONE_NEWNET                            Service may create network namespaces                                        0.1
✗ RestrictNamespaces=~CLONE_NEWNS                             Service may create file system namespaces                                    0.1
✗ RestrictNamespaces=~CLONE_NEWPID                            Service may create process namespaces                                        0.1
✗ RestrictRealtime=                                           Service may acquire realtime scheduling                                      0.1
✗ SystemCallFilter=~@cpu-emulation                            Service does not filter system calls                                         0.1
✗ SystemCallFilter=~@obsolete                                 Service does not filter system calls                                         0.1
✗ RestrictAddressFamilies=~AF_NETLINK                         Service may allocate netlink sockets                                         0.1
✗ RootDirectory=/RootImage=                                   Service runs within the host's root directory                                0.1
  SupplementaryGroups=                                        Service runs as root, option does not matter
✗ CapabilityBoundingSet=~CAP_MAC_*                            Service may adjust SMACK MAC                                                 0.1
✗ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service may issue reboot()                                                   0.1
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree
✗ LockPersonality=                                            Service may change ABI personality                                           0.1
✗ MemoryDenyWriteExecute=                                     Service may create writable executable memory mappings                       0.1
  RemoveIPC=                                                  Service runs as root, option does not apply
✗ RestrictNamespaces=~CLONE_NEWUTS                            Service may create hostname namespaces                                       0.1
✗ UMask=                                                      Files created by service are world-readable by default                       0.1
✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service may mark files immutable                                             0.1
✗ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service may lock memory into RAM                                             0.1
✗ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service may issue chroot()                                                   0.1
✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service may establish wake locks                                             0.1
✗ CapabilityBoundingSet=~CAP_LEASE                            Service may create file leases                                               0.1
✗ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service may use acct()                                                       0.1
✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service may issue vhangup()                                                  0.1
✗ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service may program timers that wake up the system                           0.1
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                           0.1

→ Overall exposure level for opensnitch.service: 9.5 UNSAFE 😨
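A report like the one above is long, so it can help to look only at the worst findings first. The following is a minimal sketch, not part of my original workflow: worst_findings is a helper name I made up, and it assumes the report keeps the layout shown above (a ✗ marker, the setting name, and the exposure as the last column).

```shell
# Keep only failing checks (rows marked with ✗) whose exposure is at least
# the given threshold (default 0.3). Reads a systemd-analyze security
# report on stdin and prints "<exposure> <setting>" per match.
# worst_findings is a hypothetical helper, not a systemd tool.
worst_findings() {
    awk -v min="${1:-0.3}" '$1 == "✗" && $NF + 0 >= min + 0 { print $NF, $2 }'
}

# Usage (needs systemd; --no-pager keeps the output pipe-friendly):
#   systemd-analyze security --no-pager opensnitch.service | worst_findings 0.3
```

The threshold is arbitrary. Lowering it to 0.2 already pulls in most of the list, which says something about how much there is to fix.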

This list provides some indication of settings that can be improved to lower the risk the service poses to the rest of the system. Of course we still need to evaluate every rule qualitatively to understand the impact it will have on our application. opensnitch in particular proved to be finicky, since (1) it is an application firewall, and (2) it does not always fail hard (with some systemd settings, requests just hang).

The official documentation does a better job at explaining the individual options, but I would still like to highlight some options that I consider particularly important:

  • DynamicUser or User allows you to pick a user other than root to run the systemd unit. In my personal opinion, this option is the most important to reduce access to the host system (when used in conjunction with NoNewPrivileges).
  • PrivateNetwork, IPAddressDeny and IPAddressAllow allow you to reduce the network access for a systemd unit. In many cases, this is the best way to prevent a compromised systemd unit from uploading data to an external system.
  • CapabilityBoundingSet combined with AmbientCapabilities will allow you to give the systemd unit some limited superuser capabilities. This came in handy for opensnitch, since it leverages quite a bit of kernel functionality.
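To make the three options above concrete, here is a minimal sketch of a drop-in that combines them. It is written for a hypothetical example.service that only needs to bind a low port and talk to localhost; it is not the configuration I ended up with for opensnitch, and every value in it is an illustrative assumption.

```shell
# Print a hardening drop-in for a hypothetical network daemon.
# hardening_dropin and example.service are made-up names; all values are
# illustrative and must be adjusted to what the service really needs.
hardening_dropin() {
    cat <<'EOF'
[Service]
# Run as a transient unprivileged user instead of root.
DynamicUser=yes
NoNewPrivileges=yes
# Deny all IP traffic, then allow loopback only.
IPAddressDeny=any
IPAddressAllow=localhost
# Grant exactly one capability: binding to ports below 1024.
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
AmbientCapabilities=CAP_NET_BIND_SERVICE
EOF
}

# Usage (assumed paths for the hypothetical unit):
#   sudo mkdir -p /etc/systemd/system/example.service.d
#   hardening_dropin | sudo tee /etc/systemd/system/example.service.d/hardening.conf
#   sudo systemctl daemon-reload && sudo systemctl restart example.service
```

After restarting, running systemd-analyze security against the unit again shows how much each setting moved the score.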

Unfortunately the last point was a significant source of frustration for opensnitch, since it was hard to determine which capabilities are needed. The tools I found online did not solve my problem. getpcaps printed the capabilities that a process had, not the ones that it was using. I was unable to run capable successfully. And strace did not log the capabilities in use. In the end I removed capabilities one at a time to determine which ones were required.
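The one-at-a-time removal can be scripted at least partially. The sketch below only generates the drop-in lines for each trial; deciding whether the service still works remained a manual step for me, since opensnitch does not always fail loudly. bounding_set_without, example.service and the candidate list are hypothetical names and values, not the capabilities opensnitch actually needs.

```shell
# Print CapabilityBoundingSet= lines that contain every capability from the
# candidate list except the one under test.
# $1 = capability to drop, remaining arguments = current candidate list.
# The first printed line resets any bounding set inherited from the unit
# file, because repeated assignments are otherwise merged.
# Plain POSIX sh: the variables below are not function-local.
bounding_set_without() {
    drop=$1; shift
    kept=""
    for cap in "$@"; do
        [ "$cap" = "$drop" ] || kept="$kept $cap"
    done
    printf 'CapabilityBoundingSet=\n'
    printf 'CapabilityBoundingSet=%s\n' "${kept# }"
}

# Sketch of the manual loop (hypothetical unit and candidate list; the
# drop-in directory is assumed to exist):
#   caps="CAP_NET_ADMIN CAP_SYS_PTRACE CAP_KILL"
#   for cap in $caps; do
#       { echo '[Service]'; bounding_set_without "$cap" $caps; } |
#           sudo tee /etc/systemd/system/example.service.d/caps.conf
#       sudo systemctl daemon-reload && sudo systemctl restart example.service
#       # ...exercise the service by hand and note whether it still works...
#   done
```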

Protonmail

For protonmail I went back to systemd-analyze security --user to find what settings can be improved. Unfortunately almost every single setting caused the systemd service to fail to start with an error message:

Failed to drop capabilities: Operation not permitted
Failed at step CAPABILITIES spawning /you-binary-path: Operation not permitted

Even limiting the set of possible capabilities, such as with CapabilityBoundingSet, did not work. After a lot of head scratching, this GitHub issue explained that the user does not have the ability to reduce its own permissions. Frustratingly, options such as ProtectProc cannot be applied either, since they implicitly affect the capabilities.

Conclusion

While I outline weaknesses of systemd and opensnitch, I am eternally grateful to the maintainers for building those tools. I hope that there will be more of an investment into this in the future. Unfortunately, even on a new Debian install the systemd services are mostly considered unsafe:

$ systemd-analyze security
UNIT                                 EXPOSURE PREDICATE HAPPY
cron.service                              9.5 UNSAFE    😨
dbus.service                              9.5 UNSAFE    😨
emergency.service                         9.5 UNSAFE    😨
getty@tty1.service                        9.5 UNSAFE    😨
ifup@eth0.service                         9.5 UNSAFE    😨
opensnitch.service                        9.5 UNSAFE    😨
rc-local.service                          9.5 UNSAFE    😨
rescue.service                            9.5 UNSAFE    😨
rsync.service                             9.5 UNSAFE    😨
rsyslog.service                           9.5 UNSAFE    😨
ssh.service                               9.5 UNSAFE    😨
systemd-ask-password-console.service      9.3 UNSAFE    😨
systemd-ask-password-wall.service         9.3 UNSAFE    😨
systemd-fsckd.service                     9.5 UNSAFE    😨
systemd-initctl.service                   9.3 UNSAFE    😨
systemd-journald.service                  4.3 OK        🙂
systemd-logind.service                    4.1 OK        🙂
systemd-networkd.service                  2.8 OK        🙂
systemd-timesyncd.service                 2.0 OK        🙂
systemd-udevd.service                     8.3 EXPOSED   🙁
user@1000.service                         9.1 UNSAFE    😨
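For a quick summary of an overview like this, the rows can be tallied per predicate. This is a small sketch under the assumption that the output keeps the four-column layout shown above; count_by_predicate is a helper name I made up.

```shell
# Count units per predicate (UNSAFE, EXPOSED, OK, ...) in the overview
# printed by systemd-analyze security. Reads the overview on stdin and
# only considers rows whose second column looks like an exposure score,
# which skips the header line.
# count_by_predicate is a hypothetical helper, not a systemd tool.
count_by_predicate() {
    awk '$2 ~ /^[0-9]+\.[0-9]+$/ { n[$3]++ } END { for (p in n) print n[p], p }' |
        sort -k2
}

# Usage (needs systemd):
#   systemd-analyze security --no-pager | count_by_predicate
```

For the listing above, this comes out to 1 EXPOSED, 4 OK and 16 UNSAFE.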

With that in mind, I don't see this situation changing anytime soon. Instead the best hope may be that security conscious developers contribute their systemd unit files to upstream repositories. I'll update this post with a PR to improve the opensnitch service unit soon.

Published on 2021-12-26.

