10 Aug 2021

System Management

systemd and containers and chaos

We were fated to pretend

Users

Operating system users and groups are a tool to minimize access. This protects me from myself. I try to make them as granular as possible.

useradd -m dojo
passwd dojo

There is a concept of a “system” user which has no login shell; its only purpose is to run daemons. This actually fits most of my use cases, but I like being able to hop into a user’s shell, so I rarely use it.
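
For reference, creating one looks something like this (the nologin path varies by distro):

# -r marks it a system user: no password aging, UID from the system range
sudo useradd -r -m -d /var/lib/dojo -s /usr/bin/nologin dojo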

Environment Variables

There are a lot of ways to set environment variables: shell configs, PAM, and systemd.

Per-user is where it gets tricky.

I have used PAM before, but according to the docs it’s on its way out.

Systemd is probably the safest bet: drop files in ~/.config/environment.d/*.conf. But these are only loaded for user services, not shell programs (unless they are somehow service based).
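
A sketch of what one could look like (the name is arbitrary; files are read in lexical order, entries are plain KEY=VALUE):

# ~/.config/environment.d/50-custom.conf
EDITOR=vim
PAGER=less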

Note that some managers query the systemd user instance for the exported environment and inject this configuration into programs they start, using systemctl show-environment or the underlying D-Bus call.

Systemd ships a little generator to export the vars into a shell:

export $(/usr/lib/systemd/user-environment-generators/30-systemd-environment-d-generator)

Systemd

I have tried a lot of tools to manage the ever-growing chaos of my home servers. Something like Ansible still feels like too much overhead for me. I have settled on manually creating system users (e.g. bitcoin for running a full node) and keeping each app’s complexity as localized as possible (e.g. building an executable in the home directory). This leverages the OS’s built-in user/group permissions model and makes it pretty easy to follow the principle of least privilege (e.g. only services that need access to Tor have access).

I use systemd to automate starting/stopping/triggering services and managing the service dependencies.

Editing provided units

Sometimes I need to make a change to a unit, but don’t want it to be blown away by pacman on the next update. Systemd has a tool to do just this.

Use systemd-delta to keep track of these overrides on the system.

systemctl edit <unit>
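
This opens an editor and saves the result as a drop-in that overrides the packaged unit, e.g. for lnd.service:

# saved to /etc/systemd/system/lnd.service.d/override.conf
[Service]
Restart=on-failure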

Timers

Systemd can trigger services on a schedule like cron. The easiest way is to create a .timer file for a service and then enable/start the timer. Do not enable the service itself; that confuses systemd.

report.service

[Unit]
Description=Run BOS node report

[Service]
User=lightning
Group=lightning
ExecStart=/home/lightning/balanceofsatoshis/daily-report.sh

report.timer

[Unit]
Description=Run report daily

[Timer]
OnCalendar=daily

[Install]
WantedBy=timers.target

sudo systemctl enable report.timer
sudo systemctl start report.timer
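
To double check the schedule and when the report last ran:

systemctl list-timers report.timer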

Path triggers

Another helpful trigger is a path change. Think backing up a file every time it is modified. Like timers, only enable the .path and not the service itself.

backup.path

[Unit]
Description=Backup LND channels on any changes

[Path]
PathModified=/data/lnd/backup/channel.backup

[Install]
WantedBy=multi-user.target

sudo systemctl enable backup.path
sudo systemctl start backup.path
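
A path unit activates the unit with the same name by default, so this assumes a backup.service sitting next to it. A minimal sketch (the destination is hypothetical):

backup.service

[Unit]
Description=Copy LND channel backup

[Service]
Type=oneshot
User=lightning
Group=lightning
ExecStart=/usr/bin/cp /data/lnd/backup/channel.backup /home/lightning/backups/channel.backup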

Sandboxing

Turns out the standard systemd examples are not very secure. Systemd provides a tool to see which services are in trouble: systemd-analyze security gives an overview, and systemd-analyze security <service> takes a deeper dive into a single service.

My standard set of hardening flags (which I’ll try to expand as I learn more about them):

# Hardening
# private /tmp and /var/tmp, invisible to other services
PrivateTmp=true
# private /dev without access to physical devices
PrivateDevices=true
# mount the entire filesystem read-only (except /dev, /proc, /sys)
ProtectSystem=strict
# the service and its children can never gain new privileges
NoNewPrivileges=true

I think RuntimeDirectory= could be used to automatically create and destroy a directory for a process, but it requires coordination with the process to read/write at that location.
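
Something like this, if I am reading the docs right (the directory name is arbitrary):

[Service]
User=lightning
# systemd creates /run/lnd owned by the service user on start
# and removes it when the service stops
RuntimeDirectory=lnd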

Notifications

I like to be emailed on service failures.

The first requirement is an easy-to-call email script. I am jacking this straight from the Arch wiki with some slight mods for my email setup.

/usr/local/bin/systemd-email

#!/bin/sh
#
# Send alert to my email

/usr/bin/msmtp nick@yonson.dev <<ERRMAIL
Subject: $1 failure
Content-Transfer-Encoding: 8bit
Content-Type: text/plain; charset=UTF-8

$(systemctl status --full "$1")
ERRMAIL
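
Make it executable and give it a test run against any unit:

chmod +x /usr/local/bin/systemd-email
/usr/local/bin/systemd-email lnd.service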

/etc/systemd/system/status-email@.service

[Unit]
Description=status email for %i

[Service]
Type=oneshot
ExecStart=/usr/local/bin/systemd-email %i
User=nobody
Group=systemd-journal

A template unit service with limited permissions to fire the email. Edit the service you want emails for and add OnFailure=status-email@%n.service to the [Unit] (not [Service]) section; %n passes the unit’s name to the template.
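
For example, using the drop-in trick from earlier:

# systemctl edit lnd
[Unit]
OnFailure=status-email@%n.service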

From the systemd docs, OnFailure= is “a space-separated list of one or more units that are activated when this unit enters the ‘failed’ state”. A service unit using Restart= enters the failed state only after the start limits are reached.

If using Restart=, a service will only enter the failed state once those start limits are hit, which is tuned with StartLimitIntervalSec= and StartLimitBurst= in the [Unit] section.

Units which are started more than burst times within an interval time span are not permitted to start any more.
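
So to actually reach the failed state (and fire the email) on a flapping service, something like this in its [Unit] section:

[Unit]
# give up (and enter "failed") after 5 failed starts within 5 minutes
StartLimitIntervalSec=300
StartLimitBurst=5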

You do not have to enable the template service (e.g. status-email@lnd).

Podman

Impedance mismatch

Systemd’s model works great for executables which follow the standard fork/exec process.

A growing portion of the services I run do not publish an executable, but rather a container. The docker client/server (docker daemon) model takes on some of the same responsibilities as systemd (e.g. restarting a service when it fails). There is not a clear line in the sand.

Since I haven’t gone 100% containers for everything though, I need to declare all the service dependencies in systemd (including containerized services). This works just OK. The main shortcoming is that by default docker client commands are fire-and-forget: no info is passed back to systemd, so it doesn’t know if the service actually started up correctly and can’t pass that along to dependencies.

Docker commands must always be run by root (or a user in the docker group, which is pretty much the same thing from a security perspective), so we can’t utilize systemd’s ability to execute services as a system user (e.g. bitcoin).

# example systemd service file wrapping docker
[Unit]
Description=Matrix go neb bot
Requires=docker.service
After=docker.service

# docker's `--attach` (-a) option forwards signals and stdout/stderr, passing some info back to systemd
[Service]
ExecStart=docker start -a 80a975d2f9baff82a27edc389bfe2f5a74e597560acc63fb3dfe4a3df07c8797
ExecStop=docker stop 80a975d2f9baff82a27edc389bfe2f5a74e597560acc63fb3dfe4a3df07c8797

[Install]
WantedBy=multi-user.target

Lastly, there is a lot of complexity around user and group permissions between the host and containers. This is most apparent when sharing data between a host and container through a bind mount. In a perfect world from a permissions standpoint, the host’s users and groups would be mirrored into the container and respected there. However, most containers default to running as UID 0, a.k.a. root (note: there is a difference between the UID which starts the container and the UID which internally runs the container, but they can be the same).

Here is where the complexity jumps again: enter user namespaces. User namespaces are the future fix for this permissions complexity, but you have to either go all in on them or follow the old best practices; they don’t really play nice together.

Users and groups: old best practices

I have never gone “all in” on containers, but I think this is the decision tree for best practices:

if (all in on containers) {
    use *volumes* which are fully managed by container framework and hide complexities
} else if (mixing host and containers) {
    use bind mounts and manually ensure UID/GID match between host and containers
} else {
    // no state shared between host and container
    don't even worry bout this stuff
}

Case 2 can be complicated by containers which specify a USER in their Dockerfile. On one hand, this is safer than running as root by default. On the other, it makes them less portable, since all downstream systems have to match this UID in order to work with bind mounts.
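
For case 2, the only workaround I know of is pinning the UID on both sides, something like this (the UID is arbitrary):

# host: create the user with a fixed UID
sudo useradd -r -u 1030 bitcoin
# container: run as the same UID so bind-mounted files line up
docker run --user 1030:1030 -v /home/bitcoin/data:/data <image>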

I am attempting to bridge these mismatches by switching from docker to podman.
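
Podman is daemonless and can run containers as a normal host user, which fits the systemd model much better. It can even generate unit files to start from (container name hypothetical):

podman generate systemd --name mybot --new --files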

yonson