frood, an Alpine initramfs NAS
2024-12-6 05:26:58 Author: words.filippo.io

My NAS, frood, has a bit of a weird setup. It’s just one big initramfs containing a whole Alpine Linux system. It’s delightful and I am not sure why it’s not more common.

  • As long as the bootloader can find the kernel and initramfs, the machine comes up cleanly.
  • A/B deployments and rollbacks are just a matter of choosing a different boot option.
  • The system is defined declaratively in the git repo that builds the initramfs.
  • Importantly to me, it’s not defined in some complex DSL: if I want a file to exist at /etc/example.conf I put it in root/etc/example.conf, and the rest is done by a few hundred lines of scripts I can (and have) read.
  • Configuring it doesn’t look any different than configuring any regular Alpine system.
  • I can test the next deploy with a qemu oneliner.
  • There are very very few moving parts.

If this already sounds appealing, you can skip to the “How it works” section below.

But why

I’ve always liked running systems from memory: it’s fast and prevents wear on the system storage device, which is often some janky SD card, because the good drives are dedicated to the ZFS pool.

However, you immediately have the problem of how to persist configuration changes.

Alpine’s answer to this is “diskless mode” where any customization is kept in an overlay file. After boot, the stock system looks for a file matching *.apkovl in all available filesystems, applies it, and then installs any missing apk packages from a local cache.

The first problem with that is complexity: the tool to generate and manage the apkovl, lbu(1), is pretty good but that process has a lot of moving parts. Find the apkovl, apply it, mount the filesystems in the new fstab, install the missing apks, resume the boot process. Over the past year, I had this break multiple times, either because it couldn’t find the filesystem anymore or because the apks did not get installed. The boot process depends on the package manager!

The second problem is that I would really like the state of the system to be tracked in git. Graham Christensen has a very good pitch for declarative or immutable systems in “Erase your darlings”.

I erase my systems at every boot.

Over time, a system collects state on its root partition. This state lives in assorted directories like /etc and /var, and represents every under-documented or out-of-order step in bringing up the services.

“Right, run myapp-init.”

These small, inconsequential “oh, oops” steps are the pieces that get lost and don’t appear in your runbooks.

“Just download ca-certificates to … to fix …”

Each of these quick fixes leaves you doomed to repeat history in three years when you’re finally doing that dreaded RHEL 7 to RHEL 8 upgrade.

“Oh, touch /etc/ipsec.secrets or the l2tp tunnel won’t work.”

I used to solve that by making (most) changes via Ansible, but then I had a multi-layer situation where I needed to make a change in Ansible, then deploy it, then save it with lbu to the apkovl.

There are of course many alternatives for declarative systems: from NixOS (which just doesn’t sound fun) to gokrazy (which is not quite ready to ship ZFS) to embedded toolchains like buildroot or the newer u-root.

Thing is though, I really like Alpine: a simple, well-packaged, lightweight, GNU-less Linux distribution. What I don’t like are its init and persistence mechanisms.

a screenshot of four texts saying "yeah I think all my objections to Alpine are basically its flaky init and its persistency mechanism" "if I run apk at build time to make a chonky initramfs, write 300 lines to replace init, I might be golden" "all of the mkinitfs complexity and flakyness is in finding the modules, loading them, finding the root, finding the apk cache, installing it" "all of that goes poof"

How it works

When it boots, Linux expects an “initramfs” image. It’s a simple cpio archive of the files that make up the very first root filesystem at boot. Usually the job of this system is to load enough modules to mount the real rootfs and pivot into it. Nothing stops us from putting the entire system in it, though! Who needs a rootfs?

Building an initramfs

The starting point is alpine-make-rootfs, which is a short (~500 lines) script meant to build a container image. It’s really 90% of what we need.

#!/bin/sh
set -e

wget https://raw.githubusercontent.com/alpinelinux/alpine-make-rootfs/v0.7.0/alpine-make-rootfs \
    && echo 'e09b623054d06ea389f3a901fd85e64aa154ab3a  alpine-make-rootfs' | sha1sum -c && \
    chmod +x alpine-make-rootfs

ROOTFS_DEST=$(mktemp -d)

# Stop mkinitfs from running during apk install.
mkdir -p "$ROOTFS_DEST/etc/mkinitfs"
echo "disable_trigger=yes" > "$ROOTFS_DEST/etc/mkinitfs/mkinitfs.conf"

export ALPINE_BRANCH=edge
export SCRIPT_CHROOT=yes
export FS_SKEL_DIR=root
export FS_SKEL_CHOWN=root:root
PACKAGES="$(cat packages)"
export PACKAGES
./alpine-make-rootfs "$ROOTFS_DEST" setup.sh

alpine-make-rootfs will copy the files from the root directory, install the packages from the packages file, and run the setup.sh script in a chroot.

Then, we extract the boot directory and package the rest into an initramfs archive.

cd "$ROOTFS_DEST"
mv boot "$IMAGE_DEST"
find . | cpio -o -H newc | gzip > "$IMAGE_DEST/initramfs-lts"

That’s truly very nearly it! It’s impressive how Alpine lends itself to this with practically no hacks.
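One cheap sanity check on the result: the kernel runs /init from the initramfs, so the archive should contain that entry at the top level. Here is a minimal sketch that lists the archive and looks for it. The function name initramfs_has_init is made up for illustration, and it assumes gzip compression and the newc format as in the command above.

```shell
#!/bin/sh
# initramfs_has_init FILE
# Succeeds if the gzip-compressed cpio archive FILE has a top-level "init" entry.
# Depending on the cpio implementation the name is listed as "init" or "./init",
# so both spellings are accepted.
initramfs_has_init() {
    gzip -dc "$1" | cpio -it 2>/dev/null | grep -qE '^(\./)?init$'
}

# Usage (hypothetical path):
#   initramfs_has_init "$IMAGE_DEST/initramfs-lts" || echo "no /init in image" >&2
```

This only proves the entry exists, not that its target does; the qemu test further down is the real check.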

Packages

The packages we install are the usual stuff you’d install on a server. Only a few are noteworthy.

  • alpine-base is the metapackage that installs apk, busybox, openrc, and a few config files.
  • linux-lts is the kernel, along with its modules. I considered thinning down the modules to only the ones I needed, but it’s ultimately a lot of hacks just to save a couple hundred MB. Note there is no modloop! The modules are always available.
  • linux-firmware-i915 is the i915 folder of Linux firmware. You need to install at least one package providing linux-firmware-any (linux-firmware-none counts), otherwise linux-firmware gets installed, which pulls in all of them.
  • intel-ucode is the microcode update. It installs a file in /boot that can be used as a pre-initramfs. This is in fact easier to set up than on bigger systems.
  • syslinux is the bootloader. Way simpler than GRUB, it installs in the filesystem partition, and then boots the kernel from that partition. This closes the loop: as long as we boot the right partition, there is no way for anything but our system to load. Nothing in the boot process needs to discover or even give a name to a filesystem.
  • openrc-init is the init. Alpine doesn’t actually use OpenRC’s init, it uses the one from busybox, but I found OpenRC’s easier to set up. Note though that it doesn’t work with busybox’s shutdown/reboot/poweroff commands so you need to use openrc-shutdown.
  • agetty if you plan to ever connect a keyboard and screen.
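The firmware point is easy to get wrong when editing the packages file, so a pre-build check can help. This is a minimal sketch, assuming the file is a whitespace-separated list as consumed by the build script. The function name has_firmware_provider is hypothetical, and matching on the linux-firmware- name prefix only approximates what apk treats as a provider of linux-firmware-any.

```shell
#!/bin/sh
# has_firmware_provider FILE
# Succeeds if FILE lists at least one linux-firmware-* subpackage
# (for example linux-firmware-i915 or linux-firmware-none).
# Assumption: a name-prefix match stands in for "provides linux-firmware-any".
has_firmware_provider() {
    # Unquoted $(cat ...) splits on whitespace, like PACKAGES="$(cat packages)".
    for pkg in $(cat "$1"); do
        case "$pkg" in
            linux-firmware-?*) return 0 ;;
        esac
    done
    return 1
}

# Usage:
#   has_firmware_provider packages || echo "warning: all firmware will be installed" >&2
```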

Setup script

The setup.sh script is also nothing special. We just need to link /init, set up the run-levels, and set the root password. (Yes, that’s my actual password hash. No, you won’t break it.)

#!/bin/sh
set -e

ln -s /sbin/openrc-init /init

rc-update add devfs sysinit
rc-update add dmesg sysinit

rc-update add hwclock boot
rc-update add modules boot
rc-update add sysctl boot
rc-update add hostname boot
rc-update add bootmisc boot
rc-update add syslog boot
rc-update add klogd boot
rc-update add networking boot
rc-update add seedrng boot

rc-update add mount-ro shutdown
rc-update add killprocs shutdown

ln -s /etc/init.d/agetty /etc/init.d/agetty.ttyS0
ln -s /etc/init.d/agetty /etc/init.d/agetty.tty1

rc-update add agetty.ttyS0 default
rc-update add agetty.tty1 default

rc-update add acpid default
rc-update add crond default
rc-update add local default
rc-update add openntpd default
rc-update add sshd default
rc-update add tailscale default

chpasswd -e <<'EOF'
root:$6$twsDxnP.TG2M8J4l$7lte7E/ImK4UwoursD7qQCC7XMUothIDb9FTH1MncxYbGQDUQPkC/9pxleTwPxEs3nbatApszxuwc4yj6ucdX1
EOF

In practice I set up a few more services here, but they are not needed to run the system. This is just where you declaratively specify how the system is configured.

Root skeleton

The root skeleton is similarly system-specific, and it’s so nice to be able to drop files into the image just by creating them. For example, if I want something to run at boot, I just add a file to root/etc/local.d/.
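As a sketch of that workflow: OpenRC's local service (enabled in setup.sh) runs executable files in /etc/local.d whose names end in .start, so a boot-time command is just one more file in the skeleton. The function add_boot_script and the file hello.start are made up for illustration, and this assumes the executable bit survives the copy into the image.

```shell
#!/bin/sh
# add_boot_script SKEL_DIR
# Creates a hypothetical etc/local.d/hello.start inside the skeleton directory
# (the build script above uses "root" as the skeleton).
add_boot_script() {
    mkdir -p "$1/etc/local.d"
    cat > "$1/etc/local.d/hello.start" <<'EOF'
#!/bin/sh
# Runs once when the "local" service starts.
logger -t local.d "boot finished"
EOF
    chmod +x "$1/etc/local.d/hello.start"
}

# Usage: add_boot_script root
```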

A few noteworthy files in the skeleton:

#!/bin/sh
openrc-shutdown -p now

root/etc/acpi/PWRF/00000080 makes the power button work with openrc-init.

root/etc/network/interfaces and root/etc/hostname and root/etc/hosts get the network to work.

root/etc/ssh/ssh_host_ed25519_key and root/etc/ssh/ssh_host_ed25519_key.pub and root/root/.ssh/authorized_keys for obvious reasons.

sshd_disable_keygen=yes

root/etc/conf.d/sshd avoids generating non-Ed25519 host keys.

Finally, a bit of persistence for the two things that truly can’t do without it: the RNG seed (arguably not necessary with hardware randomness) and Tailscale (which really doesn’t know how to run without persistence, alas). Rigorously UUID mounted.

UUID=B61B-19E7   /media/usb   vfat   noatime,rw,fmask=177 0 0

root/etc/fstab

seed_dir=/media/usb/persist/seedrng

root/etc/conf.d/seedrng

TAILSCALED_OPTS="-state /media/usb/persist/tailscaled.state"

root/etc/conf.d/tailscale

qemu testing

Here’s something beautiful about this setup: you can meaningfully test it in qemu by just pointing it at the kernel and initramfs. Even works emulated on my arm64 M2.

qemu-system-x86_64 -m 4G -kernel "images/$image/vmlinuz-lts" \
    -initrd "images/$image/initramfs-lts" -append "console=ttyS0" \
    -nographic -device qemu-xhci -device usb-storage,drive=usbstick \
    -drive if=none,id=usbstick,file=usb_disk.img,format=raw

This includes a persistence device that I formatted with the same UUID as the production one.[1] Since Tailscale configuration is in there, the qemu image comes up as a different Tailscale device, and I can SSH into it separately.

Bootloader

Installing or updating the bootloader is done from the system itself with extlinux.

rm -rf /media/usb/boot/syslinux
mkdir -p /media/usb/boot/syslinux

cp /usr/share/syslinux/*.c32 /media/usb/boot/syslinux/

extlinux --install /media/usb/boot/syslinux

cat > /media/usb/boot/syslinux/syslinux.cfg <<EOF
PROMPT 0
DEFAULT lts

LABEL lts
KERNEL /boot/vmlinuz-lts
INITRD /boot/intel-ucode.img,/boot/initramfs-lts

LABEL old
KERNEL /boot/vmlinuz-lts-old
INITRD /boot/intel-ucode.img-old,/boot/initramfs-lts-old

LABEL new
KERNEL /boot/vmlinuz-lts-new
INITRD /boot/intel-ucode.img-new,/boot/initramfs-lts-new
EOF
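The three entries in that config differ only by a file name suffix, so the same file could also be generated. This is a sketch of one way to do it, not how the repository does it; emit_label and emit_config are hypothetical helper names.

```shell
#!/bin/sh
# emit_label NAME SUFFIX
# Print one syslinux stanza. SUFFIX is "" for the regular entry, or -old / -new.
emit_label() {
    printf 'LABEL %s\n' "$1"
    printf 'KERNEL /boot/vmlinuz-lts%s\n' "$2"
    printf 'INITRD /boot/intel-ucode.img%s,/boot/initramfs-lts%s\n\n' "$2" "$2"
}

# emit_config
# Print the header followed by the three entries.
emit_config() {
    printf 'PROMPT 0\nDEFAULT lts\n\n'
    emit_label lts ""
    emit_label old -old
    emit_label new -new
}

# Usage: emit_config > /media/usb/boot/syslinux/syslinux.cfg
```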

We have three boot entries: regular, old, and new. When deploying a new version of the system, we rsync it over, and then use extlinux --once to select it for the next boot.

rsync -Pv "$image/vmlinuz-lts" root@frood:/media/usb/boot/vmlinuz-lts-new
rsync -Pv "$image/initramfs-lts" root@frood:/media/usb/boot/initramfs-lts-new
rsync -Pv "$image/intel-ucode.img" root@frood:/media/usb/boot/intel-ucode.img-new
echo "extlinux --once=new /media/usb/boot/syslinux" | ssh root@frood sh

If the machine comes up cleanly, then we move the regular image to old, and new to regular. Otherwise, another reboot rolls it back.
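The promotion step is not shown above, so here is a minimal sketch of what it could look like, assuming the file names from the rsync commands. The function name promote_images is hypothetical. It refuses to start unless all three -new files are present, to avoid ending up with a half-promoted set.

```shell
#!/bin/sh
# promote_images BOOT_DIR
# Rotate each image file: regular -> -old, then -new -> regular.
promote_images() {
    boot="$1"
    # First pass: make sure the whole new set exists before touching anything.
    for f in vmlinuz-lts initramfs-lts intel-ucode.img; do
        [ -f "$boot/${f}-new" ] || { echo "missing $boot/${f}-new" >&2; return 1; }
    done
    # Second pass: rotate.
    for f in vmlinuz-lts initramfs-lts intel-ucode.img; do
        if [ -f "$boot/$f" ]; then
            mv "$boot/$f" "$boot/${f}-old"
        fi
        mv "$boot/${f}-new" "$boot/$f"
    done
}

# Usage, on the machine itself: promote_images /media/usb/boot
```

After this runs, the "new" boot entry points at files that no longer exist until the next deploy, which is fine as long as it is only ever selected with --once right after an rsync.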

A simple status service

I wanted a simple service to get the status of the system at a glance. There are a million ways to do this, but I chose to write a small Go server. It’s not needed to make this system work, but I am including it to show how easy it is to add a service.

Before the alpine-make-rootfs invocation, I added a couple lines to build all Go binaries in a local module into /usr/local/bin/. Note that even the Go toolchain is selected declaratively from the go.mod thanks to GOTOOLCHAIN=auto.

go env -w GOTOOLCHAIN=auto
go build -C bins -o "$ROOTFS_DEST/usr/local/bin/" ./...

Then I created root/etc/init.d/srvmonitor.

#!/sbin/openrc-run
# shellcheck shell=sh

description="Serve scripts from /etc/monitor.d"
command=/usr/local/bin/srvmonitor
command_background=true
pidfile="/run/${RC_SVCNAME}.pid"

depend() {
    need net localmount
    after firewall
}

And finally I added one line to setup.sh.

rc-update add srvmonitor default

That’s it. The Go server listens on port 80 on the Tailscale IP, and serves the output of scripts I put in /etc/monitor.d/.
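For a sense of what such a script might look like, here is a sketch that drops one into the skeleton. Everything about it is an assumption: the name disk, its contents, and the idea that srvmonitor simply executes each file and serves its standard output. The helper add_monitor_script is hypothetical too.

```shell
#!/bin/sh
# add_monitor_script SKEL_DIR
# Writes a hypothetical status script named "disk" into the skeleton's
# etc/monitor.d directory and marks it executable.
add_monitor_script() {
    mkdir -p "$1/etc/monitor.d"
    cat > "$1/etc/monitor.d/disk" <<'EOF'
#!/bin/sh
# Report usage of the root filesystem (the in-memory root on this setup).
df -P /
EOF
    chmod +x "$1/etc/monitor.d/disk"
}

# Usage: add_monitor_script root
```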

frood

The entire setup is open source, in my mostly-harmless repository. You might be interested in how I made ZFS imports work, which is not covered above.

I have not made it into a reusable project partially because there is so little to it. Adding hooks to configure things would easily double its size. I encourage you to just fork it if you’d like.

One thing I haven’t solved yet is how to inject secrets. For now they are just .gitignore’d. Maybe I’ll plug in a YubiKey and use age-plugin-yubikey to decrypt them, and yubikey-agent for the host key. Or maybe this board has a TPM and I can use the simplicity of this system to get a full Secure Boot chain that unlocks TPM keys. That’d be fun.

If you got this far, you might also want to follow me on Bluesky at @filippo.abyssdomain.expert or on Mastodon at @[email protected].

The picture

The natural pools of Porto Moniz, in Madeira. They’re publicly accessible, made of volcanic rock, and filled by the ocean waves that crash spectacularly against them. I was not doing great that day, but it was an excellent place to not do great at.

Madeira is pretty cool.[2] Also one of the trickiest crosswind landings.

A natural pool with clear blue water, surrounded by dark volcanic rocks in the sunset light. The ocean is visible in the background, a few white clouds reflect on it. A rocky island has a lighthouse on the top.

My maintenance work is funded by the awesome Geomys clients: Interchain, Smallstep, Ava Labs, Teleport, SandboxAQ, Charm, Tailscale, and Sentry. Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the Geomys announcement.)

Here are a few words from some of them!

Teleport — For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. Teleport Identity is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews.

Ava Labs — We at Ava Labs, maintainer of AvalancheGo (the most widely used client for interacting with the Avalanche Network), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team.

SandboxAQ — SandboxAQ’s AQtive Guard is a unified cryptographic management software platform that helps protect sensitive data and ensures compliance with authorities and customers. It provides a full range of capabilities to achieve cryptographic agility, acting as an essential cryptography inventory and data aggregation platform that applies current and future standardization organizations mandates. AQtive Guard automatically analyzes and reports on your cryptographic security posture and policy management, enabling your team to deploy and enforce new protocols, including quantum-resistant cryptography, without re-writing code or modifying your IT infrastructure.

Charm — If you’re a terminal lover, join the club. Charm builds tools and libraries for the command line. Everything from styling terminal apps with Lip Gloss to making your shell scripts interactive with Gum. Charm builds libraries in Go to enhance CLI applications while building with these libraries to deliver CLI and TUI-based apps.


  1. mkfs.vfat -C -i B61B19E7 usb_disk.img $(( 128 * 1024 )) ↩︎

  2. I am not paid by the Madeira Dept. of Tourism, I swear. ↩︎


Source: https://words.filippo.io/dispatches/frood/