Ad-hoc router using OpenWrt in a VM

/images/openwrt_crop.png

During writing of the last post I actually bricked my home router after installing a custom image. At the same time I didn't manage to get TFTP recovery working [1] so I started searching for ways to restore my home network so I could worry about that later.

The choice fell on using an ARM board to run OpenWrt, which required some creative workarounds. This post documents them.

What I want to replace:

  • router-switch combo: 1x WAN, 5x LAN, no WiFi

  • OS: OpenWrt

What I have:

  • 64-bit ARM SBC, capable of hardware virtualization

  • USB 3.0 Ethernet adapter (if your SBC has 2x LAN natively that works too)

  • 6-port Ethernet switch

Network setup

eth0 will be the LAN, eth1 the WAN. The LAN bridge also gets an IP address for management (pick one outside your DHCP pool) so you can SSH to the host even if the OpenWrt VM is not in operation.

/etc/systemd/network/br0.netdev:

[NetDev]
Name=br0
Kind=bridge

/etc/systemd/network/br1.netdev:

[NetDev]
Name=br1
Kind=bridge

/etc/systemd/network/br0.network:

[Match]
Name=br0
[Bridge]
MulticastRouter=permanent
[Network]
Address=192.168.2.254/24
Gateway=192.168.2.1
DNS=192.168.2.1
IPv6AcceptRA=no
LinkLocalAddressing=no

/etc/systemd/network/br1.netdev:

[Match]
Name=br1
[Bridge]
MulticastRouter=permanent
[Network]
DHCP=no
IPv6AcceptRA=no
LinkLocalAddressing=no

/etc/systemd/network/eth0.network:

[Match]
Name=eth0
[Network]
Bridge=br0

/etc/systemd/network/eth1.network:

[Match]
Name=eth1
[Network]
Bridge=br1

Virtual Machine

I virtualized OpenWrt because it supports a few select boards (eg. Raspberry Pi's), but not the ones I have.

Installation:

cd /root
wget "https://downloads.openwrt.org/snapshots/targets/armvirt/64/openwrt-armvirt-64-"{rootfs-ext4.img.gz,Image}
gunzip openwrt-armvirt-64-rootfs-ext4.img.gz
echo 'allow all' >/etc/qemu/bridge.conf
systemctl enable guest.service

/root/run.sh:

#!/bin/bash
cd /root
exec qemu-system-aarch64 -enable-kvm -cpu host \
        -nographic -M virt,highmem=off -m 128 -smp 4 \
        -kernel openwrt-armvirt-64-Image -append "root=fe00" \
        -drive file=openwrt-armvirt-64-rootfs-ext4.img,format=raw,if=none,id=hd0 \
        -device virtio-blk-pci,drive=hd0 \
        -nic bridge,br=br0,model=virtio -nic bridge,br=br1,model=virtio

/etc/systemd/system/guest.service:

[Unit]
Description=QEMU guest
After=network.target
[Service]
ExecStart=/root/run.sh
Restart=always
RestartSec=10s
[Install]
WantedBy=multi-user.target

OpenWrt

If the default OpenWrt configuration has address conflicts with the rest of your network, you can adjust it ahead of time like so:

ip l add dev test type bridge
ip l add dev test type bridge
./run.sh
# log into openwrt
  uci set network.eth0.ipaddr=192.168.7.1
  uci commit
  poweroff

Now suppose you want to import the configuration of your old OpenWrt install, except: it has totally different interface names all over! Not a problem either as you can rename them by adding the following lines to /etc/rc.local:

ip l set eth0 name lan1
ip l set eth1 name wan

If you did everything right up until this point you can shut down the SBC, plug in all the cables, let it reboot and an OpenWrt router should appear in your network at http://192.168.1.1 just as if it was a real device.

Future thoughts

  • Setting net.core.default_qdisc = pfifo_fast on the host can save some CPU time, not sure whether this breaks QoS

  • If you have a WiFI USB adapter you could pull it into the VM and enjoy wireless too! See here for an example

  • When running this setup long term it would make sense to strip down the host OS

  • The QEMU serial console could be bound to a socket or PTS to interact with it even when guest.service is running

OpenWrt build notes

This post contains Q&A-style notes on compiling software for OpenWrt or compiling OpenWrt itself.

SDK or Source code?

If you want to cross-compile software to run on an OpenWrt device or rebuild/patch existing OpenWrt packages, the SDK suffices.

The SDK can be found under the <target>/<device> folders on downloads.openwrt.org, note that the snapshots are rebuilt frequently and not a stable target to build for. The source is at https://github.com/openwrt/openwrt, make sure to check out a branch/tag if you don't want the development branch.

Ok, now how do I set it up?

Build tools (tested on Ubuntu 22.04):

sudo apt update
sudo apt install build-essential gawk gcc-multilib flex git gettext libncurses5-dev libssl-dev python3-distutils unzip zlib1g-dev

Feeds:

cd openwrt
./scripts/feeds update -a && ./scripts/feeds install -a

Don't forget to do this as an unprivileged user (even if you have a container), build tools don't like being run as a root.

[Source] Target and package selection

Run make menuconfig. The first three option select the device to target. Most of the rest are for packages, < > means a package will not be built, <*> will build it into the final firmware image (preinstalled), <M> will build it as an installable package file.

Pressing / will open up search.

Configuring is annoying so you can keep the .config file for next time. Copy it back and run make oldconfig or yes "" | make oldconfig if you don't like answering questions.

[Source] Building everything

Run nice make -j$(nproc).

The firmware and sysupgrade images will end up in bin/targets/, packages also under bin/packages/.

How do I build a single package?

make packages/NAMEHERE/compile, that's it.

[SDK] How do I build software that's not in OpenWrt?

Haven't had to do that yet, the answer is probably here.

[SDK] I don't care about packaging, where's the cross-compiler?
./staging_dir/toolchain-<whatever>/bin/<arch>-openwrt-linux-gcc
(This likely won't lead you to success unless all you need is a simple C program.)
How do I apply patches to packages?

There's probably multiple ways but the most convenient is overriding the source tree.

You should have a git checkout with your modifications (commit them!) somewhere. Then:

ln -s /somewhere/.git package/network/services/odhcpd/git-src
make package/odhcpd/{clean,compile} V=s CONFIG_SRC_TREE_OVERRIDE=y
[Source] Which kernel version will I get?

Check for KERNEL_PATCHVER in target/linux/<target>/Makefile.

If a KERNEL_TESTING_PATCHVER is defined too you can switch to the newer kernel by enabling "Global build settings > Use the testing kernel version".

Note that OpenWrt kernels are heavily patched so you can't really use a version other than the predefined one even if you wanted to.

[Source] Ricing your build flags

If you've ever used Gentoo you will find this fun. [1]

Under "Advanced configuration options" you can configure global compiler flags (TARGET_OPTIMIZATION) and the ones used by the kernel (KERNEL_CFLAGS).

Tell me more!

Headless Raspberry Pi OS virtualization, 64-bit edition

With Raspbian (now named Raspberry Pi OS) having been released as 64-bit I can finally write a proper sequel to the previous post that dealt with virtualizing ARM/Linux distributions headlessly using QEMU.

You can read the original article here: Virtualizing Raspbian (or any ARM/Linux distro) headless using QEMU. Since the process is the same I will skip detailed explanations here.

Native emulation in QEMU

QEMU includes a raspi3b machine type and emulates UART, SD, USB controllers and more. This is enough for working headless usage.

With the root filesystem prepared and the appropriate files extracted from the boot partition the command line woud look as follows:

qemu-system-aarch64 -M raspi3b -kernel kernel8.img -dtb bcm2710-rpi-3-b.dtb \
        -drive file=rootfs.qcow2,if=sd -usb -device usb-net,netdev=u1 -netdev user,id=u1 \
        -append "root=/dev/mmcblk0 rw console=ttyAMA0" -nographic

The system boots up fine (with a few errors here and there) and is usable but I don't suggest using it like this.

A better alternative

The virt machine type is much better suited for this, our plan is to attach both the disk and network via Virtio.

For the kernel (and modules) we'll grab the linux-aarch64 package from Arch Linux ARM.

Extracting the root filesystem into a virtual disk image

Download Raspberry Pi OS (64-bit) from the official website, then run the script below or follow the steps manually.

The only difference from before is that we have to unlock the "root" user so we can actually log in later.

#!/bin/bash -e
input=2022-04-04-raspios-bullseye-arm64-lite.img
[ -f $input ]

mkdir mnt
cp --reflink=auto $input source.img
truncate -s 10G source.img
echo ", +" | sfdisk -N 2 source.img
dev=$(sudo losetup -fP --show source.img)
[ -n "$dev" ]
sudo resize2fs ${dev}p2
sudo mount ${dev}p2 ./mnt -o rw
sudo sed '/^PARTUUID/d' -i ./mnt/etc/fstab
sudo sed '/^root:/ s|\*||' -i ./mnt/etc/shadow
remove_services=rpi-eeprom-update,hciuart,dphys-swapfile,rng-tools-debian
sudo bash -c "rm -f \
        ./mnt/etc/systemd/system/multi-user.target.wants/{$remove_services}.service \
        ./mnt/etc/rc?.d/?01{$remove_services}"
sudo umount ./mnt
sudo chmod a+r ${dev}p2
qemu-img convert -O qcow2 ${dev}p2 rootfs.qcow2
sudo losetup -d $dev
rm source.img; rmdir mnt

The kernel and initramfs

Kernel

Extract the kernel like this:

tar -xvf linux-aarch64*.pkg.tar.* --strip-components=1 boot/Image.gz

Building an initramfs

The differences to the previous iteration of the script are:

  • Recompressing of zstd kernel modules as gzip (no busybox support for it)

  • Busybox isn't downloaded. You need to compile it for 64-bit ARM yourself and insert the path [1]

#!/bin/bash -e
pkg=$(echo linux-aarch64-*.pkg.tar.*)
[ -f "$pkg" ]

mkdir initrd; pushd initrd
mkdir bin dev mnt proc sys
tar -xaf "../$pkg" --strip-components=1 usr/lib/modules
rm -rf lib/modules/*/kernel/{sound,drivers/{net/{wireless,ethernet},media,gpu,iio,staging,scsi}}
find lib/modules -name '*.zst' -exec zstd -d --rm {} ';'
find lib/modules -name '*.ko' -exec gzip -9 {} ';'
install -p /FILL/ME/IN/busybox-aarch64 bin/busybox
cat >init <<"SCRIPT"
#!/bin/busybox sh
busybox mount -t proc none /proc
busybox mount -t sysfs none /sys
busybox mount -t devtmpfs none /dev

for mod in virtio-pci virtio-blk virtio-net; do
        busybox modprobe $mod
done

busybox mount -o rw /dev/vda /mnt || exit 1

busybox umount /proc
busybox umount /sys
busybox umount /dev

exec busybox switch_root /mnt /sbin/init
SCRIPT
chmod +x bin/busybox init
bsdtar --format newc --uid 0 --gid 0 -cf - -- * | gzip -9 >../initrd.gz
popd; rm -r initrd

Booting the virtual machine

With the initramfs built, we have all parts needed to actually run the VM:

qemu-system-aarch64 -M virt -cpu cortex-a53 -m 2048 -smp 4 -kernel Image.gz -initrd initrd.gz \
        -drive file=rootfs.qcow2,if=virtio -nic user,model=virtio \
        -append "console=ttyAMA0" -nographic
After a bit of booting you should be greeted by Debian GNU/Linux 11 raspberrypi ttyAMA0 and a login prompt.
You can log in as "root" without a password.

Installing UniFi Network on Raspberry Pi OS 11 (bullseye)

Installing an Unifi controller on a Raspberry Pi seems like a straightforward task until you notice the section with system requirements.

The software requires a MongoDB version before 4.0. The last version that satisfies this is 3.7.9 which is almost four years old at the time of writing. You may find old versions packaged on the MongoDB website or in other repositories but certainly not for ARM. The second problem is that MongoDB dropped 32-bit support in version 3.4 so the latest we can actually use is 3.2.22 (also 4 years old).

In the end I was unable to find a build of MongoDB 3.2 that could run on a Pi which leaves only the option of compiling from source. This is what I ended up doing, it required lots of trial and error [1] before it succeeded. To (hopefully) save someone else time I put up the final Debian package for download.

Instructions

Requirements:

  1. Add the Unifi repository

apt install ca-certificates apt-transport-https
echo 'deb https://www.ui.com/downloads/unifi/debian stable ubiquiti' >/etc/apt/sources.list.d/100-ubnt-unifi.list
apt-key adv --keyserver keyserver.ubuntu.com --recv 06E85760C0A52C50
apt update
  1. Install required packages. Unifi needs Java 8 so other versions are temporarily put on hold.

apt-mark hold openjdk-9-* openjdk-11-*
apt install ./mongodb-server_3.2.22_armhf.deb unifi
apt-mark unhold openjdk-9-* openjdk-11-*
  1. Create a mongod wrapper script

printf '%s\n' '#!/bin/bash' 'exec /usr/bin/mongod --journal "$@"' >/usr/lib/unifi/bin/mongod
chmod +x /usr/lib/unifi/bin/mongod
  1. Enable and start the Unifi service: systemctl enable --now unifi

  2. Wait a while, it can take about 5 minutes until the controller is reachable at https://IP:8443/.

Not actually using a Raspberry Pi?

On a normal amd64 Debian system this whole ordeal gets much simpler:

  • Unifi installation works as described as above

  • mongodb-org-server_3.6.23_amd64.deb can be downloaded from the MongoDB website and runs out-of-the-box

  • OpenJDK 8 can be pulled from AdoptOpenJDK's repo since it's not available on recent Debian versions anymore


Overview of VPN providers that support IPv6

Finding VPN providers that provide real IPv6 support (as opposed to "supporting" it by blackholing traffic) is pretty hard so I decided to write down the ones I know about. The amount of information present depends on the provider's documentation and whether I have tested them myself.
This list does not claim to be complete.

Provider Access method IPv6 egress? Connect via IPv6? IP inside tunnel?1 Port forwarding possible? Last updated
AirVPN OpenVPN / Wireguard Yes
(NAT)
Yes Private Yes Mar 2022 (T)
AzireVPN OpenVPN Yes
(NAT)
No Public No Nov 2021 (T)
OpenVPN ("Public IP" option) Yes No Public Not necessary (Public IP)
Wireguard Yes
(NAT)
No2 Public No
Hide.me OpenVPN / Wireguard Yes
(?)
? ? ? Nov 2021
IVPN Wireguard Yes
(NAT)
No Private No Nov 2021 (T)
Mullvad OpenVPN Yes
(NAT)
No Private Yes Mar 2022 (T)
Wireguard Yes
(NAT)
Yes Private Yes
Njalla OpenVPN Yes Yes Public Not necessary (Public IP) Apr 2022 (T)
OpenVPN (NAT option) Yes
(NAT)
Yes Private No
Wireguard Yes
(NAT)
Yes Private No
OVPN.com OpenVPN / Wireguard Yes
(?)
? Public ? Nov 2021
OVPN.to OpenVPN Yes
(NAT)
Yes Private No Oct 2021
Perfect Privacy OpenVPN Yes
(?)
? ? ? Nov 2021

1: This is mainly important for address selection according to RFC3484. If this column says "Private" your OS will prefer IPv4 when connected to the VPN despite presence of an IPv6 address.
2: Supposedly possible via {location}.ipv6.azirevpn.net hostnames, but this did not work. I've been told this is being worked on.

Setting up Smokeping in a systemd-nspawn container

Smokeping is a nifty tool that continuously performs network measurements (such as ICMP ping tests) and graphs the results in a web interface. It can help you assess performance and detect issues in not only your own but also upstream networks.

/images/smokeping_last_864000.png

This is not how your graphs should look.

This article details setup steps for running Smokeping in a systemd-nspawn container with some additional requirements:

  • IPv6 probes must work

  • The container will directly use the host network so that no routing, NAT or address assignment needs to be set up.

  • To reduce disk and runtime footprint the container will run Alpine Linux

Container setup

First we need to set up an Alpine Linux root filesystem in a folder.
Usage is simple: ./alpine-container.sh /var/lib/machines/smokeping

Next we'll boot into the container to configure everything: systemd-nspawn -b -M smokeping -U

If not already done edit /etc/apk/repositories to add the community repo.
Additionally, you have to touch /etc/network/interfaces so that the network initscript can start up later (even though there is nothing to configure).

Install all required packages: apk add smokeping fping lighttpd ttf-dejavu

Make sure that fping works by running e.g. fping ::1.

lighttpd

Next is the lighttpd configuration inside /etc/lighttpd.

Get rid of all the examples: mv lighttpd.conf lighttpd.conf.bak && grep -v '^#' lighttpd.conf.bak | uniq >lighttpd.conf

There are multiple changes to be done in lighttpd.conf:

  • change server.groupname = "smokeping", the CGI process will need access to smokeping's files.

  • add server.port = 8081 and server.use-ipv6 = "enable"

  • configure mod_fastcgi for Smokeping by appending the following:

server.modules += ("mod_fastcgi")
fastcgi.server = (
        ".cgi" => ((
                "bin-path" => "/usr/share/webapps/smokeping/smokeping.cgi",
                "socket" => "/tmp/smokeping-fastcgi.socket",
                "max-procs" => 1,
        )),
)

We also need to link smokeping's files into the webroot: ln -s /usr/share/webapps/smokeping/ /var/www/localhost/htdocs/smokeping

smokeping

Next is the smokeping configuration located at /etc/smokeping/config.

The most important change here is to set cgiurl to the URL smokeping will be externally reachable at, like so:
cgiurl = http://your_server_here:8081/smokeping/smokeping.cgi

Smokeping's configuration [2] isn't super obvious if you haven't done it before so I'll provide an example here (this replaces the Probes and Targets sections):

*** Probes ***

+ FPing
binary = /usr/sbin/fping

+ FPing6
binary = /usr/sbin/fping

*** Targets ***
probe = FPing

menu = Top
title = Network Latency Grapher
remark =

+ targets
menu = IPv4 targets

++ google
menu = Google
title = Example Target: Google (IPv4)
host = 8.8.4.4

+ targets6
menu = IPv6 targets
probe = FPing6

++ google
menu = Google
title = Example Target: Google (IPv6)
host = 2001:4860:4860::8844

Lastly, grant the CGI process write access to the image folder: chmod g+w /var/lib/smokeping/.simg

Final container setup

Set services to run on boot: rc-update add smokeping && rc-update add lighttpd
Then shut down the container using poweroff.

We need to tell systemd-nspawn not to create a virtual network when the container is started as a service.
Do this by creating /etc/systemd/nspawn/smokeping.nspawn:
[Exec]
KillSignal=SIGTERM

[Network]
VirtualEthernet=no

Finally start up the container: systemctl start systemd-nspawn@smokeping
If this does not work due to private users you are running on old systemd [3] and can try again with PrivateUsers=no in the Exec section.

You can now visit http://your_server_here:8081/smokeping/smokeping.cgi and should see a mostly empty page with a sidebar containing "Charts", "IPv4 targets" and "IPv6 targets" on the left.


Fully unprivileged VMs with User Mode Linux (UML) and SLIRP User Networking

A few months ago I wanted to test something that involved OpenVPN on an old, small VPS I rented.

The VPS runs on OpenVZ, which shares a kernel with the host and comes with one important constraint:
TUN/TAP support needs to be manually enabled, which on this VPS it was not.

Maybe run a VM instead? Nope, KVM is not available.

At this point it would've been easier to give up or temporarily rent another VPS, but I really wanted to run the test on this particular one.

Enter: User Mode Linux

User Mode Linux (UML) is a way to run the Linux kernel as an user-mode process on another Linux system, no root or special setup required.

At its time it was considered to be useful for virtualization, sandboxing and more. These days it's well past its prime but it still exists in the Linux source and more importantly works.

You'd build a kernel binary like this:

git clone --depth=100 https://github.com/torvalds/linux
cd linux
make ARCH=um defconfig
nice make ARCH=um -j4
strip vmlinux

The virtual machine will require a root filesystem image, you can obtain one via the usual ways such as debootstrap (Debian/Ubuntu) or pacstrap (Arch) which I won't cover here.

Networking

Now onto the next issue: How is networking supported in User Mode Linux?

UML provides a number of options for network connectivity [1]: attaching to TUN/TAP, purely virtual networks and SLIRP
TUN/TAP is out of question, a virtual network doesn't help us so that leaves only SLIRP.

SLIP is a very old protocol [2] designed to carry IP packets over a serial line. SLIRP describes the use of this protocol to share the host's Internet connection over serial.
The SLIRP application exposes a virtual network to the client and performs NAT internally.

The standard slirp implementation is found on sourceforge: https://sourceforge.net/projects/slirp/
Its last release was in 2006 and the tarball even includes a file named security.patch that swaps out a few sprintf for snprintf and adds /* TODO: length check */ in other places.
At this point it was obvious that this wasn't going to work.

Rewriting slirp

The only logical thing to do now is to rewrite slirp so that it works.

Although slirp itself is dead the concept lives on as libslirp, which is notably used by QEMU [3] and VirtualBox.
libslirp's API is still a bit too low-level so I chose to use libvdeslirp.
SLIP is a simple protocol and not too complicated to implement, the rest is just passing packets around with Ethernet (un-)wrapping and a tiny bit of ARP.

Here's the code: https://gist.github.com/sfan5/696ad2f03f05a3e13952c44f7b767e81

Usage

You'll need:

  • vmlinux: the User Mode Linux kernel

  • root_fs: a root filesystem image

  • /path/to/slirp: the compiled slirp binary (build it using the Makefile that comes with it)

At this point your virtualized Linux system is just one command away:

./vmlinux mem=256M rw eth0=slirp,,/path/to/slirp
Once logged in you need to manually configure the network like this:
ip a add dev eth0 10.0.2.1 && ip l set eth0 up && ip r add default dev eth0
echo nameserver 10.0.2.3 >/etc/resolv.conf
While you enter these you should see --- slirp ready --- on the console you ran vmlinux on.

You can forward port(s) from outside into the VM by editing the commented out code in slirp.c.


Dealing with glibc faccessat2 breakage under systemd-nspawn

Backstory

A few months ago I stumbled upon this report on Red Hat's bugzilla.

The gist of it is that glibc began to make use of the new faccessat2 syscall, which when running under older systemd-nspawn is filtered to return EPERM. This misdirects glibc into assuming a file or folder cannot be accessed, when in reality nspawn just doesn't know the syscall.

A fix was submitted to systemd 1 but it turned out this didn't only affect nspawn, but also needed to be fixed in various container runtimes and related software 2 3 4 5. Hacking around it in glibc 6 or the kernel 7 was proposed, with both (rightfully) rejected immediately.

I pondered what an awful bug that was and was glad I didn't have to deal with this mess.


Fast forward to last week, I upgraded an Arch Linux installation I had running in a container. Immediately after the update pacman refused to work entirely, complaining it "could not find or read" /var/lib/pacman when this directory clearly existed (I checked).

A few minutes later (and after noticing the upgrade to glibc 2.33) it hit me that this was the exact bug I read about months ago. And, worse, that I'd have to deal with a lot more since I have multiple servers that run containers on systemd-nspawn.

Binary patching systemd-nspawn to fix the seccomp filter

If you hit this bug with one of your containers you have exactly one option: upgrade systemd on the host system to v247 or later.

Aside from the fact that upgrading something as central as systemd isn't exactly risk free, I couldn't do that even if I wanted. There is no backported systemd for Ubuntu 18.04 LTS.

This calls for another option: Patching systemd yourself to fix the bug.

Without further ado, here's a Python script doing exactly that. I've tested that it performs the correct patch on Debian 10, Ubuntu 18.04 and Ubuntu 20.04. There are also plenty safeguards that it shouldn't break anything no matter what (no warranty though).

#!/usr/bin/env python3
import subprocess, re, os
path = "/usr/bin/systemd-nspawn"
print("Looking at %s" % path)
proc = subprocess.Popen(["objdump", "-w", "-d", path], stdout=subprocess.PIPE, encoding="ascii")
instr = list(m.groups() for m in (re.match(r'\s*([0-9a-f]+):\s*([0-9a-f ]+)\s{4,}(.+)', line) for line in proc.stdout) if m)
if proc.wait() != 0: raise RuntimeError("objdump returned error")
p_off, p_old, p_new = None, None, None
for i, (addr, b, asm) in enumerate(instr):
        if asm.startswith("call") and "<seccomp_init_for_arch@" in asm:
                print("Found function call at 0x%s:\n  %s%s" % (addr, b, asm))
                for addr, b, asm in instr[i-1:i-12:-1]:
                        m = re.match(r'mov\s+\$0x([0-9a-f]+)\s*,\s*%edx', asm)
                        if m:
                                print("Found argument at 0x%s:\n  %s%s" % (addr, b, asm))
                                m = int(m.group(1), 16)
                                if m == 0x50026:
                                        print("...but it's already patched, nothing to do.")
                                        exit(0)
                                if m != 0x50001: raise RuntimeError("unexpected value")
                                p_off, p_old = int(addr, 16), bytes.fromhex(b)
                                if len(p_old) != 5: raise RuntimeError("unexpected instr length")
                                p_new = b"\xba\x26\x00\x05\x00"
                                break
                        if re.search(r'%[re]?dx|^(call|pop|j[a-z])', asm): break # likely went too far
                break
if not p_off: raise RuntimeError("no patch location found")
print("Patching %d bytes at %d from <%s> to <%s>" % (len(p_old), p_off, p_old.hex(), p_new.hex()))
with open(path, "r+b") as f:
        if os.pread(f.fileno(), len(p_old), p_off) != p_old: raise RuntimeError("contents don't match")
        os.pwrite(f.fileno(), p_new, p_off)
print("OK.")

Running the above script (as root) will attempt to locate certain related instructions in /usr/bin/systemd-nspawn, attempt to patch one of them and hopefully end with an output of "OK.".

What does the binary patch change? Essentially it makes the following change to the compiled code of nspawn-seccomp.c:

 log_debug("Applying allow list on architecture: %s", seccomp_arch_to_string(arch));

-r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ERRNO(EPERM));
+r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ERRNO(ENOSYS));
 if (r < 0)
         return log_error_errno(r, "Failed to allocate seccomp object: %m");

Instead of EPERM, both blocked and unknown syscalls now return ENOSYS back to the libc. This isn't ideal either (error handling code might get a bit confused) but it is more correct and allows glibc to not catastrophically fail upon attempting to use faccessat2,

Unfortunately the same change cannot 8 be applied to processes in already running containers, you have to restart them.

Running Windows 10 for ARM64 in a QEMU virtual machine

/images/2020-08-04_scrot.png

Since the development stages of Windows 10, Microsoft has been releasing a version of Windows that runs on 64-bit ARM (AArch64) based CPUs. Despite some hardware shipping with Windows 10 ARM [1] [2] [3] this port has received little attention and you can barely find programs that run on it.

Naturally, I wanted to try this out to see if it worked. And it turned out it does!

Getting the ISO

I'm not aware of any official page that lets you download an ARM64 ISO, so this part relies on community-made solutions instead:

In the MDL forums I looked for the right ESD download link and used an ESD>ISO conversion script (also found there) to get a bootable ISO.

Alternatively adguard's download page provides similar scripts that download and pack an ISO for you, though they're pretty slow in my experience.

There's one more important point:

I had no success booting version 2004 or 20H2 (specifically: 19041.388 / 19041.423) so I went with version 1909 (18363.592) instead.

Starting with 18363.1139 Windows seems to require the virtualization extension [8] to be enabled. I had initially used an older version before finding this out, the command line below has now been corrected accordingly. (Updated 2020-12-27)

Installation

Before we begin we also need:

  • the Virtio driver ISO

  • an appropriately sized disk image (qemu-img create -f qcow2 disk.qcow2 40G)

  • QEMU_EFI.fd extracted from the edk2.git-aarch64 RPM found here

The qemu command line then looks as follows:

isoname=19042.631.201119-2058.20H2_RELEASE_SVC_REFRESH_CLIENTENTERPRISE_VOL_A64FRE_DE-DE.ISO
virtio=~/Downloads/virtio-win-0.1.185.iso
qemu-system-aarch64 -M virt,virtualization=true -cpu cortex-a53 -smp 4 -m 4096 \
        -device qemu-xhci -device usb-kbd -device usb-tablet \
        -drive file=disk.qcow2,if=virtio \
        -nic user,model=virtio \
        -drive file="$isoname",media=cdrom,if=none,id=cdrom -device usb-storage,drive=cdrom \
        -drive file="$virtio",media=cdrom,if=none,id=drivers -device usb-storage,drive=drivers \
        -bios QEMU_EFI.fd -device ramfb

You can then follow the installation process as normal. Before partitioning the disks the setup will ask you to load disk drivers, these can be found at viostor/w10/ARM64/ on the virtio cdrom.

QEMU video output

The above command line already takes these limitations into account, these sections are for explanation only.

A previous blogpost on running Windows 10 ARM in QEMU has used a patched EDK2 to get support for standard VGA back in. It's not clear to me why EDK2 removed support if it was working, but this is not a solution I wanted to use either way.

It turns out [4] that the options on ARM are limited to virtio gpu and ramfb. Virtio graphics are Linux-only so that leaves ramfb.

Attaching disks with Qemu

Since the virt machine has no SATA controller we cannot attach a hard disk to the VM the usual way, I went with virtio here instead. It would have been possible to do this over usb-storage, this works out of the box and would have saved us all the work with virtio drivers (except for networking [5]).

This also means something else (which has wasted me quite some time): You cannot use -cdrom.

If you do, EDK2 will boot the Windows CD fine but setup will ask you to load drivers early (because it cannot find its own CD). None of the virtio drivers can fix this situation, leaving you stuck with no clear indication what went wrong.

After installation

The onboarding process has a few hiccups (in particular device detection), if you retry it a few times it'll let you continue anyway.

High CPU Usage

After the first boot I noticed two regsvr32.exe processes at 100% CPU that didn't seem to finish in reasonable time.

Further investigation with Process Explorer [6] showed these belonging to Windows' printing service. Since I don't want to print in this VM anyway, the affected service can just be stopped and disabled:

sc stop "Spooler"
sc config "Spooler" start= disabled

Networking

We're still missing the network driver from the virtio cdrom. Unfortunately the NetKVM driver doesn't seem to be properly signed, so you have to enable loading unsigned drivers first (and reboot!):

bcdedit /set testsigning on

Afterwards the right driver can be installed from the device manager (NetKVM/w10/ARM64 on cdrom).

General Performance Tweaks

These aren't specific to Windows 10 ARM or Virtual Machines, but can be pretty useful to stop your VM from acting sluggish.

REM Disable Windows Search Indexing
sc stop "WSearch"
sc config "WSearch" start= disabled
REM Disable Automatic Defragmentation
schtasks /Delete /TN "\Microsoft\Windows\Defrag\ScheduledDefrag" /F
REM Disable Pagefile
wmic computersystem set AutomaticManagedPagefile=FALSE
wmic pagefileset delete
REM Disable Hibernation
powercfg -h off

Higher Display Resolution

As of writing QEMU's ramfb has its resolution locked to 800x600, which even breaks EDK2's menu (press F2 or Esc during boot).

Fortunately, this has already been fixed [7] in qemu versions 5.1.0 and newer.

In addition to that you need vars-template-pflash.raw from the same edk package as earlier (UEFI will store its settings in there).
Add the following to your qemu args: -drive file=vars-template-pflash.raw,if=pflash,index=1

The display resolution can then be increased up to 1024x768 under Device Manager > OVMF Platform Configuration.

Wrapping up

With a bit of preparation it is possible to run Windows 10 ARM in a virtual machine. Although the emulation is somewhat slow you could feasibly use this to test one or two programs.

If you have ARM64 hardware with sufficient specs and KVM support, it should be possible to use -enable-kvm -cpu host for native execution speed, though I haven't had a chance to see how this performs yet.


Installing the Debian X32 port on a VM or real machine

X32 is an ABI for Linux that uses the x86-64 instruction set but 32-bit longs and pointers (this is called ILP32), thereby limiting the memory for a single process to 4 GiB. Compared to amd64 it offers significant memory savings and unlike plain i386 it can make use of all registers and extensions also available to 64-bit code.

Debian has an X32 port since 2013 but installing it isn't quite straightforward.
To follow this guide you'll need:
  • A Debian netinst CD for the amd64 architecture: https://www.debian.org/CD/netinst/

  • A computer or VM with x86-64 compatible CPU

  • An internet connection on the machine you are installing

Booting into rescue mode

In the boot menu, choose "Advanced Options" and select the "Rescue Mode" option.

Press TAB to edit the kernel command line and append the following option before booting: syscall.x32=y

Follow the menus until you're dropped to a shell ("Execute shell in installer environment").

Partitioning

Inside the shell run partman.

This starts the usual partitioning setup seen during Debian installation. Once it's done you will be dropped to the shell again.

Installing the system

Debootstrap

Debootstrap will install the system for us, but unfortunately there is no convenient way to get the debian-ports GPG keys into the rescue environment, so we'll just run with disabled signature checks.

debootstrap --no-check-gpg --arch=x32 unstable /target http://ftp.ports.debian.org/debian-ports/

You might notice that isc-dhcp-client fails to install at this step, this is not critical and will be dealt with later.

Chrooting

The next step is to enter the target system in a chroot:

mount --bind /dev /target/dev
mount --bind /proc /target/proc
mount --bind /sys /target/sys
chroot /target

First we'll do some basic system configuration.

The fstab you write needs to match the configured partition layout, in this example the layout is a single root partition (no /boot, no swap space). Using UUID=... notation here is also advisable but not fun to type out by hand (and irrelevant for a single-disk VM).

Here you should also uninstall the package that failed to install earlier.

passwd root
echo "/dev/sda1 / ext4 rw 0 1" >/etc/fstab

Repository configuration

The X32 repository does not include a Linux kernel package. This requires us to add the amd64 repository so we can install a kernel, but we need to make sure only linux-image but nothing else is pulled from there (via APT pinning).

Configure APT as follows:

apt install -y debian-ports-archive-keyring
cat >/etc/apt/sources.list
deb [arch=x32] http://ftp.ports.debian.org/debian-ports unstable main
deb [arch=amd64] http://deb.debian.org/debian unstable main
^D
cat >/etc/apt/preferences.d/amd64
Package: linux-image*:amd64
Pin: release b=amd64
Pin-Priority: 500

Package: *
Pin: release b=amd64
Pin-Priority: -1
^D
dpkg --add-architecture amd64
apt update

Kernel & Bootloader

Install the kernel and bootloader (GRUB):
export TERM=dumb
apt install -y linux-image-amd64 grub-pc

When asked by GRUB, select your primary hard drive (here: /dev/sda) as installation device.

Next, reconfigure grub to boot up the kernel with X32 enabled:
dpkg-reconfigure grub-pc

GRUB will ask for the Linux command line, which needs to be syscall.x32=y.

Cleaning up

The installation failure during debootstrap has left some files misconfigured.

The purpose of policy-rc.d and the modification to start-stop-daemon is that services are not started up during package installation. Since we're finished installing, these need to be undone:

rm /usr/sbin/policy-rc.d

mv /usr/sbin/start-stop-daemon.REAL /usr/sbin/start-stop-daemon

Lastly exit the chroot, umount the partitions and reboot:

umount /target/dev /target/proc /target/sys
umount /target

sync; reboot

Extras

Why does isc-dhcp-client fail to install?

isc-dhcp-client transitively depends on libmaxminddb, which requires pandoc during build.

pandoc itself is not available on x32 since one of its dependencies is also missing.

This is tracked as Debian bug #956041 and has been solved in February 2021.

Quick DHCP setup using systemd-networkd

Since isc-dhcp-client would provide the DHCP client we have a problem when we want to connect our freshly installed system to a network.

Fortunately, setting up DHCP with systemd-networkd is quite easy:

cat >/etc/systemd/network/eth.network
[Match]
Name=en*
[Network]
DHCP=ipv4
^D
systemctl enable --now systemd-networkd

Installing standard system software

tasksel can take care of installing software you'd usually find on a standard Debian installation (man pages, locale support, Perl, ...).

tasksel install standard

Same issue as before: isc-dhcp-client and bind9-related packages are uninstallable, so that requires some manual working around:

eval $(tasksel -t install standard | tr ' ' '\n' | egrep -v 'isc-dhcp-client|^bind9-' | tr '\n' ' ')