Overview of VPN providers that support IPv6

Finding VPN providers that offer real IPv6 support (as opposed to "supporting" it by blackholing traffic) is pretty hard, so I decided to write down the ones I know about. The amount of information present depends on the provider's documentation and whether I have tested them myself.
This list does not claim to be complete.

Provider           Access method                   IPv6 egress?   Connect via IPv6?   IP inside tunnel? 1   Port forwarding possible?   Last updated
AirVPN             OpenVPN / Wireguard             Yes (NAT)      Yes                 Private               Yes                         Nov 2021 (T)
AzireVPN           OpenVPN                         Yes (NAT)      No                  Public                No                          Nov 2021 (T)
AzireVPN           OpenVPN ("Public IP" option)    Yes            No                  Public                Not necessary (Public IP)   Nov 2021 (T)
AzireVPN           Wireguard                       Yes (NAT)      No 2                Public                No                          Nov 2021 (T)
Hide.me            OpenVPN / Wireguard             Yes (?)        ?                   ?                     ?                           Nov 2021
IVPN               Wireguard                       Yes (NAT)      No                  Private               No                          Nov 2021 (T)
Mullvad            OpenVPN                         Yes (NAT)      ?                   ?                     Yes                         Nov 2021
Mullvad            Wireguard                       Yes (NAT)      Yes                 ?                     Yes                         Nov 2021
OVPN.com           OpenVPN / Wireguard             Yes (?)        ?                   Public                ?                           Nov 2021
OVPN.to            OpenVPN                         Yes (NAT)      Yes                 Private               No                          Oct 2021
Perfect Privacy    OpenVPN                         Yes (?)        ?                   ?                     ?                           Nov 2021

1: This is mainly important for address selection according to RFC3484. If this column says "Private", your OS will prefer IPv4 when connected to the VPN despite the presence of an IPv6 address.
2: Supposedly possible via {location}.ipv6.azirevpn.net hostnames, but this did not work. I've been told this is being worked on.

Setting up Smokeping in a systemd-nspawn container

Smokeping is a nifty tool that continuously performs network measurements (such as ICMP ping tests) and graphs the results in a web interface. It can help you assess performance and detect issues not only in your own network but also in upstream networks.

[Image: /images/smokeping_last_864000.png]

This is not how your graphs should look.

This article details setup steps for running Smokeping in a systemd-nspawn container with some additional requirements:

  • IPv6 probes must work

  • The container will directly use the host network so that no routing, NAT or address assignment needs to be set up.

  • To reduce disk and runtime footprint the container will run Alpine Linux

Container setup

First we need to set up an Alpine Linux root filesystem in a folder; a small helper script (alpine-container.sh) takes care of this.
Usage is simple: ./alpine-container.sh /var/lib/machines/smokeping
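
A rough sketch of what such a bootstrap can look like (the Alpine version, mirror URL and the resolv.conf copy are my assumptions, not necessarily what the original script does):

#!/bin/sh -e
# unpack the Alpine minirootfs tarball into the target folder
dest="$1"
ver=3.15.0   # assumption: pick whatever release is current
mkdir -p "$dest"
wget -qO- "https://dl-cdn.alpinelinux.org/alpine/v${ver%.*}/releases/x86_64/alpine-minirootfs-${ver}-x86_64.tar.gz" \
        | tar -xzC "$dest"
# copy DNS configuration so apk can resolve hosts inside the container
cp /etc/resolv.conf "$dest/etc/resolv.conf"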

Next we'll boot into the container to configure everything: systemd-nspawn -b -M smokeping -U

If not already done edit /etc/apk/repositories to add the community repo.
Additionally, you have to touch /etc/network/interfaces so that the network initscript can start up later (even though there is nothing to configure).

Install all required packages: apk add smokeping fping lighttpd ttf-dejavu

Make sure that fping works by running e.g. fping ::1.

Tip

If this does not work you need to configure the host to allow unprivileged pings.

This is done by setting the following sysctl: net.ipv4.ping_group_range=0 2147483647 (usually by editing /etc/sysctl.conf)
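
For example, to apply it immediately and also persist it across reboots:

# on the host
sysctl -w net.ipv4.ping_group_range="0 2147483647"
echo 'net.ipv4.ping_group_range = 0 2147483647' >>/etc/sysctl.conf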

Also note that Alpine Linux must be 3.13 or newer for this to work 1.

lighttpd

Next is the lighttpd configuration inside /etc/lighttpd.

Get rid of all the examples: mv lighttpd.conf lighttpd.conf.bak && grep -v '^#' lighttpd.conf.bak | uniq >lighttpd.conf

There are multiple changes to be done in lighttpd.conf:

  • change server.groupname = "smokeping"; the CGI process will need access to smokeping's files.

  • add server.port = 8081 and server.use-ipv6 = "enable"

  • configure mod_fastcgi for Smokeping by appending the following:

server.modules += ("mod_fastcgi")
fastcgi.server = (
        ".cgi" => ((
                "bin-path" => "/usr/share/webapps/smokeping/smokeping.cgi",
                "socket" => "/tmp/smokeping-fastcgi.socket",
                "max-procs" => 1,
        )),
)
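
For reference, the first two bullet points amount to lines like these in lighttpd.conf:

server.groupname = "smokeping"
server.port = 8081
server.use-ipv6 = "enable"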

We also need to link smokeping's files into the webroot: ln -s /usr/share/webapps/smokeping/ /var/www/localhost/htdocs/smokeping

smokeping

Next is the smokeping configuration located at /etc/smokeping/config.

The most important change here is to set cgiurl to the URL smokeping will be externally reachable at, like so:
cgiurl = http://your_server_here:8081/smokeping/smokeping.cgi

Smokeping's configuration 2 isn't super obvious if you haven't done it before so I'll provide an example here (this replaces the Probes and Targets sections):

*** Probes ***

+ FPing
binary = /usr/sbin/fping

+ FPing6
binary = /usr/sbin/fping

*** Targets ***
probe = FPing

menu = Top
title = Network Latency Grapher
remark =

+ targets
menu = IPv4 targets

++ google
menu = Google
title = Example Target: Google (IPv4)
host = 8.8.4.4

+ targets6
menu = IPv6 targets
probe = FPing6

++ google
menu = Google
title = Example Target: Google (IPv6)
host = 2001:4860:4860::8844

Lastly, grant the CGI process write access to the image folder: chmod g+w /var/lib/smokeping/.simg

Final container setup

Set services to run on boot: rc-update add smokeping && rc-update add lighttpd
Then shut down the container using poweroff.

We need to tell systemd-nspawn not to create a virtual network when the container is started as a service.
Do this by creating /etc/systemd/nspawn/smokeping.nspawn:
[Exec]
KillSignal=SIGTERM

[Network]
VirtualEthernet=no

Finally start up the container: systemctl start systemd-nspawn@smokeping
If this does not work due to private users, you are running an old systemd 3 and can try again with PrivateUsers=no in the [Exec] section.
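
In that case the .nspawn file from above would look like this:

[Exec]
KillSignal=SIGTERM
PrivateUsers=no

[Network]
VirtualEthernet=no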

You can now visit http://your_server_here:8081/smokeping/smokeping.cgi and should see a mostly empty page with a sidebar containing "Charts", "IPv4 targets" and "IPv6 targets" on the left.

1: Unprivileged pings only work since FPing v4.3: https://github.com/schweikert/fping/pull/173
2: A huge manpage: https://oss.oetiker.ch/smokeping/doc/smokeping_config.en.html
3: https://github.com/systemd/systemd/issues/7429

Fully unprivileged VMs with User Mode Linux (UML) and SLIRP User Networking

A few months ago I wanted to test something that involved OpenVPN on a small VPS I rented.

The VPS runs on OpenVZ, which shares a kernel with the host and comes with one important constraint:
TUN/TAP support needs to be manually enabled, and on this VPS it was not.

Maybe run a VM instead? Nope, KVM is not available.

At this point it would've been easier to give up or temporarily rent another VPS, but I really wanted to run the test on this particular one.

Enter: User Mode Linux

User Mode Linux (UML) is a way to run the Linux kernel as a user-mode process on another Linux system, no root or special setup required.

In its time it was considered useful for virtualization, sandboxing and more. These days it's well past its prime, but it still exists in the Linux source and, more importantly, works.

You'd build a kernel binary like this:

git clone --depth=100 https://github.com/torvalds/linux
cd linux
make ARCH=um defconfig
nice make ARCH=um -j4
strip vmlinux

The virtual machine will require a root filesystem image; you can obtain one via the usual ways such as debootstrap (Debian/Ubuntu) or pacstrap (Arch), which I won't cover in detail here.
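
For reference, a minimal debootstrap-based sketch (image size, suite and mirror are arbitrary choices):

# create an empty ext4 image and debootstrap a Debian system into it
truncate -s 2G root_fs
mkfs.ext4 -F root_fs
mkdir rootfs
sudo mount -o loop root_fs rootfs
sudo debootstrap stable rootfs http://deb.debian.org/debian
sudo umount rootfs && rmdir rootfs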

Networking

Now onto the next issue: How is networking supported in User Mode Linux?

UML provides a number of options for network connectivity 1: attaching to TUN/TAP, purely virtual networks and SLIRP.
TUN/TAP is out of the question and a virtual network doesn't help us, so that leaves only SLIRP.

SLIP is a very old protocol 2 designed to carry IP packets over a serial line. SLIRP describes the use of this protocol to share the host's Internet connection over serial.
The SLIRP application exposes a virtual network to the client and performs NAT internally.

The standard slirp implementation is found on sourceforge: https://sourceforge.net/projects/slirp/
Its last release was in 2006 and the tarball even includes a file named security.patch that swaps out a few sprintf for snprintf and adds /* TODO: length check */ in other places.
At this point it was obvious that this wasn't going to work.

Rewriting slirp

The only logical thing to do now is to rewrite slirp so that it works.

Although slirp itself is dead, the concept lives on as libslirp, which is notably used by QEMU 3 and VirtualBox.
libslirp's API is still a bit too low-level so I chose to use libvdeslirp.
SLIP is a simple protocol and not too complicated to implement; the rest is just passing packets around with Ethernet (un-)wrapping and a tiny bit of ARP.

Here's the code: https://gist.github.com/sfan5/696ad2f03f05a3e13952c44f7b767e81

Usage

You'll need:

  • vmlinux: the User Mode Linux kernel

  • root_fs: a root filesystem image

  • /path/to/slirp: the compiled slirp binary (build it using the Makefile that comes with it)

At this point your virtualized Linux system is just one command away:

./vmlinux mem=256M ubd0=root_fs root=/dev/ubda rw eth0=slirp,,/path/to/slirp
Once logged in you need to manually configure the network like this:
ip a add dev eth0 10.0.2.1 && ip l set eth0 up && ip r add default dev eth0
echo nameserver 10.0.2.3 >/etc/resolv.conf
While you enter these you should see --- slirp ready --- on the console you ran vmlinux on.

You can forward port(s) from outside into the VM by editing the commented out code section in slirp.c.

1: http://user-mode-linux.sourceforge.net/old/networking.html
2: https://tools.ietf.org/html/rfc1055
3: https://wiki.qemu.org/Documentation/Networking#User_Networking_.28SLIRP.29

Dealing with glibc faccessat2 breakage under systemd-nspawn

Backstory

A few months ago I stumbled upon this report on Red Hat's bugzilla.

The gist of it is that glibc began to make use of the new faccessat2 syscall, which when running under older systemd-nspawn is filtered to return EPERM. This misdirects glibc into assuming a file or folder cannot be accessed, when in reality nspawn just doesn't know the syscall.

A fix was submitted to systemd 1 but it turned out this didn't only affect nspawn, but also needed to be fixed in various container runtimes and related software 2 3 4 5. Hacking around it in glibc 6 or the kernel 7 was proposed, with both (rightfully) rejected immediately.

I pondered what an awful bug that was and was glad I didn't have to deal with this mess.


Fast forward to last week, I upgraded an Arch Linux installation I had running in a container. Immediately after the update pacman refused to work entirely, complaining it "could not find or read" /var/lib/pacman when this directory clearly existed (I checked).

A few minutes later (and after noticing the upgrade to glibc 2.33) it hit me that this was the exact bug I read about months ago. And, worse, that I'd have to deal with a lot more since I have multiple servers that run containers on systemd-nspawn.

Binary patching systemd-nspawn to fix the seccomp filter

If you hit this bug with one of your containers you have exactly one option: upgrade systemd on the host system to v247 or later.

Aside from the fact that upgrading something as central as systemd isn't exactly risk free, I couldn't do that even if I wanted to. There is no backported systemd for Ubuntu 18.04 LTS.

This calls for another option: Patching systemd yourself to fix the bug.

Without further ado, here's a Python script doing exactly that. I've tested that it performs the correct patch on Debian 10, Ubuntu 18.04 and Ubuntu 20.04. There are also plenty of safeguards so that it shouldn't break anything no matter what (no warranty though).

#!/usr/bin/env python3
import subprocess, re, os
path = "/usr/bin/systemd-nspawn"
print("Looking at %s" % path)
proc = subprocess.Popen(["objdump", "-w", "-d", path], stdout=subprocess.PIPE, encoding="ascii")
instr = list(m.groups() for m in (re.match(r'\s*([0-9a-f]+):\s*([0-9a-f ]+)\s{4,}(.+)', line) for line in proc.stdout) if m)
if proc.wait() != 0: raise RuntimeError("objdump returned error")
p_off, p_old, p_new = None, None, None
for i, (addr, b, asm) in enumerate(instr):
        if asm.startswith("call") and "<seccomp_init_for_arch@" in asm:
                print("Found function call at 0x%s:\n  %s%s" % (addr, b, asm))
                for addr, b, asm in instr[i-1:i-12:-1]:
                        m = re.match(r'mov\s+\$0x([0-9a-f]+)\s*,\s*%edx', asm)
                        if m:
                                print("Found argument at 0x%s:\n  %s%s" % (addr, b, asm))
                                m = int(m.group(1), 16)
                                if m == 0x50026:
                                        print("...but it's already patched, nothing to do.")
                                        exit(0)
                                if m != 0x50001: raise RuntimeError("unexpected value")
                                p_off, p_old = int(addr, 16), bytes.fromhex(b)
                                if len(p_old) != 5: raise RuntimeError("unexpected instr length")
                                p_new = b"\xba\x26\x00\x05\x00"
                                break
                        if re.search(r'%[re]?dx|^(call|pop|j[a-z])', asm): break # likely went too far
                break
if not p_off: raise RuntimeError("no patch location found")
print("Patching %d bytes at %d from <%s> to <%s>" % (len(p_old), p_off, p_old.hex(), p_new.hex()))
with open(path, "r+b") as f:
        if os.pread(f.fileno(), len(p_old), p_off) != p_old: raise RuntimeError("contents don't match")
        os.pwrite(f.fileno(), p_new, p_off)
print("OK.")

Running the above script (as root) will attempt to locate certain related instructions in /usr/bin/systemd-nspawn, attempt to patch one of them and hopefully end with an output of "OK.".

What does the binary patch change? Essentially it makes the following change to the compiled code of nspawn-seccomp.c:

 log_debug("Applying allow list on architecture: %s", seccomp_arch_to_string(arch));

-r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ERRNO(EPERM));
+r = seccomp_init_for_arch(&seccomp, arch, SCMP_ACT_ERRNO(ENOSYS));
 if (r < 0)
         return log_error_errno(r, "Failed to allocate seccomp object: %m");

Instead of EPERM, both blocked and unknown syscalls now return ENOSYS back to the libc. This isn't ideal either (error handling code might get a bit confused) but it is more correct and allows glibc to not catastrophically fail upon attempting to use faccessat2.

Unfortunately the same change cannot 8 be applied to processes in already running containers; you have to restart them.

Running Windows 10 for ARM64 in a QEMU virtual machine

[Image: /images/2020-08-04_scrot.png]

Since the development stages of Windows 10, Microsoft has been releasing a version of Windows that runs on 64-bit ARM (AArch64) based CPUs. Despite some hardware shipping with Windows 10 ARM 1 2 3 this port has received little attention and you can barely find programs that run on it.

Naturally, I wanted to try this out to see if it worked. And it turned out it does!

Getting the ISO

I'm not aware of any official page that lets you download an ARM64 ISO, so this part relies on community-made solutions instead:

In the MDL forums I looked for the right ESD download link and used an ESD>ISO conversion script (also found there) to get a bootable ISO.

Alternatively adguard's download page provides similar scripts that download and pack an ISO for you, though they're pretty slow in my experience.

There's one more important point:

I had no success booting version 2004 or 20H2 (specifically: 19041.388 / 19041.423) so I went with version 1909 (18363.592) instead.

Starting with 18363.1139 Windows seems to require the virtualization extension 8 to be enabled. I had initially used an older version before finding this out, the command line below has now been corrected accordingly. (Updated 2020-12-27)

Installation

Before we begin we also need:

  • the Virtio driver ISO

  • an appropriately sized disk image (qemu-img create -f qcow2 disk.qcow2 40G)

  • QEMU_EFI.fd extracted from the edk2.git-aarch64 RPM found here

The qemu command line then looks as follows:

isoname=19042.631.201119-2058.20H2_RELEASE_SVC_REFRESH_CLIENTENTERPRISE_VOL_A64FRE_DE-DE.ISO
virtio=~/Downloads/virtio-win-0.1.185.iso
qemu-system-aarch64 -M virt,virtualization=true -cpu cortex-a53 -smp 4 -m 4096 \
        -device qemu-xhci -device usb-kbd -device usb-tablet \
        -drive file=disk.qcow2,if=virtio \
        -nic user,model=virtio \
        -drive file="$isoname",media=cdrom,if=none,id=cdrom -device usb-storage,drive=cdrom \
        -drive file="$virtio",media=cdrom,if=none,id=drivers -device usb-storage,drive=drivers \
        -bios QEMU_EFI.fd -device ramfb

You can then follow the installation process as normal. Before partitioning the disks the setup will ask you to load disk drivers, these can be found at viostor/w10/ARM64/ on the virtio cdrom.

QEMU video output

The above command line already takes these limitations into account; these sections are for explanation only.

A previous blogpost on running Windows 10 ARM in QEMU has used a patched EDK2 to get support for standard VGA back in. It's not clear to me why EDK2 removed support if it was working, but this is not a solution I wanted to use either way.

It turns out 4 that the options on ARM are limited to virtio gpu and ramfb. Virtio graphics are Linux-only so that leaves ramfb.

Attaching disks with Qemu

Since the virt machine has no SATA controller we cannot attach a hard disk to the VM the usual way, so I went with virtio here instead. It would also have been possible to do this over usb-storage, which works out of the box and would have saved us all the work with virtio drivers (except for networking 5).

This also means something else (which wasted me quite some time): You cannot use -cdrom.

If you do, EDK2 will boot the Windows CD fine but setup will ask you to load drivers early (because it cannot find its own CD). None of the virtio drivers can fix this situation, leaving you stuck with no clear indication what went wrong.

After installation

The onboarding process has a few hiccups (in particular device detection), but if you retry it a few times it'll let you continue anyway.

High CPU Usage

After the first boot I noticed two regsvr32.exe processes at 100% CPU that didn't seem to finish in reasonable time.

Further investigation with Process Explorer 6 showed these belonging to Windows' printing service. Since I don't want to print in this VM anyway, the affected service can just be stopped and disabled:

sc stop "Spooler"
sc config "Spooler" start= disabled

Networking

We're still missing the network driver from the virtio cdrom. Unfortunately the NetKVM driver doesn't seem to be properly signed, so you have to enable loading unsigned drivers first (and reboot!):

bcdedit /set testsigning on

Afterwards the right driver can be installed from the device manager (NetKVM/w10/ARM64/ on cdrom).

General Performance Tweaks

These aren't specific to Windows 10 ARM or Virtual Machines, but can be pretty useful to stop your VM from acting sluggish.

REM Disable Windows Search Indexing
sc stop "WSearch"
sc config "WSearch" start= disabled
REM Disable Automatic Defragmentation
schtasks /Delete /TN "\Microsoft\Windows\Defrag\ScheduledDefrag" /F
REM Disable Pagefile
wmic computersystem set AutomaticManagedPagefile=FALSE
wmic pagefileset delete
REM Disable Hibernation
powercfg -h off

Higher Display Resolution

As of writing QEMU's ramfb has its resolution locked to 800x600, which even breaks EDK2's menu (press F2 or Esc during boot).

Fortunately, this has already been fixed in master 7 and will be in qemu 5.1.0. You can compile 5.1.0-rc3 today if you don't want to wait.

In addition to that you need vars-template-pflash.raw from the same edk package as earlier (UEFI will store its settings in there).
Add the following to your qemu args: -drive file=vars-template-pflash.raw,if=pflash,index=1

The display resolution can then be set up to 1024x768 under Device Manager > OVMF Platform Configuration.

Wrapping up

With a bit of preparation it is possible to run Windows 10 ARM in a virtual machine. Although the emulation is somewhat slow you could feasibly use this to test one or two programs.

If you have ARM64 hardware with sufficient specs and KVM support, it should be possible to use -enable-kvm -cpu host for native execution speed, though I haven't had a chance to see how this performs yet.

1: https://www.samsung.com/au/tablets/galaxy-book-s-w767/SM-W767NZAAXSA/
2: https://web.archive.org/web/20201022074409/https://www.lenovo.com/ie/en/laptops/yoga/yoga-c-series/Yoga-C630-13Q50/p/88YGC601090
3: https://www.microsoft.com/en-us/p/surface-pro-x/8vdnrp2m6hhc
4: https://www.kraxel.org/blog/2019/09/display-devices-in-qemu/#tldr
5: The usb-net device does not work and doesn't appear in Windows' device manager either.
6: Procexp for ARM64 is available here: http://live.sysinternals.com/ARM64/
7: https://github.com/qemu/qemu/commit/c326eedc7584b94f6f9f3b8ba61a6e9ff04ad681
8: This coincides with the introduction of Windows Hypervisor Platform: https://docs.microsoft.com/en-us/virtualization/api/

Installing the Debian X32 port on a VM or real machine

X32 is an ABI for Linux that uses the x86-64 instruction set but 32-bit longs and pointers (this is called ILP32), thereby limiting the memory for a single process to 4 GiB. Compared to amd64 it offers significant memory savings and unlike plain i386 it can make use of all registers and extensions also available to 64-bit code.

Debian has had an X32 port since 2013, but installing it isn't quite straightforward.
To follow this guide you'll need:
  • A Debian netinst CD for the amd64 architecture: https://www.debian.org/CD/netinst/

  • A computer or VM with x86-64 compatible CPU

  • An internet connection on the machine you are installing

Booting into rescue mode

In the boot menu, choose "Advanced Options" and select the "Rescue Mode" option.

Press TAB to edit the kernel command line and append the following option before booting: syscall.x32=y

Follow the menus until you're dropped to a shell ("Execute shell in installer environment").

Partitioning

Inside the shell run partman.

This starts the usual partitioning setup seen during Debian installation. Once it's done you will be dropped to the shell again.

Installing the system

Debootstrap

Debootstrap will install the system for us, but unfortunately there is no convenient way to get the debian-ports GPG keys into the rescue environment, so we'll just run with disabled signature checks.

debootstrap --no-check-gpg --arch=x32 unstable /target http://ftp.ports.debian.org/debian-ports/

You might notice that isc-dhcp-client fails to install at this step; this is not critical and will be dealt with later.

Chrooting

The next step is to enter the target system in a chroot:

mount --bind /dev /target/dev
mount --bind /proc /target/proc
mount --bind /sys /target/sys
chroot /target

First we'll do some basic system configuration.

The fstab you write needs to match the configured partition layout; in this example the layout is a single root partition (no /boot, no swap space). Using UUID=... notation here is also advisable but not fun to type out by hand (and irrelevant for a single-disk VM).

Here you should also uninstall the package that failed to install earlier.

passwd root
echo "/dev/sda1 / ext4 rw 0 1" >/etc/fstab

Repository configuration

The X32 repository does not include a Linux kernel package. This requires us to add the amd64 repository so we can install a kernel, but we need to make sure only linux-image and nothing else is pulled from there (via APT pinning).

Configure APT as follows:

apt install -y debian-ports-archive-keyring
cat >/etc/apt/sources.list
deb [arch=x32] http://ftp.ports.debian.org/debian-ports unstable main
deb [arch=amd64] http://deb.debian.org/debian unstable main
^D
cat >/etc/apt/preferences.d/amd64
Package: linux-image*:amd64
Pin: release b=amd64
Pin-Priority: 500

Package: *
Pin: release b=amd64
Pin-Priority: -1
^D
dpkg --add-architecture amd64
apt update

Kernel & Bootloader

Install the kernel and bootloader (GRUB):
export TERM=dumb
apt install -y linux-image-amd64 grub-pc

When asked by GRUB, select your primary hard drive (here: /dev/sda) as installation device.

Next, reconfigure grub to boot up the kernel with X32 enabled:
dpkg-reconfigure grub-pc

GRUB will ask for the Linux command line, which needs to be syscall.x32=y.
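
If you prefer editing configuration files over the dpkg-reconfigure prompt, the equivalent is setting the kernel command line in /etc/default/grub and regenerating the config:

sed -i 's/^GRUB_CMDLINE_LINUX=.*/GRUB_CMDLINE_LINUX="syscall.x32=y"/' /etc/default/grub
update-grub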

Cleaning up

The installation failure during debootstrap has left some files misconfigured.

The purpose of policy-rc.d and the modification to start-stop-daemon is to prevent services from being started during package installation. Since we're finished installing, these need to be undone:

rm /usr/sbin/policy-rc.d

mv /usr/sbin/start-stop-daemon.REAL /usr/sbin/start-stop-daemon

Lastly exit the chroot, umount the partitions and reboot:

umount /target/dev /target/proc /target/sys
umount /target

sync; reboot

Extras

Why does isc-dhcp-client fail to install?

isc-dhcp-client transitively depends on libmaxminddb, which requires pandoc during build.

pandoc itself is not available on x32 since one of its dependencies is also missing.

This is tracked as Debian bug #956041 and was solved in February 2021.

Quick DHCP setup using systemd-networkd

Since isc-dhcp-client would provide the DHCP client we have a problem when we want to connect our freshly installed system to a network.

Fortunately, setting up DHCP with systemd-networkd is quite easy:

cat >/etc/systemd/network/eth.network
[Match]
Name=en*
[Network]
DHCP=ipv4
^D
systemctl enable --now systemd-networkd

Installing standard system software

tasksel can take care of installing software you'd usually find on a standard Debian installation (man pages, locale support, Perl, ...).

tasksel install standard

Same issue as before: isc-dhcp-client and bind9-related packages are uninstallable, so that requires some manual working around:

eval $(tasksel -t install standard | tr ' ' '\n' | egrep -v 'isc-dhcp-client|^bind9-' | tr '\n' ' ')

Virtualizing Raspbian (or any ARM/Linux distro) headless using QEMU

For testing or development it can be very useful to run a distribution that usually targets an embedded ARM board, such as the Raspberry Pi, right on your (non-ARM) machine in a virtual machine.

QEMU provides excellent support for emulation of the ARM architecture (both 32 and 64-bit) and can emulate many different real ARM boards.

Why not use QEMU's "raspi2" machine for emulation?

QEMU comes with a raspi2 machine. It emulates the GPU's framebuffer, HWRNG, UART, GPIO and SD controller.

Spot something missing? It doesn't implement USB, which makes it useless for headless and graphical use as you can plug in neither a network connection nor a keyboard or mouse.

(Update 2021-01-08: QEMU has apparently added raspi USB support in versions 5.1.0 onward, so you could skip much of the setup detailed here if doing this from scratch.)

If you still want to use it, this guide will only help you halfway but here are the parameters:

-M raspi2 -kernel kernel7l.img -dtb bcm2709-rpi-2-b.dtb -append "root=/dev/mmcblk0 rootwait console=ttyAMA0"

The plan

Instead of (poorly) emulating a real piece of hardware, QEMU also has a virt machine 1 that is designed for virtualization. It gives you a modern system with PCI and also works out-of-the-box with Linux without providing a Device Tree (QEMU generates one internally).

The most straightforward way of getting network and disk into such a VM is to use virtio-net and virtio-disk respectively, which is what we'll be doing.

Since virtio requires kernel support, chances are the Raspberry Pi kernel wouldn't work anyway, so we'll be using a different one.

I picked Arch Linux ARM's armv7 kernel from here, though any other should work just as well provided it comes with the appropriate modules. To load the virtio modules during boot we'll require an initramfs, but more on that later.

Extracting Raspbian's root filesystem into a virtual disk image

Start by downloading Raspbian from the Raspberry Pi website, then run the script below or follow the steps manually.

The script will create a copy of the image file, expand the image and its partition to 10 gigabytes, mount the partition using a loop device and make two adjustments:

  • Remove both SD card partitions from /etc/fstab, these don't exist inside the VM and we will be mounting the rootfs ourselves

  • Disable several startup services that do not work inside the VM

After unmounting the partition it will convert the filesystem into a qcow2 format image for use with QEMU.

#!/bin/bash -e
input=2020-02-13-raspbian-buster-lite.img
[ -f $input ]

mkdir mnt
cp --reflink=auto $input source.img
truncate -s 10G source.img
echo ", +" | sfdisk -N 2 source.img
dev=$(sudo losetup -fP --show source.img)
[ -n "$dev" ]
sudo resize2fs ${dev}p2
sudo mount ${dev}p2 ./mnt -o rw
sudo sed '/^PARTUUID/d' -i ./mnt/etc/fstab
sudo rm \
        ./mnt/etc/systemd/system/multi-user.target.wants/{hciuart,dphys-swapfile}.service \
        ./mnt/etc/rc?.d/?01{resize2fs_once,rng-tools}
sudo umount ./mnt
sudo chmod a+r ${dev}p2
qemu-img convert -O qcow2 ${dev}p2 rootfs.qcow2
sudo losetup -d $dev
rm source.img; rmdir mnt

The kernel and initramfs

Kernel

Conveniently the linux-armv7 package is just a tar archive, so you can extract the kernel executable using:

tar -xvf linux-armv7*.pkg.tar.xz --strip-components=1 boot/zImage

Making an initramfs

Since virtio support is not compiled into the kernel and the root filesystem is missing modules for the exact kernel we'll be using (maybe copying them would've been easier?), we need to write an initramfs that can load these modules prior to mounting the rootfs.

Fortunately the Gentoo Wiki has a great article on making a custom one yourself. The basic idea is to extract the required kernel modules into the initramfs, whose init script loads the modules, mounts the root filesystem and actually boots.

The script shown below does the following steps:

  • Extract kernel modules from package

  • Delete some that we won't be needing and take a lot of space (optional)

  • Download and install a statically-linked busybox executable

  • Create the init script

  • Pack contents into a cpio archive as required by the Linux kernel

Using a virtio disk and network adapter requires loading the virtio-pci, virtio-blk, virtio-net modules. If you need any more the init script can easily be changed accordingly.

#!/bin/bash -e
pkg=$(echo linux-armv7-*.pkg.tar.xz)
[ -f "$pkg" ]

mkdir initrd; pushd initrd
mkdir bin proc sys dev mnt
tar -xaf "../$pkg" --strip-components=1 usr/lib/modules
rm -rf lib/modules/*/kernel/{sound,drivers/{net/{wireless,ethernet},media,gpu,iio,staging,scsi}}
wget https://www.busybox.net/downloads/binaries/1.31.0-defconfig-multiarch-musl/busybox-armv7l -O bin/busybox
cat >init <<"CONTENTS"
#!/bin/busybox sh
busybox mount -t proc none /proc
busybox mount -t sysfs none /sys
busybox mount -t devtmpfs none /dev

for mod in virtio-pci virtio-blk virtio-net; do
        busybox modprobe $mod
done

busybox mount -o rw /dev/vda /mnt || exit 1

busybox umount /proc
busybox umount /sys
busybox umount /dev

exec busybox switch_root /mnt /sbin/init
CONTENTS
chmod +x bin/busybox init
bsdtar --format newc --uid 0 --gid 0 -cf - -- * | gzip -9 >../initrd.gz
popd; rm -r initrd

Booting the virtual machine

With the initramfs built, we have all parts needed to actually run the VM: 2

qemu-system-arm -M virt,highmem=off -m 2048 -smp 4 -kernel zImage -initrd initrd.gz \
        -drive file=rootfs.qcow2,if=virtio -nic user,model=virtio \
        -append "console=ttyAMA0" -nographic
After roughly a minute of booting you should be greeted by
Raspbian GNU/Linux 10 raspberrypi ttyAMA0 and a login prompt.

Further steps

This virtualization approach should work for just about any ARM/Linux distribution. I have tested it with Raspbian, Void Linux and Arch Linux ARM (whose rootfs even works without any modifications).

To ensure the kernel performs as expected beyond basic tasks, it's a good idea to extract the modules from the linux-armv7 package into the guest rootfs.
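
One way to do that is to attach the qcow2 image with qemu-nbd and unpack the modules straight into it. A sketch, assuming the nbd kernel module is available on the host and that rootfs.qcow2 contains the filesystem directly (as produced by the script above):

sudo modprobe nbd
sudo qemu-nbd -c /dev/nbd0 rootfs.qcow2
mkdir mnt && sudo mount /dev/nbd0 mnt
sudo tar -xaf linux-armv7-*.pkg.tar.xz -C mnt --strip-components=1 usr/lib/modules
sudo umount mnt && rmdir mnt
sudo qemu-nbd -d /dev/nbd0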

As with any VM, you can use the full extent of QEMU's features to e.g.:

  • attach a USB controller (-device usb-ehci or nec-usb-xhci)

  • ..SCSI controller (-device virtio-scsi)

  • ..Audio input/output (-device usb-audio)

  • or even enable graphical output (-device VGA)

AArch64?

With a few adjustments in the right places, this guide also works to emulate an AArch64 kernel and userland. With two caveats:

  • qemu-system-aarch64 will not actually start up in AArch64 mode unless you use -cpu cortex-a53

  • The busybox-armv8l binary available on the website isn't 64-bit; you'll have to build your own.

1: https://wiki.qemu.org/Documentation/Platforms/ARM#Generic_ARM_system_emulation_with_the_virt_machine
2: Why highmem=off is necessary: https://bugs.launchpad.net/qemu/+bug/1790975

Opening a shell inside non-systemd nspawn containers

If you try to open a shell inside a container that runs e.g. Alpine Linux using machinectl, the following not very descriptive error will appear:

# machinectl shell vpn
Failed to get shell PTY: Protocol error

The reason for this is simply that the container is not running systemd.

Because systemd-nspawn just uses Linux namespaces 1, nsenter can alternatively be used to access the container. For this, we'll need the PID of the init process inside the container:

# systemctl status systemd-nspawn@vpn
● systemd-nspawn@vpn.service - Container vpn
   Loaded: loaded (/lib/systemd/system/systemd-nspawn@.service; disabled; vendor preset: enabled)
   Active: active (running) since Sun 2019-08-11 19:49:19 UTC; 6 months 3 days ago
 Main PID: 795 (systemd-nspawn)
   Status: "Container running."
   CGroup: /machine.slice/systemd-nspawn@vpn.service
           ├─payload
           │ ├─ 797 /sbin/init
           │ ├─1028 /sbin/syslogd -t
           [...]
In this case the PID of init is 797; you can then spawn a login shell inside the container:

nsenter -t 797 -a /bin/sh -l

Depending on the namespaces the container is (or isn't) joined to it can be necessary to drop -a and specify the appropriate flags manually after consulting the nsenter manpage:

nsenter -t 797 -m -u -i -n -p -U -C /bin/sh -l

All in all, this can be turned into a nice shell function for your .bashrc:

function center ()
{
  [ -z "$1" ] && { echo "Usage: center <name>" >&2; return 1; }
  pid=$(sed -n 2p "/sys/fs/cgroup/pids/machine.slice/systemd-nspawn@$1.service/tasks")
  [ -z "$pid" ] && { echo "Container not running" >&2; return 1; }
  nsenter -t $pid -a /bin/sh -l
}
1: http://man7.org/linux/man-pages/man7/namespaces.7.html

QEMU Configuration & Usage

This will cover some QEMU options I have found useful beyond the basics.

Machine Type (x86)

-M q35 configures a more modern chipset to be emulated. The Q35 chipset supports PCI-e and includes an AHCI controller 1.

UEFI
Get UEFI support by replacing the bios using -bios ./OVMF-pure-efi.fd or OVMF-with-csm.fd if legacy boot is desired.
OVMF can be downloaded from https://www.kraxel.org/repos/jenkins/edk2/ (pick edk2.git-ovmf-x64-...). bsdtar can extract the rpms.
Many distributions also offer a matching ovmf package in their repos.

If you need UEFI settings (such as display resolution) to be persisted, a copy of OVMF_VARS (one per VM) needs to be provided too:

-drive file=./OVMF_CODE-pure-efi.fd,format=raw,if=pflash,readonly=on \
-drive file=./OVMF_VARS-pure-efi.fd,format=raw,if=pflash
Attaching disk images using VirtIO

Use -drive file=disk.img,if=virtio for improved disk performance. Windows guests require additional drivers 2 to use this.

Disabling disk cache flush
If you are installing a VM to quickly test something, disabling flushing of write cache to disk can speed up the process immensely:
-drive file=foobar.qcow2,if=virtio,cache=unsafe (or cache.no-flush=on)

Caution: Never use this in a production environment or with any VMs or data you care about. The guest OS will be tricked into believing it has safely written data to disk, when in reality it could be lost at any moment.

Attaching raw disks
-drive file=/dev/sdb,if=virtio,format=raw,cache=none

When attaching entire disks, partitions or logical volumes cache=none is a good idea.

Share host directory to guest

-drive file=fat:/path/to/dir,snapshot=on creates a read-only virtual FAT-formatted disk image from the given directory.

Multiple CD-ROM images
-drive file=X.iso,index=0,media=cdrom -drive file=Y.iso,index=1,media=cdrom

The index=N parameter is optional but can be used to explicitly order drives.

Bridged Network Adapter
-netdev bridge,br=br0,id=mynet -device virtio-net-pci,netdev=mynet
short syntax: -nic bridge,br=br0,model=virtio
For virtio, Windows needs additional drivers 2.
Aside from virtio-net-pci QEMU also supports emulating real cards such as:

e1000e (Intel 82574L GbE) which is the default, e1000 (Intel 82540EM GbE) or rtl8139 (Realtek RTL-8139C 10/100M)
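
Note that the setuid qemu-bridge-helper used by -netdev bridge / -nic bridge only attaches to bridges whitelisted in its ACL file; the path below is the common default and may differ per distribution:

sudo mkdir -p /etc/qemu && echo 'allow br0' | sudo tee /etc/qemu/bridge.conf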

CPU type
The default is -cpu qemu64.
To get the full CPU feature set in the guest use -cpu host or the appropriate family, e.g. -cpu Haswell.
Alternatively, flags can also be enabled individually: -cpu qemu64,+ssse3,+sse4.2,+avx,+avx2

-cpu kvm64 is legacy and should not be used 3.

VNC

-display vnc=localhost:1,lossy=on starts VNC on port 5901 (no authentication, but localhost only) with JPEG compression enabled to save bandwidth.

USB Input Devices
-usb -device usb-tablet -device usb-kbd attaches keyboard and tablet (as mouse) via USB instead of PS/2.

This improves mouse support especially when using VNC and makes grabbing unnecessary in the GUI.

Port forwarding with User networking
When using -nic user (default) the hostfwd=PROTO::HPORT-:PORT option can be used to forward connections to the guest.

e.g. -nic user,model=virtio,hostfwd=tcp::2222-:22

VGA driver
-vga qxl offers improved performance over the default (std). Windows needs drivers, again 2.

3D acceleration for Linux guests is possible with -vga virtio 4.

Serial console
-serial pty connects the serial port to a PTY, which can then be interacted with using screen.
Alternatively when -nographic is used, the QEMU monitor and serial get multiplexed to stdio.

Ctrl-A c can then be used to switch between the monitor/serial 5.

Monitor console
With either -nographic or -monitor stdio, QEMU will make its monitor console available in the terminal.
It can also be accessed when using the GUI via "View" > "compatmonitor0".

The monitor console provides access to lots of functionality including snapshots, migration, device hotplug and debugging.

  • stop pauses the VM, cont resumes execution again

  • info registers shows the current values of CPU registers

  • x/x ADDRESS prints the hexadecimal value at the given address in guest memory

  • x/20i ADDRESS disassembles 20 instructions at the given address
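
For example, snapshots of a VM whose disks are qcow2 images can be managed right from the monitor: savevm takes a named snapshot, info snapshots lists them and loadvm rolls back to one.

savevm before-update
info snapshots
loadvm before-update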

Detailed explanations can be found in the relevant documentation.

Emulated SCSI controller
(because it's possible, not because it's useful)

-device lsi,id=lsi -drive file=somewhere.img,if=none,id=disk0 -device scsi-hd,drive=disk0,bus=lsi.0