arch-zbm

Helper script to install Arch Linux with ZFSBootMenu from within a running Arch Linux live CD ISO image

Prep

We expect minimal prep on your end. Please make sure the following conditions are met before execution.

UEFI

On a UEFI system ensure these conditions are met. See How to prep for details on how to meet these conditions.

  • One GPT-partitioned disk
  • Arch Linux live CD ISO image sees exactly one partition with partition type code BF00 ("Solaris root")
  • Arch Linux live CD ISO image sees exactly one partition with partition type code EF00 ("EFI System Partition")
  • The EF00 EFI partition is mountable; in practical terms this usually just means it has a file system.
  • No ZFS zpool exists

Legacy BIOS

If you are instead running a legacy BIOS machine ensure these conditions are met. See How to prep for details on how to meet these conditions.

  • One MBR-partitioned disk
  • Arch Linux live CD ISO image sees exactly one partition with partition type code bf ("Solaris root")
  • Arch Linux live CD ISO image sees exactly one partition with partition type code 83 ("Linux")
  • The 83 Linux partition is mountable; in practical terms this usually just means it has a file system.
  • No ZFS zpool exists

None of these conditions, neither on a UEFI nor on a legacy BIOS system, are requirements of ZFSBootMenu itself. We're just setting requirements to easily identify whether you intend a UEFI or a legacy BIOS install. Consequently the script has no logic to detect UEFI or legacy BIOS mode; that legwork is left to the reader :) The Internet seems to agree that a good quick check is whether your Arch Linux live CD ISO image has the directory /sys/firmware/efi.

[ -d /sys/firmware/efi ] && echo 'Likely a UEFI system' || echo 'Probably a legacy BIOS system'

If you're unsure, nothing's stopping you from just giving it a go with a best guess; if that fails you know you guessed wrong.

How to prep

UEFI

On a blank example disk /dev/sda you can fulfill the UEFI requirements (one EF00 partition with a file system plus one BF00 partition) for example like so:

sgdisk --new '1::+512M' --new '2' --typecode '1:EF00' --typecode '2:BF00' /dev/sda
mkfs.vfat /dev/sda1

--new '1::+512M': Create partition number 1. The field separator : separates the partition number from the start sector. Here the start sector is unspecified, so it falls back to the system default for this operation; on a blank disk in an Arch Linux live CD ISO image that is sector 2048. The partition ends at its start +512M, meaning plus 512 Mebibytes.

--new '2': Create partition number 2. Both field 2 (the start sector) and field 3 (the end sector) are unspecified; there's no field separator :. The start sector defaults to the first free sector, in this case right after partition 1, and the end sector defaults to the end of the disk. Partition 2 thus fills the remaining free disk space.

--typecode '1:EF00': Partition 1 gets partition type code EF00, an EFI System Partition.

--typecode '2:BF00': Partition 2 gets partition type code BF00, a Solaris root partition.

The result will be something like this at which point you can start the setup.sh script, see How to run this? below for more details.

# lsblk --paths --output 'NAME,SIZE,FSTYPE,PARTTYPE,PARTTYPENAME,PTTYPE' /dev/sda
NAME         SIZE FSTYPE PARTTYPE                             PARTTYPENAME PTTYPE
/dev/sda      10G                                                          gpt
├─/dev/sda1  512M vfat   c12a7328-f81f-11d2-ba4b-00a0c93ec93b EFI System   gpt
└─/dev/sda2  9.5G        6a85cf4d-1dd2-11b2-99a6-080020736631 Solaris root gpt

Legacy BIOS

For a legacy BIOS machine you'll be using a Master Boot Record (MBR) on your disk.

printf -- '%s\n' 'label: dos' 'start=1MiB, size=512MiB, type=83, bootable' 'start=513MiB, size=+, type=bf' | sfdisk /dev/sda
mkfs.vfat /dev/sda1

label: dos: Create the following partition layout in a Master Boot Record.

start=1MiB, size=512MiB, type=83, bootable: Partition 1 begins 1 Mebibyte after disk start and is 512 Mebibyte in size. We're setting its bootable flag and setting partition type code 83 ("Linux").

start=513MiB, size=+, type=bf: Partition 2 begins right at the start of Mebibyte 513, the very next sector after the end of partition 1. It takes up the remaining disk space and gets type code bf ("Solaris").

The result will be something like this at which point you can start the setup.sh script, see How to run this? below for more details.

# lsblk --paths --output 'NAME,SIZE,FSTYPE,PARTTYPE,PARTTYPENAME,PTTYPE' /dev/sda
NAME         SIZE FSTYPE PARTTYPE PARTTYPENAME PTTYPE
/dev/sda      10G                              dos
├─/dev/sda1  512M vfat   0x83     Linux        dos
└─/dev/sda2  9.5G        0xbf     Solaris      dos

Partition naming

Since this script works in both UEFI and legacy BIOS mode we'll address both disk layouts with umbrella terms for the rest of this document for better readability: "the zpool partition" means the GPT BF00 partition or the MBR bf partition, and "the boot partition" means the GPT EF00 partition or the MBR 83 partition. Parse the text accordingly.

ZFS dataset layout

The script will create a single zpool named zpool on the zpool partition, with a child dataset zpool/root which itself has one child zpool/root/archlinux; that's where Arch Linux gets installed. Parallel to zpool/root it'll create zpool/data with a child dataset zpool/data/home that gets mounted at /home.

How to run this?

  • Boot an Arch Linux live CD ISO image
  • Run:
    export SCRIPT_URL='https://quico.space/quico-os-setup/arch-zbm/raw/branch/main/setup.sh' && curl -s "${SCRIPT_URL}" | bash
    
    During execution the script will call itself when it changes into its chroot, that's why we export SCRIPT_URL. Feel free to update "${SCRIPT_URL}" with whatever branch or revision you want to use from quico.space/quico-os-setup/arch-zbm. Typically .../branch/main/setup.sh as shown above is what you want.

Options

The following options can be given either by exporting them as shell variables prior to script execution or in a file named archzbm_settings.env that lives in your current working directory where you're about to execute the script. You can walk yourself through an interactive questionnaire that helps create a valid archzbm_settings.env file. Check out Command line setup help for details on the questionnaire.

If you instead want to write an archzbm_settings.env file yourself, its format is identical to shell variable assignments of the form VAR=value or VAR='value'.

If ./archzbm_settings.env exists the script will source its content and export all variables for use in future steps.

In cases where a variable is both exported prior to script execution and specified in archzbm_settings.env the latter will override the former.
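
A sketch of that sourcing step, assuming plain bash (the function name load_settings is ours, not taken from setup.sh):

```shell
#!/usr/bin/env bash
# Illustrative sketch of the settings-file handling described above.
load_settings() {
  local settings_file='./archzbm_settings.env'
  if [ -f "${settings_file}" ]; then
    # 'set -a' marks every assignment for export while sourcing, so
    # values from the file override previously exported variables and
    # remain visible to later steps (including the chroot re-exec).
    set -a
    # shellcheck source=/dev/null
    . "${settings_file}"
    set +a
  fi
}
```

After load_settings, an ARCHZBM_ROOT_PASSWORD from the file wins over one exported beforehand.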

Known options are as follows.

Kernel downgrade

By default we install newest linux and linux-headers packages into a chroot. Once we're in that chroot we then install newest AUR zfs-dkms package. You may want to override linux and linux-headers versions to ensure you end up with a compatible mix between them and zfs-dkms.

For example:

export ARCHZBM_KERNEL_VER=6.5.9.arch2

In our chroot this will trigger execution of:

downgrade --ala-only 'linux=6.5.9.arch2' 'linux-headers=6.5.9.arch2' --ignore always

Where downgrade is the AUR downgrade package. This will downgrade linux and linux-headers and will add a setting to your /etc/pacman.conf:

[options]
IgnorePkg = linux linux-headers

Setting ARCHZBM_KERNEL_VER to an empty string '' or keeping it undefined are both valid and will retain newest versions instead of downgrading.

Also read Kernel selection for details.

Compression

By default we create a zpool with ZFS property compression=on. If the lz4_compress pool feature is active this will by default enable compression=lz4; see man 7 zfsprops (for example in ZFS 2.1.9) for details. Run zpool get feature@lz4_compress <pool> to check this feature's status on your <pool>.

To get a zpool with uncompressed datasets export the shell variable ARCHZBM_ZFSPROPS_NO_COMPRESSION with any value prior to running this script. Literally any value works as long as you're not setting this to an empty string:

export ARCHZBM_ZFSPROPS_NO_COMPRESSION=yesplease

Encryption

By default we encrypt the zpool with ZFS property encryption=on. In ZFS 2.1.9 this defaults to encryption=aes-256-gcm.

To get a zpool with unencrypted datasets export the shell variable ARCHZBM_ZFSPROPS_NO_ENCRYPTION with any value prior to running this script:

export ARCHZBM_ZFSPROPS_NO_ENCRYPTION=yup

Passwords

By default both the zpool password and the account password for root are literally password. While you can certainly change these after initial system setup (see Password change) you can also optionally set passwords before script execution as follows:

ARCHZBM_ZPOOL_PASSWORD='a fancy password'
ARCHZBM_ROOT_PASSWORD='t0psecr3t!'

While the root password is allowed to be weak (chpasswd won't care), do make sure to set a zpool password that meets ZFS' complexity rules. Per man 7 zfsprops, section keyformat, the only requirement is a length "between 8 and 512 bytes" (as in minimum 8 characters). If you pick a password that's too short ZFS will reject zpool creation and very ungracefully derail the rest of this script. The script doesn't check what you're setting.
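
Since the script doesn't validate the passphrase, a quick pre-flight check along these lines can spare you a derailed run (check_zpool_password is our illustrative name, not part of setup.sh):

```shell
#!/usr/bin/env bash
# Illustrative sketch: validate a prospective zpool passphrase against
# the keyformat=passphrase rule of 8 to 512 bytes (man 7 zfsprops).
check_zpool_password() {
  local pass="$1" bytes
  # Count bytes rather than characters so multi-byte UTF-8 input is
  # measured the way ZFS measures it.
  bytes=$(( $(printf '%s' "${pass}" | wc -c) ))
  if [ "${bytes}" -lt 8 ] || [ "${bytes}" -gt 512 ]; then
    printf 'Passphrase must be 8 to 512 bytes, got %d\n' "${bytes}" >&2
    return 1
  fi
}
```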

The script does create a second user named build but doesn't set a password on account creation. It's intended as a helper for system setup tasks such as sudo -u build paru -S <package>, where an account password is irrelevant since root can always sudo whatever it wants. You will not be able to log in to the build account, although you certainly could set a password for it. Instead we suggest you create a proper user account for yourself. Your newly installed Arch Linux comes with an /etc/motd greeting that summarizes this as:

useradd --create-home --shell /bin/bash --user-group --groups wheel <user>
passwd <user>

Networking

By default the script configures plain ZFSBootMenu with neither networking nor an SSH server. If you're interested in SSH-ing into your ZFSBootMenu boot loader you're going to want to specify some of the following variables.

IP address

IPv6 addresses are untested. The script has been confirmed to work with IPv4 addresses.

ARCHZBM_NET_CLIENT_IP=''
ARCHZBM_NET_SERVER_IP=''
ARCHZBM_NET_GATEWAY_IP=''
ARCHZBM_NET_NETMASK=''
ARCHZBM_NET_HOSTNAME=''
ARCHZBM_NET_DEVICE=''
ARCHZBM_NET_AUTOCONF=''

By default none of these variables are set and no networking will be available in ZFSBootMenu. If you want networking (an IP address bound to a network interface) set at least one of these variables or one of the SSH variables listed further down; if at least one variable from this paragraph or from SSH is given we assume that you want networking. Setting one or more ARCHZBM_NET_* variables to an empty string is valid: unspecified values and values set to the empty string '' use defaults.

For networking we rely on the mkinitcpio-nfs-utils package with its net hook. Please refer to its initcpio-install-net script file for usage hints on above variables. The hook implements a subset of the ip Kernel Command Line argument.

Mapping between the net hook's field names and our shell variables is straightforward. Fields 8, 9 and 10 (DNS and NTP server addresses) from the official ip docs are unsupported in the net hook, which leaves a total of 7 fields for you to configure.

+-------------+------------------------+
|  net hook   |       This script      |
+-------------+------------------------+
| <client-ip> | ARCHZBM_NET_CLIENT_IP  |
| <server-ip> | ARCHZBM_NET_SERVER_IP  |
| <gw-ip>     | ARCHZBM_NET_GATEWAY_IP |
| <netmask>   | ARCHZBM_NET_NETMASK    |
| <hostname>  | ARCHZBM_NET_HOSTNAME   |
| <device>    | ARCHZBM_NET_DEVICE     |
| <autoconf>  | ARCHZBM_NET_AUTOCONF   |
+-------------+------------------------+

A valid example with a few fields populated may look like so:

ARCHZBM_NET_CLIENT_IP='10.10.10.2'
ARCHZBM_NET_GATEWAY_IP='10.10.10.1'
ARCHZBM_NET_NETMASK='255.255.255.0'
ARCHZBM_NET_DEVICE='eth0'
ARCHZBM_NET_AUTOCONF='none'

Note that in this example ARCHZBM_NET_SERVER_IP and ARCHZBM_NET_HOSTNAME are left unassigned.

It'll add the following ip= instruction to your Kernel Command Line:

ip=10.10.10.2::10.10.10.1:255.255.255.0::eth0:none

This is also valid and will configure eth0 via DHCP:

ARCHZBM_NET_DEVICE='eth0'
ARCHZBM_NET_AUTOCONF='dhcp'
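
Conceptually the seven variables are just joined with colons into the net hook's ip= argument. A sketch (the function name assemble_ip_cmdline is ours; setup.sh may assemble the string differently):

```shell
#!/usr/bin/env bash
# Illustrative sketch: join the seven ARCHZBM_NET_* values into the
# ip=<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
# format the net hook expects. Empty variables become empty fields.
assemble_ip_cmdline() {
  printf 'ip=%s:%s:%s:%s:%s:%s:%s\n' \
    "${ARCHZBM_NET_CLIENT_IP}" \
    "${ARCHZBM_NET_SERVER_IP}" \
    "${ARCHZBM_NET_GATEWAY_IP}" \
    "${ARCHZBM_NET_NETMASK}" \
    "${ARCHZBM_NET_HOSTNAME}" \
    "${ARCHZBM_NET_DEVICE}" \
    "${ARCHZBM_NET_AUTOCONF}"
}
```

With the static example above this yields ip=10.10.10.2::10.10.10.1:255.255.255.0::eth0:none, with the DHCP example ip=:::::eth0:dhcp.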

In ZFSBootMenu the device names that go into ARCHZBM_NET_DEVICE are raw, unchanged kernel device names such as eth0. If you're unsure which device name to use, check dmesg output in your Arch Linux live CD ISO image: during boot a kernel module typically first assigns the raw kernel device name, then systemd later renames the interface per Predictable Network Interface Names.

Take for example dmesg | grep on a physical PC with an MSI B550-A Pro mainboard from 2020, which comes with one onboard Realtek RTL8111H network adapter governed by the Realtek RTL-8169 Gigabit Ethernet driver from the r8169 kernel module. You will see:

# dmesg -T | grep eth
[time] r8169 0000:2a:00.0 eth0: RTL8168h/8111h, 04:7c:16:00:01:02, XID 541, IRQ 95
[time] r8169 0000:2a:00.0 eth0: jumbo features [frames: 9194 bytes, tx checksumming: ko]
[time] r8169 0000:2a:00.0 enp42s0: renamed from eth0

Notice how a Predictable Network Interface Name comes in on line 3. What you need here is the eth0 part.

SSH

If at least one of the ARCHZBM_NET_* or ARCHZBM_SSH_* variables is set, indicating that you want networking, we assume you want an SSH daemon as well. It comes in the form of a dropbear daemon with minimal configurability. Use the following variables to define Dropbear's behavior.

ARCHZBM_SSH_PORT='22'
ARCHZBM_SSH_KEEPALIVE_INTVL='1'
ARCHZBM_SSH_AUTH_KEYS=''

In ARCHZBM_SSH_PORT you specify Dropbear's listening port; this defaults to 22 if unconfigured or set to an empty string. With ARCHZBM_SSH_KEEPALIVE_INTVL you define the interval at which Dropbear sends keepalive messages to an SSH client through the SSH connection; this defaults to 1, as in one keepalive message per second. Per man 8 dropbear a value of 0 disables Dropbear sending keepalive messages. We suggest leaving this on and keeping the interval short, see SSH in ZFSBootMenu for how to work with this.

Dropbear in this setup only supports key-based authentication, no password-based authentication. The value from ARCHZBM_SSH_AUTH_KEYS will be converted to a list of public SSH keys allowed to SSH into Dropbear as its default root user while ZFSBootMenu is running. The format of ARCHZBM_SSH_AUTH_KEYS is a single line where authorized_keys entries are split with double-commas:

ssh-rsa Eahajei8,,ssh-ed25519 kaeD0mas ...

This syntax crutch allows you to use the full range of Dropbear-supported authorized_keys stanzas; see man 8 dropbear for what's available. Whether that's useful to you is another topic :) but by separating entries in ARCHZBM_SSH_AUTH_KEYS with double-commas the functionality is there.
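
A sketch of the double-comma split, assuming GNU sed as found on Arch (split_auth_keys is our illustrative name, not part of setup.sh):

```shell
#!/usr/bin/env bash
# Illustrative sketch: expand the double-comma separated
# ARCHZBM_SSH_AUTH_KEYS value into one authorized_keys line per entry.
# Single commas (e.g. inside an options stanza) survive untouched.
split_auth_keys() {
  printf '%s\n' "$1" | sed 's/,,/\n/g'
}
```

Feeding it the example above yields two lines, one authorized_keys entry per line.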

Command line setup help

An interactive questionnaire can guide you through settings and goes like this:

Command line setup questionnaire

To do the questionnaire yourself start this script with the setup argument:

export SCRIPT_URL='https://quico.space/quico-os-setup/arch-zbm/raw/branch/main/setup.sh' && curl -s "${SCRIPT_URL}" | bash -s -- setup

When done rerun it without that argument:

export SCRIPT_URL='https://quico.space/quico-os-setup/arch-zbm/raw/branch/main/setup.sh' && curl -s "${SCRIPT_URL}" | bash

Steps

The script takes the following installation steps.

  1. Install ZFS tools and kernel module with github.com/eoli3n/archiso-zfs
  2. Create one ZFS zpool on top of the zpool partition with encrypted and compressed datasets, password password
    1. See paragraph Passwords to predefine your own passwords in a settings file
    2. See paragraphs Compression/Encryption to optionally disable properties
  3. Create dataset for Arch Linux and /home
  4. Install Arch Linux into pool
  5. Add ZFSBootMenu to boot partition
  6. Configure boot method
    • Either an EFI image with EFI boot order entries on a UEFI machine
    • Or Syslinux with extlinux for a legacy BIOS computer
  7. If requested by the user, enable SSH in ZFSBootMenu
  8. Add pacman hooks to keep ZFSBootMenu images (and extlinux) updated
  9. Exit into Arch Linux live CD ISO image shell for you to reboot and frolic

SSH in ZFSBootMenu

Per SSH and Networking this script will optionally add a Dropbear SSH daemon to ZFSBootMenu. While the mechanism of SSH-ing into a server isn't particularly noteworthy we humbly suggest that in this particular use case you let your SSH client listen for keepalive messages from the server.

ssh -o ServerAliveInterval=3 -o ServerAliveCountMax=0 root@<addr> -p <port>

A typical workflow with Dropbear is for you to SSH into it, issue zfs or zfsbootmenu commands and allow the Arch Linux boot process to commence. As soon as you're done Dropbear terminates as ZFSBootMenu hands control off to your operating system's kernel. Without your client listening for keepalive messages it may not realize for quite some time that the connection is gone, until you harshly interrupt it.

The server defaults to sending keepalive messages to your client every second.

With -o ServerAliveInterval=3 you instruct your client to send an are-you-still-there message to the server if it ever stops getting keepalive messages for 3 seconds. The server defaults to sending 1 keepalive ping per second, so even on a somewhat lossy connection we can reasonably expect one message to get through within 3 seconds.

When it comes to the point that your SSH client sends an are-you-still-there message it expects a near-realtime response. With -o ServerAliveCountMax=0 it will accept zero failures from the server to comply.

This effectively configures your SSH client to remain connected even through somewhat lossy hops to the Dropbear daemon, and to cleanly disconnect 3 seconds and some change after you've executed whatever you needed to do in ZFSBootMenu.

Kernel selection

This script compiles ZFS via Arch Linux' Dynamic Kernel Module Support (DKMS). Not all kernels allow for successful compilation; in some instances a particularly recent kernel version may change APIs to such a degree that ZFS compilation simply fails.

We strongly suggest that you verify your kernel version is known to compile with the current zfs-dkms package and, if it isn't, pin a known-good version via Kernel downgrade.

An example: linux-6.6.1.arch1-1-x86_64 came out on Wednesday, November 8, 2023, at a time when the newest zfs-dkms package version was 2.2.0, which did not compile against linux 6.6.x.

You'd then set for example:

export ARCHZBM_KERNEL_VER=6.5.9.arch2

Where any 6.5.x version is known to work well with zfs-dkms. See also Kernel downgrade for details on how to configure this.

Flavor choices

We make the following opinionated flavor choices. Feel free to change them to your liking.

  • Arch Linux locale is set to en_US.UTF-8
  • Keymap is de-latin1
    • Consult /etc/vconsole.conf
    • Change zfs set org.zfsbootmenu:commandline=...
  • No X.Org Server, Wayland compositors or other GUI elements get installed
  • Timezone is Etc/UTC
    • Check timedatectl set-timezone <tzdata-zone>

Post-run manual steps

After installation you're going to want to at least touch these points in your new Arch Linux install:

  • Package manager hook: pacman does not have a hook to do ZFS snapshots
  • Hostname: Installation chose a pseudo-randomly generated 8-character string with pwgen
    • Check hostnamectl set-hostname <hostname>
  • Unprivileged user accounts: The OS was installed with root and unprivileged build users
  • Unless you had a settings file or exported shell env vars per Passwords you're going to want to change passwords now:
    • ZFS: The password for all datasets underneath zpool is password.
    • Local root account: The local root account's password is password.
  • Arch User Repository (AUR) helper: We installed paru as our AUR helper, we installed from AUR as paru-bin.
  • In /etc/systemd/network/50-wired.network you can swap the DHCP-based network config for a static one. The DHCP-based one for reference looks like:
    ...
    
    [Network]
    DHCP=ipv4
    IPForward=yes
    Domains=~.
    
    [DHCP]
    UseDNS=yes
    RouteMetric=10
    
    A static config does away with the [DHCP] section:
    ...
    
    [Network]
    Address=10.10.10.2/24
    Gateway=10.10.10.1
    DNS=10.10.10.1
    IPForward=yes
    Domains=~.
    
  • In case you later want a graphical interface and specifically NetworkManager (via package networkmanager) consider telling it to keep its hands off of some of your network interfaces. The bullet point above adds a systemd-style config file that systemd-networkd.service will read and use. Should you ever install NetworkManager it will by default assume that it must manage all interfaces: it'll use its own DHCP client to try and get IP addresses for managed interfaces, and you'll end up with whatever addressing scheme you configured in a .network unit file plus NetworkManager's additional address. Create /etc/NetworkManager/conf.d/99-unmanaged-devices.conf for example to declare some interfaces as off-limits or unmanaged:
    [keyfile]
    unmanaged-devices=mac:52:54:00:74:79:56;type:ethernet
    
    Check out ArchWiki article "NetworkManager" section "Ignore specific devices" for more info.

Password change

After installation you're going to want to change your ZFS encryption password (unless you preconfigured a good zpool password in a settings file per Passwords). At any rate you still want to be familiar with the process and its caveat in case you ever need a zpool password change or want to do one now.

Steps

In a running OS:

  1. Change password in keylocation file, e.g. /etc/zfs/zpool.key or whatever other "${zpool_name}"'.key' file you used during setup
  2. Set this key as the new encryption key:
    zfs change-key -l zpool
    
    Quoting man 8 zfs-change-key from zfs-utils version 2.1.9 for the -l argument: "Ensures the key is loaded before attempting to change the key." When successful the command will not output data, it'll just silently change your encryption key.
  3. Rebuild initramfs:
    mkinitcpio -P
    
    Here for example with -P (--allpresets) which processes all presets contained in /etc/mkinitcpio.d. This step puts the changed key file into your initramfs. During setup we've adjusted /etc/mkinitcpio.conf so that it contains FILES=(/etc/zfs/zpool.key) which causes the file to be added to initramfs as-is.

Boot flow

With your password changed in two locations (key file and initramfs), the boot process works as follows.

At boot time ZFSBootMenu will scan all pools that it can import for a bootfs property. If it only finds one pool with that property the dataset given as bootfs will be selected for boot with a 10-second countdown allowing manual interaction. With bootfs set ZFSBootMenu will not actively search through datasets for valid kernel and initramfs combinations, it'll instead accept bootfs as the default boot entry without us entering the pool decryption passphrase.

Upon loading into a given dataset ZFSBootMenu will attempt to auto-load the matching decryption key. In our setup this will fail because we purposely stored the encryption key inside our zpool/root/archlinux dataset. ZFSBootMenu will prompt us to type in the decryption key.

Lastly ZFSBootMenu loads our OS' kernel and initramfs combination via kexec. For this step we don't need to enter the decryption key again. Our initramfs file contains the plain-text /etc/zfs/zpool.key file which allows it to seamlessly import the right dataset, load its key and mount it.

Caveats in a password change

ZFS differentiates between user keys (also called wrapping keys) and the master key for any given encryption root. You never interact with the master key; you only pick your personal user key. Consequently a user key change (which in our use case we simply perceive as a password change) has zero effect on data that's already encrypted. The operation is instant and merely re-encrypts the existing master key, producing a new so-called wrapped master key.

ZFS generates the master key exactly once when you enable encryption on a dataset - technically when it becomes an encryption root. Among other inputs it uses your user key to encrypt (to wrap) the master key. When you change your user key it just means that the master key stays exactly the same and only the encrypted (wrapped) key changes.

man 8 zfs-change-key from zfs-utils version 2.1.9 adds:

If the user's key is compromised, zfs change-key does not necessarily protect existing or newly-written data from attack. Newly-written data will continue to be encrypted with the same master key as the existing data. The master key is compromised if an attacker obtains a user key and the corresponding wrapped master key. Currently, zfs change-key does not overwrite the previous wrapped master key on disk, so it is accessible via forensic analysis for an indeterminate length of time.

In the event of a master key compromise, ideally the drives should be securely erased to remove all the old data (which is readable using the compromised master key), a new pool created, and the data copied back. This can be approximated in place by creating new datasets, copying the data (e.g. using zfs send | zfs recv), and then clearing the free space with zpool trim --secure if supported by your hardware, otherwise zpool initialize.

On one hand changing the ZFS encryption password is generally a good and useful thing to do. On the other hand changing your password does not currently overwrite previous wrapped master keys on disk. A sufficiently motivated party that gains access to a wrapped master key and the matching user key is able to decrypt the master key and use it to read all data encrypted with it.

By extension this means after a password change your data remains at risk until you've copied it to a new dataset and erased previously used space thereby erasing any previous wrapped master keys.

Changing master key

In order to generate a new master key after you've changed your user key as mentioned in man 8 zfs-change-key from zfs-utils version 2.1.9 one example workflow goes like this:

  1. Change user key
    • Update /etc/zfs/zpool.key
    • Update zpool with new key via zfs change-key -l zpool
    • Generate new initramfs with mkinitcpio -P
  2. Create a snapshot from current system dataset
    # Assuming current system dataset is zpool/root/archlinux-sxu
    # where '-sxu' is a random suffix to differentiate datasets
    # and has no real meaning
    zfs snapshot zpool/root/archlinux-sxu@rekey
    
  3. Within same pool send/receive snapshot
    zfs send \
        --large-block \
        --compressed \
        'zpool/root/archlinux-sxu@rekey' | \
    \
    zfs receive \
        -Fvu \
        -o 'encryption=on' \
        -o 'keyformat=passphrase' \
        -o 'keylocation=file:///etc/zfs/zpool.key' \
        -o 'mountpoint=/' \
        -o 'canmount=noauto' \
        -o 'org.zfsbootmenu:commandline=rw nowatchdog rd.vconsole.keymap=de-latin1' \
        'zpool/root/archlinux-frn'
    
    Explanation:
    • We specifically don't zfs send -R (--replicate). While it would normally be nice to transfer all of a dataset's children at once such as all of its snapshots the -R argument conflicts with the encryption property. See comment by Tom Caputi on GitHub openzfs/zfs issue 10507 from June 2020 for details. Basically if encryption is set then -R doesn't work. We could transfer existing encryption properties with -w/--raw but we don't actually want to transfer encryption properties at all. We want them to change during transfer, see the bullet point four points down from here talking about encryption.
    • We zfs receive -F destroying any target snapshots and file systems beyond the snapshot we're transferring. In this example the target zpool/root/archlinux-frn doesn't even exist so -F isn't necessary to clean anything up. It's just good practice.
    • With -v we get verbose progress output
    • Argument -u makes sure the dataset does not get mounted after transfer. ZFS would mount it into / which wouldn't be helpful since we're currently using that filesystem ourselves.
    • We set encryption properties keyformat, keylocation and most importantly encryption. The latter will turn our transferred dataset into its own encryptionroot which in turn generates a new master key. The auto-generated new master key gets wrapped with our updated passphrase in keylocation. This basically reencrypts all data in this dataset during transfer.
    • We set mountpoint and canmount as well as an org.zfsbootmenu:commandline as we would for any new system dataset.
  4. Change zpool's bootfs property to new system dataset
    zpool set bootfs=zpool/root/archlinux-frn zpool
    
  5. Boot into new system dataset
  6. After reboot and now that you're in the new system dataset change its encryptionroot by letting it inherit data from its parent:
    zfs change-key -i -l zpool/root/archlinux-frn
    
    The parent zpool/root is inheriting this property from zpool which will make sure that zpool/root/archlinux-frn essentially gets its key now from zpool. Both zpool/root/archlinux-frn and zpool use the same exact keylocation with identical content. This operation is instant.

Finishing touches

Confirm master key change

Just to confirm that the master key has changed run this command. It takes a moment to output data:

zfs send --raw zpool/root/archlinux-frn@rekey | zstream dump | sed -n -e '/crypt_keydata/,/end crypt/p; /END/q'

Repeat for source dataset zpool/root/archlinux-sxu@rekey. You're particularly interested in parameters DSL_CRYPTO_MASTER_KEY_1 and the initialization vector DSL_CRYPTO_IV. Notice that they differ between old and new dataset confirming that your new dataset has a new master key.

Clean-up

Clean up:

  1. In newly keyed/reencrypted system dataset destroy its snapshot
    zfs destroy zpool/root/archlinux-frn@rekey
    
  2. Recursively destroy source dataset
    zfs destroy -r zpool/root/archlinux-sxu
    

Unmap/TRIM

Next up unmap/TRIM unallocated disk areas. If your zpool runs on an entire disk and not just on a partition, and if your disk supports TRIM you're going to want to do:

zpool trim --secure zpool

The next best alternative is to instead do:

zpool initialize zpool

View status with either one of:

# With TRIM status
zpool status -t zpool

# Without TRIM status
zpool status zpool

ZFS setup explained

Overview

The ZFS pool and dataset setup that makes this tick, explained in plain English.

  1. Create zpool with options:
    1. -R /mnt (aka -o cachefile=none -o altroot=/mnt). The pool is never cached, i.e. it's considered temporary. All pool and dataset mount paths have /mnt prepended. From man zpoolprops:

      This can be used when examining an unknown pool where the mount points cannot be trusted, or in an alternate boot environment, where the typical paths are not valid. altroot is not a persistent property. It is valid only while the system is up.

    2. -O canmount=off: Note the capital -O which makes this a file system property, not a pool property. The file system cannot be mounted and is ignored by zfs mount -a. This property is not inherited.
    3. -O mountpoint=none: What it says on the tin, the pool has no mountpoint configured.
    4. -O encryption=on: Makes this our encryptionroot and passes the encryption setting to all child datasets. Selecting encryption=on when creating a dataset indicates that the default encryption suite will be selected, which is currently aes-256-gcm.
    5. -O keylocation=file://...: This property is only set for encrypted datasets which are encryption roots. Controls where the user's encryption key will be loaded from by default for commands such as zfs load-key.
    6. -O keyformat=passphrase: Controls what format the user's encryption key will be provided as. Passphrases must be between 8 and 512 bytes long.
  2. At this time the newly created zpool is not mounted anywhere. Next we create the "root" dataset; that's an arbitrary term for the parent dataset of all boot environments. Boot environments in your case may, for example, be different operating systems, all of which live on separate datasets underneath the root.
    1. -o canmount=off: Same as above, the root dataset can - just like the pool - not be mounted.
    2. -o mountpoint=none: Same as above, the root dataset has - just like the pool - no mountpoint configured.
    3. zfs set org.zfsbootmenu:commandline=...: Set a common kernel command line for all boot environments such as "ro quiet".
  3. Neither the root dataset nor the pool are mounted at this time. We now create one boot environment dataset where we want to install Arch Linux.
    1. -o mountpoint=/: Our Arch Linux dataset will be mounted at /.
    2. -o canmount=noauto: When set to noauto, a dataset can only be mounted and unmounted explicitly. The dataset is not mounted automatically when the dataset is created or imported, nor is it mounted by the zfs mount -a command or unmounted by the zfs unmount -a command.
    3. We then zpool set bootfs="zpool/root/archlinux" zpool: ZFSBootMenu uses the bootfs property to identify suitable boot environments. If only one pool has it - as is the case here - it identifies the pool's preferred boot dataset that will be booted with a 10-second countdown allowing manual interaction in ZFSBootMenu.
    4. We explicitly mount the boot environment. Since the entire pool is still subject to our initial -R /mnt during creation a zfs mount zpool/root/archlinux will mount the Arch Linux dataset not into / but instead into /mnt.
  4. We also create a data dataset that - at least for now - we use to store only our /home data.
    1. For zpool/data:
      1. -o mountpoint=/: We use the mountpoint property here only for inheritance.
      2. -o canmount=off: The zpool/data dataset itself cannot actually be mounted.
    2. For a zpool/data/home child dataset:
      1. We do not specify any properties. Since canmount cannot be inherited the parent's canmount=off does not apply, it instead defaults to canmount=on. The parent's mountpoint=/ property on the other hand is inherited so for a home child dataset it conveniently equals mountpoint=/home.
      2. In effect this zpool/data/home dataset is subject to zfs mount -a and will happily automount into /home.
  5. We export the zpool once, then reimport it by scanning only inside /dev/disk/by-partuuid, again setting -R /mnt as we did during pool creation a moment ago, and we do not mount any file systems.
  6. We zfs load-key <encryptionroot> which will load the key from keylocation after which the keystatus property for <encryptionroot> and all child datasets will change from unavailable to available.
  7. We mount our Arch Linux boot environment dataset. Its mountpoint automatically gets prefixed with /mnt since that's the altroot (-R /mnt) we used when importing the pool.
  8. We zfs mount -a which automounts zpool/data/home into /home, which again gets /mnt prepended.
  9. We lastly mount our EFI partition into /mnt/efi.
  10. We instruct ZFS to save its pool configuration via zpool set cachefile=/etc/zfs/zpool.cache zpool.

The complete ZFS structure now exists and is mounted at /mnt ready for any pacstrap, debootstrap, dnf --installroot or other bootstrapping action.
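Condensed into one command sequence, the steps above can be sketched like so. This is an illustration rather than the script itself; the partition UUID is a placeholder, the ESP is assumed at /dev/sda1 as in the examples below, and the key file at /etc/zfs/zpool.key is assumed to already exist:

```shell
# 1. Create the pool: temporary altroot under /mnt, encrypted, no mountpoint
zpool create -R /mnt \
    -O canmount=off -O mountpoint=none \
    -O encryption=on \
    -O keylocation=file:///etc/zfs/zpool.key \
    -O keyformat=passphrase \
    zpool /dev/disk/by-partuuid/<your-partuuid>

# 2. Root dataset: parent of all boot environments, never mounted itself
zfs create -o canmount=off -o mountpoint=none zpool/root
zfs set org.zfsbootmenu:commandline='ro quiet' zpool/root

# 3. One boot environment, marked as the preferred one via bootfs
zfs create -o mountpoint=/ -o canmount=noauto zpool/root/archlinux
zpool set bootfs=zpool/root/archlinux zpool
zfs mount zpool/root/archlinux    # lands in /mnt thanks to -R /mnt

# 4. Boot environment-independent data datasets
zfs create -o mountpoint=/ -o canmount=off zpool/data
zfs create zpool/data/home        # inherits mountpoint, automounts to /mnt/home

# 5.-8. Export, reimport by-partuuid, load key, mount everything
zpool export zpool
zpool import zpool -d /dev/disk/by-partuuid -R /mnt -N
zfs load-key zpool
zfs mount zpool/root/archlinux
zfs mount -a

# 9.-10. Mount the ESP and persist the pool configuration
mount /dev/sda1 /mnt/efi
zpool set cachefile=/etc/zfs/zpool.cache zpool
```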

Adding another boot environment-independent dataset

Assume that in addition to your /home data which lives on zpool/data/home you want another dataset that is exempt from Arch Linux snapshots.

Consider an example /opt/git directory where a bunch of Git repos are checked out on which you work. You don't want them to be snapshotted - and rolled back - when something goes sideways: they are decoupled from everything else that goes on on your machine so you can easily and safely have a static /opt/git directory available in all boot environments.

Move your current /opt/git data out of the way for a moment:

mv '/opt/git'{,'.bak'}
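The brace expansion in that mv command is plain Bash: '/opt/git'{,'.bak'} expands to the two arguments /opt/git and /opt/git.bak. A self-contained demonstration of the same pattern on a throwaway directory:

```shell
# Throwaway demonstration of the brace expansion used above
tmp="$(mktemp -d)"
mkdir "$tmp/git"
touch "$tmp/git/repo1"

# 'X'{,'.bak'} expands to the two arguments X and X.bak
mv "$tmp/git"{,'.bak'}

[ -d "$tmp/git.bak" ] && [ ! -e "$tmp/git" ] && echo 'renamed'   # prints: renamed
rm -rf "$tmp"
```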

Create datasets

zfs create -o canmount=off zpool/data/opt
zfs create zpool/data/opt/git

Remember that the zpool/data dataset already exists and that it has both mountpoint=/ and canmount=off set. It is not and cannot be mounted itself; instead it conveniently anchors datasets at /. Since the canmount dataset property cannot be inherited and defaults to canmount=on, we have to manually specify -o canmount=off: our new zpool/data/opt should not automatically mount into /opt.

We then create the child dataset zpool/data/opt/git; it defaults to canmount=on and thus immediately shows up at /opt/git.

Move data back into place and clean up temp directory

rsync -av --remove-source-files '/opt/git'{'.bak',}'/'
find '/opt/git.bak' -type d -empty -delete
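The second command only deletes directories that are now empty, so anything rsync failed to move survives. The pattern can be exercised on a throwaway tree (a hypothetical temp path, no rsync required):

```shell
# Simulate a partially emptied source tree
tmp="$(mktemp -d)"
mkdir -p "$tmp/git.bak/repo1" "$tmp/git.bak/repo2"
touch "$tmp/git.bak/repo2/leftover"   # pretend this file didn't get moved

# Same cleanup as above: delete only directories that are empty
find "$tmp/git.bak" -type d -empty -delete

# repo1 (empty) is gone; repo2 survives because it still holds a file
[ ! -e "$tmp/git.bak/repo1" ] && [ -f "$tmp/git.bak/repo2/leftover" ] && echo 'safe'
rm -rf "$tmp"
```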

An example zpool/data dataset may now look like so:

# zfs list -r -oname,mountpoint,canmount,mounted zpool/data
NAME                MOUNTPOINT       CANMOUNT  MOUNTED
zpool/data          /                off       no
zpool/data/home     /home            on        yes
zpool/data/opt      /opt             off       no
zpool/data/opt/git  /opt/git         on        yes

Nested environment-independent datasets

Caution

If you want a dedicated dataset for a directory that lives deeper in your file system tree than just /opt/git, for example /var/lib/docker, make sure not to create this structure recursively in a single zfs create command.

In Adding another boot environment-independent dataset above you can safely do:

zfs create -o canmount=off zpool/data/opt

Here zpool/data already exists, you're only creating one child dataset opt and you're setting -o canmount=off so that it never mounts into your /opt directory.

Now consider the same setup for /var/lib/docker. If you follow the exact same approach:

zfs create -o canmount=off zpool/data/var/lib

ZFS will correctly report:

cannot create 'zpool/data/var/lib': parent does not exist

You might then be tempted to just create the parents with the -p argument:

zfs create -p -o canmount=off zpool/data/var/lib
           ~~

Note, however, that -o canmount=off only applies to the lib dataset and that zpool/data/var has just been auto-mounted at /var:

# zfs list -r -oname,mountpoint,canmount,mounted zpool/data
NAME                MOUNTPOINT  CANMOUNT  MOUNTED
zpool/data          /           off       no
zpool/data/home     /home       on        yes
zpool/data/opt      /opt        off       no
zpool/data/opt/git  /opt/git    on        yes
zpool/data/var      /var        on        yes  <---
zpool/data/var/lib  /var/lib    off       no

Advice

Instead, create nested parents in multiple steps, setting -o canmount=off on each one:

zfs create -o canmount=off zpool/data/var
zfs create -o canmount=off zpool/data/var/lib

Lastly create the dataset you want mounted:

zfs create zpool/data/var/lib/docker
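If you do this often, the pattern is mechanical enough to script. Below is a hedged sketch of a hypothetical helper (not part of arch-zbm) that walks down a dataset path, creates every missing parent with canmount=off, and only makes the final dataset mountable. The ZFS variable exists purely so you can dry-run it with ZFS=echo:

```shell
#!/usr/bin/env bash
# Hypothetical helper: create a nested boot environment-independent dataset.
# Every intermediate parent gets canmount=off; only the last dataset mounts.
# Usage: create_nested <existing-parent-dataset> <relative/path>
# Prefix the call with ZFS=echo for a dry run that only prints the commands.

create_nested() {
    local zfs="${ZFS:-zfs}" base="$1" rel="$2" i
    local -a parts
    IFS='/' read -ra parts <<< "$rel"
    for (( i = 0; i < ${#parts[@]}; i++ )); do
        base="$base/${parts[i]}"
        if (( i == ${#parts[@]} - 1 )); then
            "$zfs" create "$base"                  # final dataset, mountable
        else
            "$zfs" create -o canmount=off "$base"  # parent, never mounted
        fi
    done
}

# Dry run reproducing the /var/lib/docker example from above; prints:
#   create -o canmount=off zpool/data/var
#   create -o canmount=off zpool/data/var/lib
#   create zpool/data/var/lib/docker
ZFS=echo create_nested zpool/data var/lib/docker
```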

Mounting zpool for maintenance

In case you want to mount your zpool on an external operating system such as an Arch Linux live CD ISO image, do it like so:

zpool import zpool -d /dev/disk/by-partuuid -R /mnt -f -N
zfs load-key -L prompt zpool
zfs mount zpool/root/archlinux
zfs mount -a

# UEFI system ...
mount /dev/sda1 /mnt/efi

# ... or legacy BIOS system
mount /dev/sda1 /mnt/boot/syslinux

arch-chroot /mnt /bin/bash

When done exit chroot and cleanly remove your work:

# UEFI system ...
umount /mnt/efi

# ... or legacy BIOS system
umount /mnt/boot/syslinux

zfs umount -a
zpool export zpool

Explanation:

  • We always want to import pools by partition UUID for consistency, so we specifically look for pool members only in /dev/disk/by-partuuid.

  • We mount our zpool with -R /mnt (aka -o cachefile=none -o altroot=/mnt). The pool is never cached, i.e. it's considered temporary. All pool and dataset mount paths have /mnt prepended. From man zpoolprops:

    This can be used when examining an unknown pool where the mount points cannot be trusted, or in an alternate boot environment, where the typical paths are not valid. altroot is not a persistent property. It is valid only while the system is up.

  • With -f and -N we force-mount our pool (-f) even if it previously wasn't cleanly exported; and we do not auto-mount any of its datasets (-N), not even the ones that have canmount=on set.

    # zfs list -oname,mountpoint,canmount,mounted
    NAME                  MOUNTPOINT  CANMOUNT  MOUNTED
    zpool                 none        off       no
    zpool/data            /mnt        off       no
    zpool/data/home       /mnt/home   on        no <-- Not immediately mounted
    zpool/root            none        off       no
    zpool/root/archlinux  /mnt        noauto    no <-- Not immediately mounted
    
  • We load the decryption key by temporarily overriding the keylocation property with -L prompt. The dataset's default value is file:///etc/zfs/zpool.key, which in all likelihood doesn't exist in this live environment.

  • We mount our desired boot environment with zfs mount zpool/root/archlinux

    # zfs list -oname,mountpoint,canmount,mounted
    NAME                  MOUNTPOINT  CANMOUNT  MOUNTED
    zpool                 none        off       no
    zpool/data            /mnt        off       no
    zpool/data/home       /mnt/home   on        no
    zpool/root            none        off       no
    zpool/root/archlinux  /mnt        noauto    yes <-- Only boot env now mounted
    
  • We mount all child datasets with zfs mount -a making /mnt/home available as well as any others you may have created yourself.

    # zfs list -oname,mountpoint,canmount,mounted
    NAME                  MOUNTPOINT  CANMOUNT  MOUNTED
    zpool                 none        off       no
    zpool/data            /mnt        off       no
    zpool/data/home       /mnt/home   on        yes <-- Now mounted
    zpool/root            none        off       no
    zpool/root/archlinux  /mnt        noauto    yes <-- Now mounted
    
  • We lastly mount our EFI System Partition (ESP); in this example it lives at /dev/sda1, so adjust the path accordingly.

    # df -hTP
    Filesystem           Type      Size  Used Avail Use% Mounted on
    ...                  ...       ...    ...   ...  ... ...
    zpool/root/archlinux zfs       8.6G  2.5G  6.2G  29% /mnt
    zpool/data/home      zfs       6.3G  161M  6.2G   3% /mnt/home
    /dev/sda1            vfat      511M   31M  481M   6% /mnt/efi
    
  • We're ready to arch-chroot into our boot environment.
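The whole maintenance session above can be wrapped in a small script so the cleanup always runs, even if you bail out of the chroot early. A sketch assuming the UEFI layout and the example /dev/sda1 ESP from above:

```shell
#!/usr/bin/env bash
# Sketch: import pool, mount datasets, chroot, then always clean up on exit.
set -euo pipefail

cleanup() {
    umount /mnt/efi 2>/dev/null || true
    zfs umount -a               || true
    zpool export zpool
}

zpool import zpool -d /dev/disk/by-partuuid -R /mnt -f -N
trap cleanup EXIT

zfs load-key -L prompt zpool
zfs mount zpool/root/archlinux
zfs mount -a
mount /dev/sda1 /mnt/efi

arch-chroot /mnt /bin/bash
# cleanup runs via the EXIT trap once the chroot shell exits
```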

Development

Conventional commits

This project uses Conventional Commits for its commit messages.

Commit types

Commit types besides fix and feat are:

  • build: Project structure, directory layout, build instructions for roll-out
  • refactor: Keeping functionality while streamlining or otherwise improving function flow
  • test: Working on test coverage
  • docs: Documentation for project or components

Commit scopes

The following scopes are known for this project. A Conventional Commits commit message may optionally use one of the following scopes or none:

  • iso: Changing Arch Linux live CD ISO image
  • zbm: Adjusting ZFSBootMenu's behavior
  • zfs: A change to how ZFS interacts with the system, either a pool or a dataset
  • os: Getting an operating system set up to correctly work in a ZFS boot environment
  • meta: Affects the project's repo layout, readme content, file names etc.

Credits

Most of what's here was shamelessly copied and slightly adapted for personal use from Jonathan Kirszling at GitHub.

Thanks to:
