zfs-pacman-hook
Arch Linux pacman hook for automatic ZFS snapshots
Setup
Get started like so:
- Install dependency jq
- Clone repo into arbitrary path <repo>
- Make pacman-zfs-snapshot.sh executable:
    chmod +x <repo>/pacman-zfs-snapshot.sh
- Symlink to files, for example:
    sudo ln -s <repo>/pacman-zfs-snapshot.sh /usr/local/bin/pacman-zfs-snapshot
    sudo ln -s <repo>/pacman-zfs-snapshot-install.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-install.hook
    sudo ln -s <repo>/pacman-zfs-snapshot-remove.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-remove.hook
    sudo ln -s <repo>/pacman-zfs-snapshot-upgrade.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-upgrade.hook
    sudo ln -s <repo>/pacman-zfs-snapshot.conf /etc/pacman-zfs-snapshot.conf
  Note that while you may choose arbitrary locations for symlinks, the 00-pacman-zfs-snapshot-*.hook files reference /usr/local/bin/pacman-zfs-snapshot. Change that accordingly if you need to.
- For datasets you want auto-snapshotted add property space.quico:auto-snapshot=true:
    zfs set space.quico:auto-snapshot=true zpool/root/archlinux
  With any other property and any other value datasets will not be auto-snapshotted.
- Adjust pacman-zfs-snapshot.conf to your liking. You may want to set do_dry_run='true' for a start and just reinstall a benign package to get a feel for what this hook would do.
What's it do?
In pacman, on every PreTransaction, meaning right before any actual operation on a package begins, we trigger a ZFS snapshot. This happens via a so-called hook, which is a plain text config file. Hook files make use of the Arch Linux Package Management (ALPM) library, also known as libalpm, for which pacman is a frontend. By default hooks are stored in /usr/share/libalpm/hooks. Additionally /etc/pacman.conf has a directory configured as:
#HookDir = /etc/pacman.d/hooks/
Hook files from both directories are collectively parsed and executed in lexicographical order. Hook names from this repo begin with 00-* so on a default Arch Linux they are the first to be executed during pacman transactions.
For ZFS snapshots intended to save your bacon the 00-* naming convention is particularly critical. In /usr/share/libalpm/hooks you can see for example that when a kernel upgrade happens 60-mkinitcpio-remove.hook is executed, deleting your existing vmlinuz-* kernel image, for example at /boot/vmlinuz-linux. After that, if you're using the zfs-dkms package, 71-dkms-remove.hook removes your ZFS kernel module files. That hook is installed by dkms, which zfs-dkms requires. Both the 60-* hook and, for zfs-dkms users, the 71-* hook run early due to their naming. If we don't create a snapshot before these hooks run we end up creating a snapshot without a kernel image and without ZFS kernel module files. Our 00-* hook files are executed before both, ensuring that a snapshot can safely return you to a working system.
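To see why the 00- prefix matters, the ordering can be reproduced with a plain sort. This is only a sketch: hook_order is a hypothetical helper that is not part of this repo, and it assumes that a byte-wise sort of file names is a fair stand-in for the lexicographical order in which hooks run.

```shell
# Hypothetical helper: print hook file names in byte-wise sorted order.
hook_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

hook_order 71-dkms-remove.hook 00-pacman-zfs-snapshot-upgrade.hook 60-mkinitcpio-remove.hook
# 00-pacman-zfs-snapshot-upgrade.hook
# 60-mkinitcpio-remove.hook
# 71-dkms-remove.hook
```

The snapshot hook sorts ahead of both removal hooks, so the snapshot still contains the kernel image and the ZFS module files.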
Snapshot selection
We snapshot datasets that have the space.quico:auto-snapshot property set to true. By default we further limit datasets to only those that are currently mounted in your active operating system. We identify these by asking findmnt for a list of mounted file systems of fstype=="zfs" which for example returns:
# findmnt --json --list --output 'fstype,source,target' | \
    jq --raw-output '.[][] | select(.fstype=="zfs") | .source'
zpool/root/archlinux
If no dataset (or no local dataset) has the property set correctly no snapshots are taken. The script will print an info-level message about that on pacman transactions.
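The two conditions (property set to true, dataset currently mounted) can be sketched as a small filter. This is not the script's actual implementation: filter_auto_snapshot is a hypothetical helper, and it assumes its input is tab-separated name and value pairs such as those printed by zfs get -H -o name,value space.quico:auto-snapshot, with the mounted dataset names passed as arguments.

```shell
# Hypothetical helper, not the script's actual implementation.
# stdin: tab-separated "<dataset><TAB><property value>" lines
# args:  names of currently mounted ZFS datasets
filter_auto_snapshot() {
    local tab dataset value mounted
    tab="$(printf '\t')"
    while IFS="$tab" read -r dataset value; do
        # Only the exact value "true" opts a dataset in.
        [ "$value" = 'true' ] || continue
        for mounted in "$@"; do
            if [ "$dataset" = "$mounted" ]; then
                printf '%s\n' "$dataset"
                break
            fi
        done
    done
}

printf 'zpool/root/archlinux\ttrue\nzpool/data\ttrue\nzpool/home\t-\n' |
    filter_auto_snapshot zpool/root/archlinux zpool/home
# zpool/root/archlinux
```

Here zpool/data is skipped because it is not mounted and zpool/home is skipped because its property value is not true.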
Snapshot chains
We retain two different snapshot chains, one for pacman transactions that only affect what we are calling trivial packages and a separate chain for important packages. By default only package names that exactly match the regular expression ^(linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils))$ are considered important, so in plain English any one of:
linux
linux-headers
linux-zen
linux-zen-headers
systemd
zfs-linux
zfs-linux-zen
zfs-dkms
zfs-utils
Whenever an important package is affected by a transaction a snapshot goes into the corresponding chain. In all other cases - when an important package is not affected - snapshots go into the trivial chain.
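The classification can be tried out with grep. The pattern below is the default quoted above. The variable name important_names and the function severity are made up for this sketch and may not match what the script uses internally.

```shell
# Default pattern from the text above; names in this sketch are hypothetical.
important_names='^(linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils))$'

# Print "imp" if any given package name matches, otherwise "trv".
severity() {
    if printf '%s\n' "$@" | grep -Eq "$important_names"; then
        printf 'imp\n'
    else
        printf 'trv\n'
    fi
}

severity systemd bind enchant grep   # imp
severity jdk17-temurin               # trv
severity linux-firmware              # trv
```

Because the pattern is anchored at both ends, linux-firmware does not count as important even though it starts with linux.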
The trivial snapshot chain by default keeps 25 snapshots, the important chain keeps 10. The thought process here is that you will likely not futz around with a kernel every day whereas you may very well install arbitrary packages multiple times a day. Snapshots should keep you safe for a few weeks up to maybe a full month on an average daily driver system, hence the defaults of 10 and 25 snapshots, respectively.
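Retention boils down to keeping the newest N names of a chain. A minimal sketch, assuming the snapshot names of one chain arrive on stdin oldest first (for instance from zfs list -H -t snapshot -o name -s creation, filtered to one severity). The helper name snapshots_to_prune is hypothetical, and how the script itself selects and destroys snapshots is not shown here.

```shell
# Hypothetical helper: print every snapshot name beyond the newest "keep" ones.
# $1:    number of snapshots to keep
# stdin: snapshot names of a single chain, oldest first
snapshots_to_prune() {
    awk -v keep="$1" '
        { names[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print names[i] }
    '
}

printf 'snap-a\nsnap-b\nsnap-c\n' | snapshots_to_prune 2
# snap-a
```

With fewer snapshots than the keep count nothing is printed, so nothing would be pruned.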
Dataset naming and uniqueness
Snapshots may look like so:
$ zfs list -o name -t all
NAME                  snap_date_format='%F-%H%M'           ┌─── Important because systemd
zpool                              |                       |    is on our list of
zpool/root                         ▼        ┌ Counter      |    important packages
zpool/root/archlinux        ┌─────────────┐ ▼             ▼▼▼
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:imp_pkgs:systemd:bind:enchant:grep
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv_pkgs:jdk17-temurin
zpool/root/archlinux@pacman_2023-03-07-0114_1_op:inst_sev:trv_pkgs:docker-credential-secretser...
zpool/root/archlinux@pacman_2023-03-07-0115_1_op:upgr_sev:trv_pkgs:proton-ge-custom-bin
                                                 ▲▲▲▲     ▲▲▲      └────────────────────────────┘
                                                  |       |          Max. 30 characters per our
 Pacman operation that triggered this snapshot ───┘       |          pacman-zfs-snapshot.conf
                                                          |          setting 'pkgs_list_max_length'
 Severity based on affected packages, here trivial ───────┘
Notice how in this case the counter is 1 for all four snapshots. The counter is the distinguishing factor for snapshots that are otherwise identical. It is incremented as needed to avoid naming collisions. In day-to-day operations you will typically see it at 1 as there rarely is a need to avoid collisions unless you purposely limit the timestamp length and/or package list length to the point that successive snapshots may appear identical. See Avoiding naming collisions for more details.
Notice also how the package list of the third snapshot is cut off as docker-credential-secretser... with a trailing ellipsis. This snapshot was triggered on installation of the Arch User Repository package docker-credential-secretservice-bin whose package name is 35 characters long. In this example our pkgs_list_max_length setting limits the package list string to a maximum of 30 characters. If we can't naturally fit package names into this limit by removing packages from the list we instead cut off part of the package name and add an ellipsis (three dots). The default setting is pkgs_list_max_length='30'. In case the user wants three characters or fewer, thus making an ellipsis impractical, we simply trim the package name to that many characters:
pkgs_list_max_length='7': dock...
pkgs_list_max_length='6': doc...
pkgs_list_max_length='5': do...
pkgs_list_max_length='4': d...
pkgs_list_max_length='3': doc
pkgs_list_max_length='2': do
pkgs_list_max_length='1': d
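The table above follows a simple rule that can be sketched for a single package name. truncate_pkg_name is a hypothetical function name. The sketch leaves out the earlier step of dropping whole packages from the list and only reproduces the cut-off behavior shown in the table.

```shell
# Hypothetical helper: shorten one package name to at most "max" characters.
# $1: package name, $2: maximum length
truncate_pkg_name() {
    local name max
    name="$1"
    max="$2"
    if [ "${#name}" -le "$max" ]; then
        # Fits as-is.
        printf '%s\n' "$name"
    elif [ "$max" -le 3 ]; then
        # No room for an ellipsis: plain cut.
        printf "%.${max}s\n" "$name"
    else
        # Reserve three characters for the ellipsis.
        printf "%.$((max - 3))s...\n" "$name"
    fi
}

truncate_pkg_name docker-credential-secretservice-bin 30   # docker-credential-secretser...
truncate_pkg_name docker-credential-secretservice-bin 4    # d...
truncate_pkg_name docker-credential-secretservice-bin 3    # doc
```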
With a package list allowance of 0 characters the entire pkgs field is removed. The example above will then look like so:
$ zfs list -o name -t all
NAME
zpool
zpool/root
zpool/root/archlinux
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:imp
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0114_1_op:inst_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0115_1_op:upgr_sev:trv
Whatever you set as your pkgs_list_max_length is still just a best effort as it is subject to ZFS' internal maximum for dataset name length. This limit is currently 255 characters. For a snapshot, both the dataset name in front of the @ character and everything from the @ character through the end of the name count against the limit. If you'd like e.g. 200 characters allocated to the package list chances are that you'll see fewer characters than that depending on how long your dataset names are on their own.
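The arithmetic behind this best effort can be sketched as follows. pkgs_budget is a hypothetical helper, and the 255 in it is simply the limit stated above. It takes everything that precedes the package list in a snapshot name and reports how many of the requested characters actually fit.

```shell
# Hypothetical helper: how many package list characters fit into the name limit.
# $1: snapshot name up to and including "pkgs:", $2: requested pkgs_list_max_length
pkgs_budget() {
    local prefix wanted limit room
    prefix="$1"
    wanted="$2"
    limit=255
    room=$(( limit - ${#prefix} ))
    if [ "$room" -lt 0 ]; then
        room=0
    fi
    if [ "$wanted" -lt "$room" ]; then
        printf '%s\n' "$wanted"
    else
        printf '%s\n' "$room"
    fi
}

pkgs_budget 'zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv_pkgs:' 200   # 188
pkgs_budget 'zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv_pkgs:' 30    # 30
```

The prefix in this example is 67 characters long, so a request for 200 characters is capped at 188.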
Special characters in package names
Arch Linux has no qualms with at (@) characters and plus (+) characters in package names but ZFS very much does take issue with those. Just a heads-up: when constructing a ZFS snapshot name we replace all @ characters in package names with one dot each (.) and we replace all + characters with one underscore each (_).
A snapshot name that would appear like so:
$ zfs list -o name -t all
NAME
zpool
zpool/root
zpool/root/archlinux
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv_pkgs:jdk17-temurin:libc++
                                                                                 ~~~~~~
We'll create like so instead:
$ zfs list -o name -t all
NAME
zpool
zpool/root
zpool/root/archlinux
zpool/root/archlinux@pacman_2023-03-07-0113_1_op:upgr_sev:trv_pkgs:jdk17-temurin:libc__
                                                                                 ~~~~~~
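The substitution itself is a one-to-one character mapping, which tr expresses directly. sanitize_pkg_name is a hypothetical name for this sketch. The script may implement the replacement differently.

```shell
# Hypothetical helper: map "@" to "." and "+" to "_" in a package name.
sanitize_pkg_name() {
    printf '%s\n' "$1" | tr '@+' '._'
}

sanitize_pkg_name 'libc++'        # libc__
sanitize_pkg_name 'example@pkg'   # example.pkg
```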
Have a look at pacman-zfs-snapshot.conf as well. Its comments should be clear enough to get you going.
Avoiding naming collisions
By default snapshot names contain a timestamp formatted like so: 2023-03-07-0114. This makes snapshot names reasonably unique. You can change both the timestamp format and timezone in pacman-zfs-snapshot.conf where the format defaults to:
snap_date_format='%F-%H%M'
And the timezone defaults to:
snap_timezone='Etc/UTC'
With these settings it is possible to cause ZFS snapshot name collisions (meaning reuse of the exact same snapshot name) when all of the following conditions are true for any two pacman operations:
- They occur within the same minute
- They cover the same type of operation (Install, Remove or Upgrade)
- They cover the same list of packages
The script safeguards against naming collisions by adding a monotonically incrementing counter after the timestamp string.
For example by running pacman -S tmux three times within the same minute (once for an Install operation and two more times for two identical Upgrade operations) your system may generate the following example snapshots:
zpool/root/archlinux@pacman_2023-03-07-0116_1_op:inst_sev:trv_pkgs:tmux
zpool/root/archlinux@pacman_2023-03-07-0116_1_op:upgr_sev:trv_pkgs:tmux
zpool/root/archlinux@pacman_2023-03-07-0116_2_op:upgr_sev:trv_pkgs:tmux
                                           ~~~
Notice that lines 2 and 3 would collide: their snapshot names are identical apart from the counter, which was incremented by 1 on line 3 to avoid the collision.
This facilitates a hands-off approach to using this script on a daily driver system without risking missing snapshots or employing other more involved approaches to avoid naming collisions.
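The counter logic can be sketched as a loop that bumps the counter until the candidate name is unused. This is an illustration under assumptions, not the script's code: next_free_name is a hypothetical helper and the existing snapshot names are passed in as plain arguments rather than queried from ZFS.

```shell
# Hypothetical helper: find the first snapshot name whose counter is still free.
# $1: dataset, $2: timestamp, $3: rest of the name after the counter
# remaining args: snapshot names that already exist
next_free_name() {
    local dataset stamp suffix counter candidate existing taken
    dataset="$1"
    stamp="$2"
    suffix="$3"
    shift 3
    counter=1
    while :; do
        candidate="${dataset}@pacman_${stamp}_${counter}_${suffix}"
        taken=0
        for existing in "$@"; do
            if [ "$existing" = "$candidate" ]; then
                taken=1
            fi
        done
        if [ "$taken" -eq 0 ]; then
            break
        fi
        counter=$((counter + 1))
    done
    printf '%s\n' "$candidate"
}

next_free_name zpool/root/archlinux 2023-03-07-0116 op:upgr_sev:trv_pkgs:tmux \
    zpool/root/archlinux@pacman_2023-03-07-0116_1_op:inst_sev:trv_pkgs:tmux \
    zpool/root/archlinux@pacman_2023-03-07-0116_1_op:upgr_sev:trv_pkgs:tmux
# zpool/root/archlinux@pacman_2023-03-07-0116_2_op:upgr_sev:trv_pkgs:tmux
```

The Install snapshot does not force an increment because its name already differs in the op field. Only the identical Upgrade name does.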
Rollback
After a rollback, for example via the excellent ZFSBootMenu, pacman and all AUR helpers you may be using will consider the pacman database to be locked. No pacman transactions can start. You will for example see:
- In pacman:
    # pacman -Syu
    :: Synchronizing package databases...
    error: failed to synchronize all databases (unable to lock database)
- In paru:
    $ paru
    :: Pacman is currently in use, please wait...
The moment a snapshot was created pacman was already in a transaction so it had already written its lock file to /var/lib/pacman/db.lck. After a clean finish pacman would have deleted that lock itself but since you rolled back to a point mid-transaction it's still there. Just delete the file and you're good to go:
sudo rm /var/lib/pacman/db.lck
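If you'd rather not delete the lock blindly, a slightly more careful variant first checks that no pacman process is running. This is a sketch only: remove_stale_lock is a hypothetical helper, it assumes pgrep is available, and it needs root privileges when pointed at the real lock file.

```shell
# Hypothetical helper: remove the pacman lock file unless pacman is running.
# $1: lock file path, defaults to the path named above
remove_stale_lock() {
    local lock
    lock="${1:-/var/lib/pacman/db.lck}"
    if pgrep -x pacman >/dev/null 2>&1; then
        printf 'pacman is still running, leaving %s alone\n' "$lock" >&2
        return 1
    fi
    rm -f -- "$lock"
}
```

Run as root without arguments it removes /var/lib/pacman/db.lck. Passing a different path is mainly useful for trying the function out safely.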
Development
Conventional commits
This project uses Conventional Commits for its commit messages.
Commit types
Commit types besides fix and feat are:
- build: Project structure, directory layout, build instructions for roll-out
- refactor: Keeping functionality while streamlining or otherwise improving function flow
- test: Working on test coverage
- docs: Documentation for project or components
Commit scopes
The following scopes are known for this project. A Conventional Commits commit message may optionally use one of the following scopes or none:
- conf: How we deal with script config
- script: Any other script work that doesn't specifically fall into the above scopes
- hook: Configuring the hook(s)
- meta: Affects the project's repo layout, readme content, file names etc.
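A commit subject can be checked against the types and scopes listed above with a single regular expression. This is a minimal sketch rather than project tooling: is_conventional_subject is a hypothetical helper and it only covers the type(scope): description shape, not every detail of the Conventional Commits format.

```shell
# Hypothetical helper: succeed if a commit subject uses a known type and,
# optionally, a known scope.
is_conventional_subject() {
    printf '%s\n' "$1" |
        grep -Eq '^(fix|feat|build|refactor|test|docs)(\((conf|script|hook|meta)\))?: .+$'
}

is_conventional_subject 'feat(conf): add dry-run option' && echo ok        # ok
is_conventional_subject 'chore: bump version' || echo 'unknown type'      # unknown type
```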