zfs-pacman-hook

Arch Linux pacman hook for automatic ZFS snapshots

Setup

Get started like so:

  1. Install dependency jq
  2. Clone repo into arbitrary path <repo>
  3. Make pacman-zfs-snapshot.sh executable
    chmod +x <repo>/pacman-zfs-snapshot.sh
    
  4. Symlink the files into place, for example:
    sudo ln -s <repo>/pacman-zfs-snapshot.sh /usr/local/bin/pacman-zfs-snapshot
    sudo ln -s <repo>/pacman-zfs-snapshot-install.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-install.hook
    sudo ln -s <repo>/pacman-zfs-snapshot-remove.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-remove.hook
    sudo ln -s <repo>/pacman-zfs-snapshot-upgrade.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-upgrade.hook
    sudo ln -s <repo>/pacman-zfs-snapshot.conf /etc/pacman-zfs-snapshot.conf
    
    Note that while you may choose arbitrary locations for the symlinks, the 00-pacman-zfs-snapshot-*.hook files reference /usr/local/bin/pacman-zfs-snapshot. Change that path in the hook files if you link the script somewhere else.
  5. For datasets you want auto-snapshotted, add the property space.quico:auto-snapshot=true
    zfs set space.quico:auto-snapshot=true zpool/root/archlinux
    
    Datasets without this property, or with any value other than true, will not be auto-snapshotted.
  6. Adjust pacman-zfs-snapshot.conf to your liking. You may want to set do_dry_run='true' for a start and just reinstall a benign package to get a feel for what this hook would do.
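To double-check step 5 it can help to list which datasets currently carry the property. The following is a minimal sketch and not part of this repo: the helper name filter_auto_snapshot is made up for illustration, and it assumes the tab-separated output that zfs get -H prints.

```shell
# Hypothetical helper (not part of this repo): reads "name<TAB>value" lines
# as printed by `zfs get -H -o name,value <property>` and prints only the
# dataset names whose value is exactly "true".
filter_auto_snapshot() {
    awk -F '\t' '$2 == "true" { print $1 }'
}

# Intended usage on a system with ZFS (left commented out on purpose):
# zfs get -H -o name,value -t filesystem space.quico:auto-snapshot | filter_auto_snapshot
```

Datasets that never had the property set show up with a value of - in the zfs get output and are filtered out along with anything set to a value other than true.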

What's it do?

Before every pacman transaction (PreTransaction), meaning right before any actual operation on a package begins, we trigger a ZFS snapshot. This happens via a so-called hook, which is a plain-text config file. Hooks are a feature of the Arch Linux Package Management (ALPM) library, also known as libalpm, for which pacman is a frontend. By default hooks are stored in /usr/share/libalpm/hooks. Additionally /etc/pacman.conf has a second hook directory configured, shown in the stock file as a commented-out default:

#HookDir = /etc/pacman.d/hooks/

Hook files from both directories are collectively parsed and executed in lexicographical order. Hook names from this repo begin with 00-* so on a default Arch Linux installation they are the first to be executed during pacman transactions.
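To see that order for yourself, the merged list can be approximated in a few lines of shell. This is a rough sketch rather than pacman's actual implementation: the function name list_hooks is made up, plain byte-order sorting is an assumption, and same-named files in both directories are simply collapsed into one entry.

```shell
# Hypothetical helper: print the *.hook file names found in the given
# directories, merged and sorted by file name in byte order.
list_hooks() {
    for dir in "$@"; do
        for f in "$dir"/*.hook; do
            # Skip the unexpanded pattern when a directory has no hook files
            if [ -e "$f" ]; then
                basename "$f"
            fi
        done
    done | LC_ALL=C sort -u
}

# Intended usage:
# list_hooks /usr/share/libalpm/hooks /etc/pacman.d/hooks
```

With the symlinks from the setup steps in place, the 00-pacman-zfs-snapshot-*.hook entries should appear at the top of that list.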

For ZFS snapshots intended to save your bacon the 00-* naming convention is particularly critical. In /usr/share/libalpm/hooks you can see, for example, that when a kernel upgrade happens 60-mkinitcpio-remove.hook is executed, deleting your existing vmlinuz-* kernel image, for example at /boot/vmlinuz-linux. After that, if you're using the zfs-dkms package, which requires dkms, which in turn installs 71-dkms-remove.hook, that hook removes your ZFS kernel module files. Both the 60-* hook and, for zfs-dkms users, the 71-* hook run early due to their naming. If we don't create a snapshot before these hooks run, we end up with a snapshot that lacks the kernel image and the ZFS kernel module files. Our 00-* hook files are executed before both of them, so a snapshot can safely return you to a working system.

By default we identify the active system dataset by running findmnt / --noheadings --output source, which for example returns:

zpool/root/archlinux

If exactly one returned source is the exact name of a ZFS dataset in an imported zpool, we create a snapshot of it. If no source is returned we silently exit. If more than one source is returned we raise an error and halt the pacman transaction.
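The three cases can be sketched as a small shell function. This is an illustration under assumptions, not the code from pacman-zfs-snapshot.sh: the name select_dataset and its two-argument interface are made up, and the real script may structure the check differently.

```shell
# Hypothetical helper mirroring the three cases described above.
#   $1: newline-separated mount sources, e.g. from findmnt
#   $2: newline-separated dataset names, e.g. from `zfs list -H -o name`
# Prints the dataset to snapshot, prints nothing when there is none, and
# returns 1 when the result is ambiguous.
select_dataset() {
    sources=$1
    datasets=$2
    # Keep only sources that exactly match a known dataset name
    matches=$(printf '%s\n' "$sources" | while IFS= read -r src; do
        [ -n "$src" ] || continue
        if printf '%s\n' "$datasets" | grep -Fxq -- "$src"; then
            printf '%s\n' "$src"
        fi
    done)
    # Count non-empty lines; grep exits non-zero on a count of 0
    count=$(printf '%s\n' "$matches" | grep -c . || true)
    case $count in
        0) return 0 ;;
        1) printf '%s\n' "$matches" ;;
        *) echo "error: more than one matching dataset" >&2; return 1 ;;
    esac
}

# Intended usage on a system with ZFS (left commented out on purpose):
# select_dataset "$(findmnt / --noheadings --output source)" "$(zfs list -H -o name)"
```

A non-zero exit status is what lets a PreTransaction hook stop the transaction, provided the hook file is set up to abort on failure.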

We retain two different snapshot chains, one for pacman transactions that only affect what we are calling trivial packages and a separate chain for important packages. By default only package names that exactly match the regular expression ^(linux|systemd|zfs-(dkms|utils))$ are considered important, so in plain English any one of:

  • linux
  • systemd
  • zfs-dkms
  • zfs-utils

Whenever an important package is affected by a transaction, a snapshot goes into the important chain. In all other cases, when no important package is affected, snapshots go into the trivial chain.
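The classification boils down to one regular expression test over the package names in a transaction. Here is a minimal sketch assuming the default regex from above. The function name classify_severity is made up, and the imp and trv outputs simply reuse the tags that appear in the example snapshot names further down.

```shell
# Default regex quoted from the text above
important_names='^(linux|systemd|zfs-(dkms|utils))$'

# Hypothetical helper: prints "imp" if any of the given package names
# matches the regex exactly, otherwise "trv".
classify_severity() {
    if printf '%s\n' "$@" | grep -Eq -- "$important_names"; then
        echo imp
    else
        echo trv
    fi
}

# Example calls:
# classify_severity systemd bind enchant grep
# classify_severity linux-headers
```

Because the regex is anchored with ^ and $, a package such as linux-headers does not count as important even though its name starts with linux.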

By default the trivial snapshot chain keeps 25 snapshots and the important chain keeps 5. The thought process here is that you will likely not futz around with a kernel every day, whereas you may very well install arbitrary packages multiple times a day. Snapshots should keep you safe for a couple of days, hence the defaults of 5 and 25 snapshots, respectively.
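Retention per chain amounts to keeping the newest N snapshots and dropping the rest. The sketch below shows one way to compute the drop list. It is not taken from the script: the helper name snapshots_to_prune is made up, and it assumes snapshot names arrive on stdin sorted oldest first.

```shell
# Hypothetical helper: reads snapshot names from stdin, oldest first, and
# prints every name except the newest $1 entries.
snapshots_to_prune() {
    keep=$1
    awk -v keep="$keep" '
        { line[NR] = $0 }
        END { for (i = 1; i <= NR - keep; i++) print line[i] }
    '
}

# Intended usage for the trivial chain (left commented out on purpose).
# The output would then be handed to `zfs destroy`, one snapshot at a time:
# zfs list -H -t snapshot -o name -s creation -d 1 zpool/root/archlinux \
#     | grep '_sev:trv_' | snapshots_to_prune 25
```

Printing the candidates before destroying anything mirrors the spirit of the do_dry_run setting.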

Snapshots may look like so:

$ zfs list -o name -t all
NAME                                                     ┌─── Important because systemd
zpool                 snap_date_format='%F-%H%M'         |    is on our list of
zpool/root                         ▼                     |    important packages
zpool/root/archlinux        ┌─────────────┐             ▼▼▼
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:imp_pkgs:systemd:bind:enchant:grep
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:trv_pkgs:jdk17-temurin
zpool/root/archlinux@pacman_2023-03-07-0115_op:upgr_sev:trv_pkgs:proton-ge-custom-bin
                                               ▲▲▲▲     ▲▲▲      └────────────────────────────┘
                                                 |       |         Max. 30 characters per our
Pacman operation that triggered this snapshot ───┘       |          pacman-zfs-snapshot.conf
                                                         |       setting 'pkgs_list_max_length'
Severity based on affected packages, here trivial ───────┘

Have a look at pacman-zfs-snapshot.conf as well. Its comments should be clear enough to get you going.
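The naming scheme from the listing above can be reproduced with a short function. Treat it as a sketch: the name build_snapshot_name and its argument order are made up, and adding whole package names until the pkgs_list_max_length limit would be exceeded is an assumption about how the list gets shortened.

```shell
# Hypothetical helper that assembles a snapshot name.
#   $1 dataset, $2 timestamp, $3 operation tag, $4 severity tag,
#   $5 maximum length of the package list, remaining args: package names
build_snapshot_name() {
    dataset=$1
    stamp=$2
    op=$3
    sev=$4
    max_len=$5
    shift 5
    pkgs=''
    for pkg in "$@"; do
        # Append with a ':' separator unless this is the first package
        candidate=${pkgs:+$pkgs:}$pkg
        # Assumption: stop as soon as the list would exceed the limit
        [ "${#candidate}" -le "$max_len" ] || break
        pkgs=$candidate
    done
    printf '%s@pacman_%s_op:%s_sev:%s_pkgs:%s\n' "$dataset" "$stamp" "$op" "$sev" "$pkgs"
}

# Example call; the timestamp uses snap_date_format='%F-%H%M':
# build_snapshot_name zpool/root/archlinux "$(date +%F-%H%M)" upgr imp 30 systemd bind enchant grep
```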

Rollback

After a rollback, for example via the excellent ZFSBootMenu, pacman and all AUR helpers you may be using will consider the pacman database to be locked. No pacman transactions can start. You will see, for example:

  • In pacman
    # pacman -Syu
    :: Synchronizing package databases...
    error: failed to synchronize all databases (unable to lock database)
    
  • In paru
    $ paru
    :: Pacman is currently in use, please wait...
    

At the moment the snapshot was created pacman was already in a transaction, so it had already written its lock file to /var/lib/pacman/db.lck. After a clean finish pacman would have deleted that lock itself, but since you rolled back to a point mid-transaction it's still there. Just delete the file and you're good to go:

sudo rm /var/lib/pacman/db.lck
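If you want a small safety net around that step, the removal can be wrapped in a check that no pacman process is currently running. This is an optional sketch, not something the repo ships: the function name remove_stale_lock is made up, and it assumes pgrep is available.

```shell
# Hypothetical helper: delete the lock file only when no process named
# exactly "pacman" is running. The lock path defaults to pacman's usual one.
remove_stale_lock() {
    lock=${1:-/var/lib/pacman/db.lck}
    if pgrep -x pacman >/dev/null 2>&1; then
        echo "pacman is running, leaving $lock alone" >&2
        return 1
    fi
    rm -f -- "$lock"
}

# Intended usage, as root:
# remove_stale_lock
```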

Development

Conventional commits

This project uses Conventional Commits for its commit messages.

Commit types

Commit types besides fix and feat are:

  • build: Project structure, directory layout, build instructions for roll-out
  • refactor: Keeping functionality while streamlining or otherwise improving function flow
  • test: Working on test coverage
  • docs: Documentation for project or components

Commit scopes

The following scopes are known for this project. A Conventional Commits commit message may optionally use one of the following scopes or none:

  • conf: How we deal with script config
  • script: Any other script work that doesn't specifically fall into one of the other scopes
  • hook: Configuring the hook(s)
  • meta: Affects the project's repo layout, readme content, file names etc.
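The types and scopes listed above can be checked mechanically, for example from a commit-msg hook. The following is a sketch only: the function name is_valid_commit_subject and the sample subjects are made up, and the optional ! before the colon is the breaking-change marker from the Conventional Commits specification.

```shell
# Types and scopes as listed above; the scope is optional
commit_pattern='^(fix|feat|build|refactor|test|docs)(\((conf|script|hook|meta)\))?!?: .+'

# Hypothetical helper: succeeds if the commit subject line in $1 follows
# the pattern, fails otherwise.
is_valid_commit_subject() {
    printf '%s\n' "$1" | grep -Eq -- "$commit_pattern"
}

# Example calls with made-up subjects:
# is_valid_commit_subject 'fix(hook): trigger snapshot before mkinitcpio hook'
# is_valid_commit_subject 'docs: explain stale lock file after rollback'
```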