# zfs-pacman-hook

Arch Linux pacman hook for automatic ZFS snapshots

# Setup

Get started like so:

1. Install the dependency `jq`

1. Clone this repo into an arbitrary path `<repo>`

1. Make `pacman-zfs-snapshot.sh` executable

   ```
   chmod +x <repo>/pacman-zfs-snapshot.sh
   ```

1. Symlink the files, for example:

   ```
   sudo ln -s <repo>/pacman-zfs-snapshot.sh /usr/local/bin/pacman-zfs-snapshot
   sudo ln -s <repo>/pacman-zfs-snapshot-install.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-install.hook
   sudo ln -s <repo>/pacman-zfs-snapshot-remove.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-remove.hook
   sudo ln -s <repo>/pacman-zfs-snapshot-upgrade.hook /usr/share/libalpm/hooks/00-pacman-zfs-snapshot-upgrade.hook
   sudo ln -s <repo>/pacman-zfs-snapshot.conf /etc/pacman-zfs-snapshot.conf
   ```

   Note that while you may choose arbitrary locations for the symlinks, the `00-pacman-zfs-snapshot-*.hook` files reference `/usr/local/bin/pacman-zfs-snapshot`. Change that path in the hook files if you place the script elsewhere.

1. For datasets you want auto-snapshotted, add the property `space.quico:auto-snapshot=true`:

   ```
   zfs set space.quico:auto-snapshot=true zpool/root/archlinux
   ```

   Datasets with any other property, or with this property set to any other value, will not be auto-snapshotted.

1. Adjust `pacman-zfs-snapshot.conf` to your liking. You may want to set `do_dry_run='true'` at first and simply reinstall a benign package to get a feel for what this hook would do.

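
To check which datasets have opted in, you can query the property with `zfs get`. What follows is a minimal Bash sketch under stated assumptions. It is not code from `pacman-zfs-snapshot.sh`, and the function name `opted_in_datasets` is hypothetical. It applies the rule above: only the exact value `true` counts.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of pacman-zfs-snapshot.sh.
# Reads tab-separated "name<TAB>value" lines as printed by
#   zfs get -H -o name,value -t filesystem,volume space.quico:auto-snapshot
# and prints only the datasets whose value is exactly "true".
opted_in_datasets() {
    local name value
    while IFS=$'\t' read -r name value; do
        # Unset user properties show up as "-"; "True", "yes" etc. do not count either.
        if [[ "${value}" == 'true' ]]; then
            printf '%s\n' "${name}"
        fi
    done
}

# Usage on a live system:
#   zfs get -H -o name,value -t filesystem,volume space.quico:auto-snapshot | opted_in_datasets
```
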
# What's it do?

In `pacman`, on every `PreTransaction` (meaning right before any actual operation on a package begins), we trigger a ZFS snapshot. This happens via a so-called hook, which is a plain-text config file. Hook files make use of the Arch Linux Package Management (ALPM) library, also known as `libalpm`, for which `pacman` is a frontend. By default, hooks are stored in `/usr/share/libalpm/hooks`. Additionally, `/etc/pacman.conf` mentions a second hook directory (commented out, as it is pacman's default):

```
#HookDir = /etc/pacman.d/hooks/
```

Hook files from both directories are collectively parsed and executed in lexicographical order. Hook names from _this_ repo begin with `00-*`, so on a default Arch Linux system they are the first to be executed during `pacman` transactions.

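
The ordering can be illustrated without touching pacman at all. This is a small sketch that assumes byte-wise (C locale) ordering of file names. The helper `sort_hook_names` is hypothetical, and the hook names are the ones discussed in this section.

```shell
#!/usr/bin/env bash
# Sketch: sort hook file names the way a byte-wise (C locale) comparison would.
# sort_hook_names is a hypothetical helper for illustration only.
sort_hook_names() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

sort_hook_names \
    71-dkms-remove.hook \
    60-mkinitcpio-remove.hook \
    00-pacman-zfs-snapshot-upgrade.hook
# 00-pacman-zfs-snapshot-upgrade.hook
# 60-mkinitcpio-remove.hook
# 71-dkms-remove.hook
```
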
For ZFS snapshots intended to save your bacon, the `00-*` naming convention is particularly critical. In `/usr/share/libalpm/hooks` you can see, for example, that when a kernel upgrade happens, `60-mkinitcpio-remove.hook` is executed. It deletes your existing `vmlinuz-*` kernel image, for example `/boot/vmlinuz-linux`. If you're using the `zfs-dkms` package, which requires `dkms`, which in turn installs `71-dkms-remove.hook`, that hook then removes your ZFS kernel module files. Both the `60-*` hook and, for `zfs-dkms` users, the `71-*` hook run early due to their naming. If we didn't create a snapshot before these hooks run, we would end up with a snapshot that lacks both the kernel image and the ZFS kernel module files. Our `00-*` hook files are executed early enough to ensure that a snapshot can safely return you to a working system.

By default, we identify the active system dataset by running `findmnt / --noheadings --output source`, which for example returns:

```
zpool/root/archlinux
```

If exactly one source is returned and it is the exact name of a ZFS dataset in an imported zpool, we create a snapshot of it. If no source is returned, we silently exit. If more than one source is returned, we raise an error and halt the `pacman` transaction.

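
The three cases map naturally onto a small decision function. Below is a minimal Bash sketch of that logic, not the script's actual implementation. The name `pick_root_dataset` is hypothetical, and it reads `findmnt`-style output from stdin so it can be tried on a machine without ZFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the rule above; reads findmnt-style output:
#   findmnt / --noheadings --output source | pick_root_dataset
pick_root_dataset() {
    local -a sources=()
    local line
    while IFS= read -r line || [[ -n "${line}" ]]; do
        [[ -n "${line}" ]] && sources+=("${line}")
    done
    case "${#sources[@]}" in
        0)  # Nothing returned: exit silently, no snapshot.
            return 0 ;;
        1)  # Exactly one source: the snapshot candidate. A real check would also
            # confirm it exists, e.g. with: zfs list -H -o name "${sources[0]}"
            printf '%s\n' "${sources[0]}" ;;
        *)  # Ambiguous: report it so the caller can abort the transaction.
            printf 'Error: %d sources mounted at /\n' "${#sources[@]}" >&2
            return 1 ;;
    esac
}
```
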
We retain two separate snapshot chains: one for `pacman` transactions that only affect what we call _trivial_ packages, and one for _important_ packages. By default, only package names that fully match the regular expression `^(linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils))$` are considered important. In plain English, that is any one of:

- `linux`
- `linux-headers`
- `linux-zen`
- `linux-zen-headers`
- `systemd`
- `zfs-linux`
- `zfs-linux-zen`
- `zfs-dkms`
- `zfs-utils`

Whenever a transaction affects an important package, its snapshot goes into the important chain. In all other cases (when no important package is affected), snapshots go into the trivial chain.

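
To see how the default regular expression sorts a transaction into one chain or the other, here is a minimal Bash sketch. The function `transaction_severity` is hypothetical, and the `imp`/`trv` output mirrors the `sev:` field of the snapshot names. One important package is enough to make the whole transaction important.

```shell
#!/usr/bin/env bash
# Sketch: classify a transaction's packages using the default regex from above.
# transaction_severity is a hypothetical helper, not the script's own function.
important_pkgs_regex='^(linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils))$'

transaction_severity() {
    local pkg
    for pkg in "$@"; do
        # Unquoted right-hand side so Bash treats it as an extended regex.
        if [[ "${pkg}" =~ ${important_pkgs_regex} ]]; then
            printf 'imp\n'
            return 0
        fi
    done
    printf 'trv\n'
}

transaction_severity bind enchant grep systemd   # prints: imp
transaction_severity jdk17-temurin               # prints: trv
```
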
By default, the _trivial_ snapshot chain keeps 25 snapshots and the _important_ chain keeps 10. The thought process here is that you will likely not futz around with a kernel every day, whereas you may very well install arbitrary packages multiple times a day. On an average daily-driver system, snapshots should keep you safe for a few weeks, up to maybe a full month. Hence the defaults of 25 trivial and 10 important snapshots.

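
A keep-the-newest-N rule per chain could look like the following Bash sketch. It assumes the default `snap_date_format='%F-%H%M'`, so sorting the snapshot names of one dataset lexically also sorts them chronologically. It selects a chain by the `_sev:trv` or `_sev:imp` field that appears in the snapshot names. The function `snapshots_to_prune` is hypothetical, and the actual script may prune differently.

```shell
#!/usr/bin/env bash
# Sketch of a keep-newest-N retention rule (hypothetical helper; the real script
# may differ). Assumes the default snap_date_format, so lexical = chronological.
#   $1: severity tag ("trv" or "imp"), $2: number of snapshots to keep
#   stdin: snapshot names of one dataset, e.g. from
#          zfs list -H -o name -t snapshot -d 1 zpool/root/archlinux
snapshots_to_prune() {
    local sev="$1" keep="$2" name
    local -a chain=()
    while IFS= read -r name; do
        # Only pacman snapshots of the requested chain count; the sev field is
        # followed by "_pkgs:..." or ends the name when the pkgs field is dropped.
        if [[ "${name}" == *@pacman_* && "${name}" =~ _sev:${sev}(_|$) ]]; then
            chain+=("${name}")
        fi
    done
    (( ${#chain[@]} > keep )) || return 0
    # Oldest first; print everything except the newest $keep entries.
    printf '%s\n' "${chain[@]}" | LC_ALL=C sort | head -n "$(( ${#chain[@]} - keep ))"
}
```
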
Snapshots may look like so:

```
$ zfs list -o name -t all
NAME                                                     ┌─── Important because systemd
zpool                 snap_date_format='%F-%H%M'         |    is on our list of
zpool/root                         ▼                     |    important packages
zpool/root/archlinux        ┌─────────────┐             ▼▼▼
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:imp_pkgs:systemd:bind:enchant:grep
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:trv_pkgs:jdk17-temurin
zpool/root/archlinux@pacman_2023-03-07-0114_op:inst_sev:trv_pkgs:docker-credential-secretser...
zpool/root/archlinux@pacman_2023-03-07-0115_op:upgr_sev:trv_pkgs:proton-ge-custom-bin
                                               ▲▲▲▲     ▲▲▲      └────────────────────────────┘
                                                 |       |       Max. 30 characters per our
Pacman operation that triggered this snapshot ───┘       |       pacman-zfs-snapshot.conf
                                                         |       setting 'pkgs_list_max_length'
Severity based on affected packages, here trivial ───────┘
```

Notice how the third snapshot ends in `docker-credential-secretser...`. This snapshot was triggered by installing the Arch User Repository package [docker-credential-secretservice-bin](https://aur.archlinux.org/packages/docker-credential-secretservice-bin), whose name is 35 characters long. In this example, our `pkgs_list_max_length` setting limits the package list string to a maximum of `30` characters. If the package names can't be made to fit this limit by dropping packages from the list, we instead cut off part of the package name and append an ellipsis (three dots, `...`). The default setting is `pkgs_list_max_length='30'`. If the limit is three characters or fewer, an ellipsis is impractical, so we simply trim the package name to that many characters:

```
pkgs_list_max_length='7': dock...
pkgs_list_max_length='6': doc...
pkgs_list_max_length='5': do...
pkgs_list_max_length='4': d...
pkgs_list_max_length='3': doc
pkgs_list_max_length='2': do
pkgs_list_max_length='1': d
```

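
The trimming rule for a single oversized package name fits in a few lines of Bash. This sketch reproduces the table above. The helper `trim_pkg_name` is hypothetical, and it does not cover the earlier step of dropping packages from the list.

```shell
#!/usr/bin/env bash
# Sketch of the trimming rule for one package name that does not fit.
# trim_pkg_name is a hypothetical helper, not the script's actual function.
#   $1: package name, $2: pkgs_list_max_length
trim_pkg_name() {
    local name="$1" max="$2"
    if (( ${#name} <= max )); then
        printf '%s\n' "${name}"              # fits as-is
    elif (( max > 3 )); then
        printf '%s...\n' "${name:0:max-3}"   # keep max-3 chars, then an ellipsis
    else
        printf '%s\n' "${name:0:max}"        # too short for an ellipsis
    fi
}

for n in 30 7 4 3 1; do
    printf '%2s: %s\n' "${n}" "$(trim_pkg_name docker-credential-secretservice-bin "${n}")"
done
```
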
With a package list allowance of `0` characters, the entire `pkgs` field is removed. The example above will then look like this:

```
$ zfs list -o name -t all
NAME
zpool
zpool/root
zpool/root/archlinux
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:imp
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0114_op:inst_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0115_op:upgr_sev:trv
```

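
This is how the name format shown in these examples can be assembled, with the `_pkgs:` field omitted entirely when there is nothing to put in it. It is a minimal Bash sketch. The helper `snapshot_suffix` and its argument order are hypothetical, and the full snapshot name would be `<dataset>@<suffix>`.

```shell
#!/usr/bin/env bash
# Sketch: build the part of a snapshot name after the "@" (hypothetical helper).
#   $1: timestamp, $2: operation (e.g. inst, upgr), $3: severity (trv|imp),
#   $4: already shortened, colon-separated package list (may be empty)
snapshot_suffix() {
    local timestamp="$1" op="$2" sev="$3" pkgs="$4"
    local suffix="pacman_${timestamp}_op:${op}_sev:${sev}"
    # An empty list (e.g. with pkgs_list_max_length='0') drops the field.
    if [[ -n "${pkgs}" ]]; then
        suffix+="_pkgs:${pkgs}"
    fi
    printf '%s\n' "${suffix}"
}

snapshot_suffix 2023-03-07-0115 upgr trv proton-ge-custom-bin
# pacman_2023-03-07-0115_op:upgr_sev:trv_pkgs:proton-ge-custom-bin
snapshot_suffix 2023-03-07-0115 upgr trv ''
# pacman_2023-03-07-0115_op:upgr_sev:trv
```
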
Whatever you set as your `pkgs_list_max_length` is still only a best effort, because it is subject to ZFS' internal maximum dataset name length, currently 255 characters. For a snapshot, this limit applies to the full name: the dataset name in front of the `@` character plus everything from the `@` character to the end. If you'd like, say, 200 characters allocated to the package list, chances are you'll see fewer than that, depending on how long your dataset names are on their own.

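
The arithmetic behind that best effort is simple subtraction. This Bash sketch assumes the name layout `<dataset>@<base>_pkgs:<list>` seen in the examples above. The helper `pkgs_budget` is hypothetical. For `zpool/root/archlinux` and a 200-character request, the budget works out to 255 − 20 − 1 − 38 − 6 = 190 characters.

```shell
#!/usr/bin/env bash
# Sketch: characters actually available for the package list once the
# 255-character ZFS name limit is applied (hypothetical helper).
#   $1: dataset name, $2: suffix without the "_pkgs:..." part,
#   $3: configured pkgs_list_max_length
pkgs_budget() {
    local dataset="$1" base="$2" configured="$3"
    local zfs_max_name_len=255 pkgs_field='_pkgs:'
    # Full name is "<dataset>@<base>_pkgs:<list>"; subtract everything but <list>.
    local left=$(( zfs_max_name_len - ${#dataset} - 1 - ${#base} - ${#pkgs_field} ))
    (( left < 0 )) && left=0
    # The smaller of the configured allowance and what ZFS leaves over wins.
    printf '%d\n' $(( configured < left ? configured : left ))
}

pkgs_budget zpool/root/archlinux pacman_2023-03-07-0113_op:upgr_sev:trv 200   # prints 190
```
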
Have a look at `pacman-zfs-snapshot.conf` as well; its comments should be clear enough to get you going.

# Avoiding naming collisions

By default, snapshot names contain a timestamp formatted like `2023-03-07-0114`. This makes snapshot names reasonably unique. You can change both the timestamp format and the timezone in `pacman-zfs-snapshot.conf`. The format defaults to:

```
snap_date_format='%F-%H%M'
```

And the timezone defaults to:

```
snap_timezone='Etc/UTC'
```

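
To preview what a given format and timezone produce, you can feed both values to `date`. This sketch reuses the two config variable names from `pacman-zfs-snapshot.conf`. How the script itself invokes `date` is an assumption, and the fixed-time line relies on GNU `date --date`, which is what Arch Linux ships.

```shell
#!/usr/bin/env bash
# Sketch: render a snapshot timestamp from the two config values.
# How pacman-zfs-snapshot.sh itself calls date is an assumption.
snap_date_format='%F-%H%M'
snap_timezone='Etc/UTC'

# Current time; TZ only applies to this single date invocation.
TZ="${snap_timezone}" date +"${snap_date_format}"

# Fixed point in time for comparison (GNU date): prints 2023-03-07-0114
TZ="${snap_timezone}" date --date='2023-03-07 01:14:00 UTC' +"${snap_date_format}"
```

With the default `%H%M` resolution, every snapshot taken within the same minute shares the same timestamp, which is where the collisions described next come from.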
With these settings it is possible to cause ZFS snapshot name collisions (meaning reuse of the exact same snapshot name) when all of the following conditions are true for any two `pacman` operations:

- They occur within the same minute
- They cover the same type of operation (_Install_, _Remove_ or _Upgrade_)
- They cover the same list of packages

For example, by running `pacman -S tmux` three times within the same minute (once as an _Install_ operation and twice more as identical _Upgrade_ operations), your system may generate the following snapshots:

```
zpool/root/archlinux@pacman_2023-03-07-0116_op:inst_sev:trv_pkgs:tmux 1.66M - 20.7G -
zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux 2.17M - 20.7G -
```

Notice that there is no third snapshot for the second identical _Upgrade_ operation, because this script skipped snapshot creation.

The rationale is that you're doing the exact same operation twice or more. There's most likely no reasonable expectation that your operating system enters a different state on successive identical `pacman` operations, so there's no need to keep multiple snapshots capturing the same state.

Your `pacman` command line output will show this like so:

```
:: Running pre-transaction hooks...
(1/1) Create ZFS snapshot(s)
[WARN] ZFS snapshot skipped (same operation exists at 2023-03-07-0116):
[WARN] zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[WARN] No ZFS snapshot left to do after accounting for identical operations at 2023-03-07-0116.
```

Note that this script will not blindly skip **_all_** snapshots in this situation. It will still happily create snapshots that don't cause naming collisions, for example when the affected snapshots were already deleted or when you've added another dataset to the list of datasets you want snapshotted. In the `pacman` command line output, you'll then see warnings where needed and regular info-level messages for newly created snapshots:

```
:: Running pre-transaction hooks...
(1/1) Create ZFS snapshot(s)
[WARN] ZFS snapshot skipped (same operation exists at 2023-03-07-0116):
[WARN] zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[WARN] zpool/root/archlinux/pacman-cache@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[INFO] ZFS snapshot atomically done:
[INFO] zpool/data/var/lib/docker@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
```

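
The per-snapshot skip decision can be sketched as an exact name lookup. In this Bash sketch, a planned snapshot is skipped only if a snapshot with exactly the same name already exists, and every other planned snapshot is still marked for creation. The helper `split_planned_snapshots` is hypothetical, and it reads existing names from a file so it can be tested without ZFS.

```shell
#!/usr/bin/env bash
# Sketch of the skip logic (hypothetical helper, not the script's own code).
#   $1:    file with existing snapshot names, one per line, e.g. from
#          zfs list -H -o name -t snapshot
#   $2...: planned full snapshot names
split_planned_snapshots() {
    local existing_file="$1"; shift
    local planned
    for planned in "$@"; do
        # -F: literal match, -x: whole line must match, -q: no output.
        if grep -qFx -- "${planned}" "${existing_file}"; then
            printf 'skip %s\n' "${planned}"
        else
            printf 'create %s\n' "${planned}"
        fi
    done
}
```

The names marked `create` could then be passed together to a single `zfs snapshot` call. That command takes all snapshots named on one command line atomically.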
This behavior is not configurable. During testing and development, we considered adding a monotonically increasing counter to timestamps, such as:

```
...2023-03-07-0116-1...
...2023-03-07-0116-2...
...2023-03-07-0116-3...
```

While this would effectively avoid naming collisions, we decided against it. Weighing the pros and cons, the _skip_ approach seems ever so slightly simpler than the _counter_ approach.

# Rollback

After a rollback, for example via the excellent [ZFSBootMenu](https://docs.zfsbootmenu.org/), `pacman` and any AUR helpers you may be using will consider the `pacman` database to be locked. No `pacman` transaction can start. For example, you will see:

- In `pacman`

  ```
  # pacman -Syu
  :: Synchronizing package databases...
  error: failed to synchronize all databases (unable to lock database)
  ```

- In `paru`

  ```
  $ paru
  :: Pacman is currently in use, please wait...
  ```

At the moment a snapshot is created, `pacman` is already in a transaction, so it has already written its lock file to `/var/lib/pacman/db.lck`. After a clean finish, `pacman` would have deleted that lock itself. Because you rolled back to a point mid-transaction, the lock is still there. As long as no `pacman` process is actually running, just delete the file and you're good to go:

```
sudo rm /var/lib/pacman/db.lck
```

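
If you want a guard against deleting a lock that a live `pacman` still holds, a small check beforehand is enough. This is a hedged Bash sketch, not part of this repo. The helper `remove_stale_pacman_lock` is hypothetical, and it scans `/proc/*/comm` (Linux only) instead of relying on `pgrep`.

```shell
#!/usr/bin/env bash
# Sketch: remove a stale pacman lock only if no pacman process is running.
# Hypothetical helper; scans /proc (Linux only) rather than calling pgrep.
#   $1: lock file path (defaults to /var/lib/pacman/db.lck)
remove_stale_pacman_lock() {
    local lock="${1:-/var/lib/pacman/db.lck}" comm_file comm
    for comm_file in /proc/[0-9]*/comm; do
        # Processes may exit mid-scan; ignore unreadable entries.
        { read -r comm < "${comm_file}"; } 2>/dev/null || continue
        if [[ "${comm}" == 'pacman' ]]; then
            printf 'pacman is running, leaving %s alone\n' "${lock}" >&2
            return 1
        fi
    done
    rm -f -- "${lock}"
}

# Usage after a rollback, as root: remove_stale_pacman_lock
```
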
# Development

## Conventional commits

This project uses [Conventional Commits](https://www.conventionalcommits.org/) for its commit messages.

### Commit types

Commit _types_ besides `fix` and `feat` are:

- `build`: Project structure, directory layout, build instructions for roll-out
- `refactor`: Keeping functionality while streamlining or otherwise improving function flow
- `test`: Working on test coverage
- `docs`: Documentation for project or components

### Commit scopes

The following _scopes_ are known for this project. A Conventional Commits commit message may use one of these scopes or none:

- `conf`: How we deal with script config
- `script`: Any other script work that doesn't specifically fall into another scope
- `hook`: Configuring the hook(s)
- `meta`: Affects the project's repo layout, readme content, file names etc.