Compare commits

...

13 Commits

Author SHA1 Message Date
93380891ec Merge remote-tracking branch 'refs/remotes/origin/1-get-base-version-going' into 1-get-base-version-going 2023-12-26 00:20:01 +01:00
8de0e25ab8 refactor(docs): Spell ZFS all uppercase (#1) 2023-12-26 00:09:42 +01:00
4069c33145 feat(docs): Don't do duplicate snapshot names (#1) 2023-12-26 00:08:49 +01:00
2fbbf4da19 refactor(docs): Add pkg list trimming (#1) 2023-12-25 22:30:37 +01:00
47885efbab refactor(docs): Explain ZFS snapshot name max length (#1) 2023-12-25 22:30:16 +01:00
e3f6316c47 feat(docs): Explain pkg name trimming in snapshots (#1) 2023-12-25 22:12:15 +01:00
7681bbde84 refactor(docs): We're doing 10 imp snapshots by default (#1) 2023-12-25 22:11:41 +01:00
961649d252 refactor(docs): Explain new important pkgs regex (#1) 2023-12-25 21:52:20 +01:00
360c726326 refactor(conf): Keep more important snapshots (#1)
On an example daily driver test machine 10
important snapshots allow for about one full
month of rollbacks. Sounds and feels reasonable.
2023-12-25 21:50:12 +01:00
7d573627f5 refactor(conf): Add more important package names to default config (#1)
Users running ZFS may come in contact with the
archzfs repository. They may also want to try
linux-zen kernels. Add those to default config so
those users are covered.
2023-12-25 21:48:52 +01:00
818082a0b3 refactor(script): Trim package name if it doesn't fit full-length (#1) 2023-12-25 21:45:54 +01:00
04caca48a5 refactor(script): Add pkgs to snapshot name only when allowed by user (#1) 2023-12-25 21:45:05 +01:00
7efb2e5821 fix(script): Check severity against actual severity value (#1)
Instead of checking against hard-coded string 'imp'
2023-12-25 20:47:39 +01:00
3 changed files with 192 additions and 35 deletions

@@ -44,16 +44,21 @@ zpool/root/archlinux
```
If exactly one source returns that is the exact name of a ZFS dataset in an imported zpool we create a snapshot on it. If no source returns we silently exit. If more than one source returns we raise an error and halt the `pacman` transaction.
We retain two different snapshot chains, one for `pacman` transactions that only affect what we are calling _trivial_ packages and a separate chain for _important_ packages. By default only the exact regular expression package name match `^(linux|systemd|zfs-(dkms|utils))$` is considered important so in plain English any one of:
We retain two different snapshot chains, one for `pacman` transactions that only affect what we are calling _trivial_ packages and a separate chain for _important_ packages. By default only the exact regular expression package name match `^(linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils))$` is considered important so in plain English any one of:
- `linux`
- `linux-headers`
- `linux-zen`
- `linux-zen-headers`
- `systemd`
- `zfs-linux`
- `zfs-linux-zen`
- `zfs-dkms`
- `zfs-utils`
Whenever an important package is affected by a transaction a snapshot goes into the corresponding chain. In all other cases - when an important package is not affected - snapshots go into the trivial chain.
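As a rough illustration of how the default expression classifies package names, here is a minimal sketch. The helper name `is_important_pkg` is hypothetical and not part of the script, and it uses bash's built-in `=~` operator as an assumption about one way to apply the expression; the `important_names` value is the default shown in the config.

```shell
#!/usr/bin/env bash
# Default value from the config file
important_names='linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils)'

# Hypothetical helper, not part of the script: returns 0 when the whole
# package name matches the important-package expression, 1 otherwise.
is_important_pkg () {
    local pkg="${1}"
    local regex="^(${important_names})\$"
    [[ "${pkg}" =~ ${regex} ]]
}

for pkg in linux-zen-headers zfs-dkms linux-lts tmux; do
    if is_important_pkg "${pkg}"; then
        printf '%s: important\n' "${pkg}"
    else
        printf '%s: trivial\n' "${pkg}"
    fi
done
```

Because the expression is anchored at both ends, names such as `linux-lts` or `systemd-libs` do not count as important unless you extend `important_names` yourself.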
The _trivial_ snapshot chain by default keeps 25 snapshots, the _important_ chain keeps 10. The thought process here is that you will likely not futz around with a kernel every day whereas you may very well install arbitrary packages multiple times a day. Snapshots should keep you safe for a couple of days hence the defaults of 10 and 25 snapshots, respectively.
The _trivial_ snapshot chain by default keeps 25 snapshots, the _important_ chain keeps 10. The thought process here is that you will likely not futz around with a kernel every day whereas you may very well install arbitrary packages multiple times a day. Snapshots should keep you safe for a low number of weeks up to maybe a full month on an average daily driver system hence the defaults of 10 and 25 snapshots, respectively.
Snapshots may look like so:
```
@@ -64,6 +69,7 @@ zpool/root ▼ | important packag
zpool/root/archlinux ┌─────────────┐ ▼▼▼
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:imp_pkgs:systemd:bind:enchant:grep
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:trv_pkgs:jdk17-temurin
zpool/root/archlinux@pacman_2023-03-07-0114_op:inst_sev:trv_pkgs:docker-credential-secretser...
zpool/root/archlinux@pacman_2023-03-07-0115_op:upgr_sev:trv_pkgs:proton-ge-custom-bin
▲▲▲▲ ▲▲▲ └────────────────────────────┘
| | Max. 30 characters per our
@@ -72,8 +78,87 @@ Pacman operation that triggered this snapshot ───┘ | pacm
Severity based on affected packages, here trivial ───────┘
```
Notice how snapshot line 3 ends in `docker-credential-secretser...`. This snapshot was triggered on installation of the Arch User Repository package [docker-credential-secretservice-bin](https://aur.archlinux.org/packages/docker-credential-secretservice-bin) whose package name is 35 characters long. In this example our `pkgs_list_max_length` setting limits the maximum length of the package string to `30` characters. If we can't naturally fit package names into this limit by removing packages from the list we instead cut off part of the package name and add an ellipsis (three dots `...`). The default setting is `pkgs_list_max_length='30'`. In case the user wants three characters or fewer, which makes an ellipsis impractical, we simply trim the package name to that many characters:
```
pkgs_list_max_length='7': dock...
pkgs_list_max_length='6': doc...
pkgs_list_max_length='5': do...
pkgs_list_max_length='4': d...
pkgs_list_max_length='3': doc
pkgs_list_max_length='2': do
pkgs_list_max_length='1': d
```
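The trimming rules above can be sketched as a standalone function. `trim_pkg_name` is a hypothetical name chosen for illustration; unlike the script's own function it takes the package name and the length limit as arguments rather than reading script globals.

```shell
#!/usr/bin/env bash
# Hypothetical standalone helper mirroring the trimming rules described
# above: 0 yields an empty string, 1 to 3 yields a plain cut, 4 or more
# yields a cut plus a trailing ellipsis.
trim_pkg_name () {
    local pkg_name="${1}" max_length="${2}"
    if (( ${#pkg_name} <= max_length )); then
        # Name already fits, nothing to trim
        printf '%s\n' "${pkg_name}"
    elif (( max_length <= 0 )); then
        # No characters allowed at all
        printf '\n'
    elif (( max_length <= 3 )); then
        # Too short for an ellipsis, cut to the allowed length
        printf '%s\n' "${pkg_name::${max_length}}"
    else
        # Reserve three characters for the ellipsis
        printf '%s...\n' "${pkg_name::$(( max_length - 3 ))}"
    fi
}

trim_pkg_name 'docker-credential-secretservice-bin' 30
trim_pkg_name 'docker-credential-secretservice-bin' 3
```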
With a package list allowance of 0 characters the entire `pkgs` field is removed. The above example will then look like so:
```
$ zfs list -o name -t all
NAME
zpool
zpool/root
zpool/root/archlinux
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:imp
zpool/root/archlinux@pacman_2023-03-07-0113_op:upgr_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0114_op:inst_sev:trv
zpool/root/archlinux@pacman_2023-03-07-0115_op:upgr_sev:trv
```
Whatever you set as your `pkgs_list_max_length` is still just a best effort as it is subject to ZFS' internal maximum for dataset name length. This limit is currently 255 characters. For a snapshot this counts the dataset name in front of the `@` character plus everything from the `@` character to the end of the snapshot name. If you'd like e.g. 200 characters allocated to the package list, chances are that you'll see fewer characters than that depending on how long your dataset names are on their own.
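To get a feel for that budget, here is a minimal sketch of the arithmetic. The helper `remaining_pkg_chars` and the variable `max_zfs_name_length` are hypothetical names for illustration; the script's own accounting may differ in detail.

```shell
#!/usr/bin/env bash
# Limit as stated above (assumed name, not taken from the script)
max_zfs_name_length='255'

# Hypothetical helper: how many characters remain for package names once
# the dataset name and the fixed snapshot fields are accounted for.
remaining_pkg_chars () {
    local dataset="${1}" fixed_fields="${2}"
    local used=$(( ${#dataset} + ${#fixed_fields} ))
    local remaining=$(( max_zfs_name_length - used ))
    # Never report a negative allowance
    if (( remaining < 0 )); then
        remaining=0
    fi
    printf '%d\n' "${remaining}"
}

remaining_pkg_chars 'zpool/root/archlinux' '@pacman_2023-03-07-0113_op:upgr_sev:imp_pkgs:'
```

With the 20-character dataset name and 45 characters of fixed fields used here, 190 characters remain, so a requested `pkgs_list_max_length='200'` could not be honored in full.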
Have a look at `pacman-zfs-snapshot.conf` as well, its comments should be clear enough to get you going.
# Avoiding naming collisions
By default snapshot names contain a timestamp formatted like so: `2023-03-07-0114`. This makes snapshot names reasonably unique. You can change both the timestamp format and timezone in `pacman-zfs-snapshot.conf` where the format defaults to:
```
snap_date_format='%F-%H%M'
```
And the timezone defaults to:
```
snap_timezone='Etc/UTC'
```
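A minimal sketch of how these two settings could produce the timestamp. The helper `make_snap_timestamp` is hypothetical, and the `-d` option assumes GNU `date` from coreutils; how the script itself calls `date` is not shown here.

```shell
#!/usr/bin/env bash
# Defaults from the config file
snap_date_format='%F-%H%M'
snap_timezone='Etc/UTC'

# Hypothetical helper. The optional argument is a GNU date input string
# so the result can be reproduced; without it the current time is used.
make_snap_timestamp () {
    local when="${1:-now}"
    TZ="${snap_timezone}" date -d "${when}" +"${snap_date_format}"
}

# Two moments within the same minute yield the same timestamp string
make_snap_timestamp '2023-03-07 01:14:00 UTC'
make_snap_timestamp '2023-03-07 01:14:59 UTC'
```

Since the default format has minute resolution, both calls print the same string, which is why identical operations within one minute can end up with identical snapshot names.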
With these settings it is possible to cause ZFS snapshot name collisions (meaning reuse of the exact same snapshot name) when all of the following conditions are true for any two `pacman` operations:
- They occur within the same minute
- They cover the same type of operation (_Install_, _Remove_ or _Upgrade_)
- They cover the same list of packages
For example by running `pacman -S tmux` three times within the same minute (once for an _Install_ operation and two more times for two identical _Upgrade_ operations) your system may generate the following example snapshots:
```
zpool/root/archlinux@pacman_2023-03-07-0116_op:inst_sev:trv_pkgs:tmux 1.66M - 20.7G -
zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux 2.17M - 20.7G -
```
Notice that there is no third snapshot for the second identical _Upgrade_ operation as this script skipped snapshot creation.
The rationale is that you're doing the exact same operation twice or more. There's most likely no reasonable expectation that your operating system enters a different state on successive `pacman` operations so there's no need to deal with multiple snapshots capturing the same state.
Your `pacman` command line output will show this like so:
```
:: Running pre-transaction hooks...
(1/1) Create ZFS snapshot(s)
[WARN] ZFS snapshot skipped (same operation exists at 2023-03-07-0116):
[WARN] zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[WARN] No ZFS snapshot left to do after accounting for identical operations at 2023-03-07-0116.
```
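The skip decision boils down to checking each planned snapshot name against the names that already exist. Here is a minimal sketch under stated assumptions: `classify_planned_snaps` is a hypothetical helper, existing names are passed in as text instead of being read from `zfs list`, and matching uses `grep -Fx` (fixed string, whole line), which is a deliberate simplification rather than the script's own matching.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints every planned snapshot name prefixed with
# 'skip' or 'create'. The first argument holds existing snapshot names,
# one per line; all remaining arguments are planned snapshot names.
classify_planned_snaps () {
    local existing_snaps="${1}"
    shift
    local planned_snap
    for planned_snap in "${@}"; do
        # -F: literal string, -x: must match the whole line, -q: quiet
        if grep -Fxq -- "${planned_snap}" <<<"${existing_snaps}"; then
            printf 'skip %s\n' "${planned_snap}"
        else
            printf 'create %s\n' "${planned_snap}"
        fi
    done
}

existing='zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux'
classify_planned_snaps "${existing}" \
    'zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux' \
    'zpool/data/var/lib/docker@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux'
```

Literal matching is chosen here because snapshot names can contain characters such as `.` that carry special meaning in regular expressions.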
Note that this script will not blindly skip doing **_all_** snapshots in this situation. It will still happily create snapshots that don't cause naming collisions for example when affected snapshots were already deleted or when you're adding an additional dataset to the list of datasets you want to snapshot. In `pacman` command line output you'll then see warnings as needed and regular info-level messages for newly created snapshots where possible:
```
:: Running pre-transaction hooks...
(1/1) Create ZFS snapshot(s)
[WARN] ZFS snapshot skipped (same operation exists at 2023-03-07-0116):
[WARN] zpool/root/archlinux@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[WARN] zpool/root/archlinux/pacman-cache@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
[INFO] ZFS snapshot atomically done:
[INFO] zpool/data/var/lib/docker@pacman_2023-03-07-0116_op:upgr_sev:trv_pkgs:tmux
```
This behavior is not configurable. During testing and development we considered adding a monotonically increasing counter to timestamps such as:
```
...2023-03-07-0116-1...
...2023-03-07-0116-2...
...2023-03-07-0116-3...
```
While this would effectively avoid naming collisions we decided against it. Weighing pros and cons the _skip_ approach seems ever so slightly simpler than the _counter_ approach.
# Rollback
After a rollback for example via the excellent [ZFSBootMenu](https://docs.zfsbootmenu.org/) `pacman` and all AUR helpers you may be using will consider the `pacman` database to be locked. No `pacman` transactions can start, you will for example see:

@@ -9,7 +9,7 @@ do_dry_run='false'
# before a pacman transaction on an important package have a separate
# retention from snapshots for trivial packages. Lends itself to keeping
# high-risk updates separate from everything else.
important_names='linux|systemd|zfs-(dkms|utils)'
important_names='linux(-zen)?(-headers)?|systemd|zfs-(linux(-zen)?|dkms|utils)'
# Number snapshots to keep
snaps_trivial_keep='25'

@@ -127,7 +127,7 @@ function trim_globally_snappable_datasets () {
}
function write_pkg_list_oneline () {
if [[ "${severity}" == 'imp' ]]; then
if [[ "${severity}" == "${snaps_important_suffix}" ]]; then
for pkg in "${important_pkgs_in_transaction[@]}"; do
if [[ "${unabridged_pkg_list_oneline}" ]]; then
unabridged_pkg_list_oneline="${unabridged_pkg_list_oneline}${pkg_separator}${pkg}"
@@ -180,57 +180,129 @@ function find_max_dataset_name_length () {
fi
}
function trim_single_remaining_package_name () {
local pkg_name
pkg_name="${shorter_pkg_list}"
case 1 in
# Trim to 1 to 3 characters, no trailing ellipsis (...)
$(( 1<=pkgs_list_max_length && pkgs_list_max_length<=3 )))
pkg_name="${pkg_name::${pkgs_list_max_length}}"
;;
# Show as many pkg name characters as we can while also
# fitting an ellipsis into the name (...) to indicate
# that we've cut the pkg name off at the end.
$(( pkgs_list_max_length>=4 )))
pkg_name="${pkg_name::$(( pkgs_list_max_length - 3 ))}"'...'
;;
esac
shorter_pkg_list="${pkg_name}"
}
function trim_pkg_list_oneline () {
local available_pkg_list_length
available_pkg_list_length="$((${max_zfs_snapshot_name_length} - ${max_dataset_name_length}))"
if [[ "${available_pkg_list_length}" -lt "${pkgs_list_max_length}" ]]; then
# If we have fewer characters available than the user wants limit
# package list length
# If we have fewer characters available before hitting the
# ZFS internal maximum snapshot name length than the user
# wants limit package list length.
pkgs_list_max_length="${available_pkg_list_length}"
fi
local shorter_pkg_list
shorter_pkg_list="${unabridged_pkg_list_oneline}"
while [[ "${#shorter_pkg_list}" -gt "${pkgs_list_max_length}" ]]; do
shorter_pkg_list="${shorter_pkg_list%${pkg_separator}*}"
if ! grep -Piq "${pkg_separator}" <<<"${shorter_pkg_list}"; then
# Only one package remains in package list, no need to continue
break
if [[ "${pkgs_list_max_length}" -le '0' ]]; then
# User wants zero characters of pkg names in snapshot name,
# no need to even find an appropriate pkg name string. Just
# set to empty string and we're done here.
shorter_pkg_list=''
else
shorter_pkg_list="${unabridged_pkg_list_oneline}"
while [[ "${#shorter_pkg_list}" -gt "${pkgs_list_max_length}" ]]; do
shorter_pkg_list="${shorter_pkg_list%${pkg_separator}*}"
if ! grep -Piq "${pkg_separator}" <<<"${shorter_pkg_list}"; then
# Only one package remains in package list, no need to continue
break
fi
done
# If pkg name is still too long trim it. If there's enough
# space for an ellipsis (...) we add that to indicate we've
# trimmed the name, otherwise we just take however many
# characters of the pkg name we can get.
if [[ "${#shorter_pkg_list}" -gt "${pkgs_list_max_length}" ]]; then
trim_single_remaining_package_name
fi
fi
trimmed_pkg_list_oneline="${shorter_pkg_list}"
}
function omit_duplicate_snaps () {
local existing_snaps
local -a unneeded_snaps
existing_snaps="$(zfs list -t all -oname -H)"
for planned_snap in "${planned_snaps[@]}"; do
if grep -Piq -- '^'"${planned_snap}"'$' <<<"${existing_snaps}"; then
unneeded_snaps+=("${planned_snap}")
else
needed_snaps+=("${planned_snap}")
fi
done
if [[ "${#shorter_pkg_list}" -gt "${pkgs_list_max_length}" ]]; then
# If this is still too long we empty the package list
shorter_pkg_list=''
if [[ "${#unneeded_snaps[@]}" -gt '0' ]]; then
if [[ "${do_dry_run}" == 'true' ]]; then
pprint 'warn' 'Dry-run, ZFS snapshot skipped (same operation exists at '"${date_string}"'):'
else
pprint 'warn' 'ZFS snapshot skipped (same operation exists at '"${date_string}"'):'
fi
for unneeded_snap in "${unneeded_snaps[@]}"; do
pprint 'warn' ' '"${unneeded_snap}"
done
fi
trimmed_pkg_list_oneline="${shorter_pkg_list}"
}
function do_snaps () {
local snap_name snap_return_code
local -a planned_snaps
for snappable_dataset_id in "${!snappable_datasets[@]}"; do
snap_name="${snappable_datasets[${snappable_dataset_id}]}"'@'"${snap_name_prefix}${snap_field_separator}${date_string}${snap_field_separator}"'op:'"${conf_op_suffix}${snap_field_separator}"'sev:'"${severity}${snap_field_separator}"'pkgs:'"${trimmed_pkg_list_oneline}"
snap_name="${snappable_datasets[${snappable_dataset_id}]}"'@'"${snap_name_prefix}${snap_field_separator}${date_string}${snap_field_separator}"'op:'"${conf_op_suffix}${snap_field_separator}"'sev:'"${severity}"
# If we have at least one pkg name character to append we do
# so now but if we're not even allowed to append a single
# character we might as well skip the 'pkgs' field
# altogether.
if [[ "${pkgs_list_max_length}" -ge '1' ]]; then
snap_name="${snap_name}${snap_field_separator}"'pkgs:'"${trimmed_pkg_list_oneline}"
fi
planned_snaps["${snappable_dataset_id}"]="${snap_name}"
done
if [[ "${do_dry_run}" == 'true' ]]; then
pprint 'info' 'Dry-run, pretending to atomically do zfs snapshot:'
for planned_snap in "${planned_snaps[@]}"; do
pprint 'info' ' '"${planned_snap}"
done
else
zfs snapshot "${planned_snaps[@]}"
snap_return_code="${?}"
if [[ "${snap_return_code}" -eq '0' ]]; then
successfully_snapped_datasets=("${snappable_datasets[@]}")
pprint 'info' 'zfs snapshot atomically done:'
for planned_snap in "${planned_snaps[@]}"; do
pprint 'info' ' '"${planned_snap}"
local -a needed_snaps
omit_duplicate_snaps
if [[ "${#needed_snaps[@]}" -gt '0' ]]; then
if [[ "${do_dry_run}" == 'true' ]]; then
pprint 'info' 'Dry-run, pretending to atomically do ZFS snapshot:'
for needed_snap in "${needed_snaps[@]}"; do
pprint 'info' ' '"${needed_snap}"
done
else
pprint 'warn' 'zfs snapshot failed:'
for planned_snap in "${planned_snaps[@]}"; do
pprint 'warn' ' '"${planned_snap}"
done
zfs snapshot "${needed_snaps[@]}"
snap_return_code="${?}"
if [[ "${snap_return_code}" -eq '0' ]]; then
successfully_snapped_datasets=("${snappable_datasets[@]}")
pprint 'info' 'ZFS snapshot atomically done:'
for needed_snap in "${needed_snaps[@]}"; do
pprint 'info' ' '"${needed_snap}"
done
else
pprint 'warn' 'ZFS snapshot failed:'
for needed_snap in "${needed_snaps[@]}"; do
pprint 'warn' ' '"${needed_snap}"
done
fi
fi
else
if [[ "${do_dry_run}" == 'true' ]]; then
pprint 'warn' 'Dry-run, no ZFS snapshot left to do after accounting for identical operations at '"${date_string}"'.'
else
pprint 'warn' 'No ZFS snapshot left to do after accounting for identical operations at '"${date_string}"'.'
fi
fi
}
@@ -248,7 +320,7 @@ function do_retention () {
local snap_list snaps_done_by_us snaps_in_cur_sev snaps_limit oldest_snap snap_return_code
local -a destroyed_snaps failed_to_destroy_snaps
if [[ "${do_dry_run}" == 'true' ]]; then
pprint 'info' 'Dry-run, skipping potential zfs destroy operations ...'
pprint 'info' 'Dry-run, skipping potential ZFS destroy operations ...'
else
for successfully_snapped_dataset in "${successfully_snapped_datasets[@]}"; do
snaps_in_cur_sev="$(get_snaps_in_cur_sev "${successfully_snapped_dataset}")"