[Image: A person holding a bunch of SSDs like a hand of cards in a poker game.]

Once in a while I get a degraded pool. Replacing such a disk and starting the resilver process is usually very easy, but there are some caveats when it comes to the rpool, which is usually the rootfs of the Proxmox system itself. So when a disk dies and gets swapped by a helping hand in the data centre I usually end up with a system that doesn’t boot: Linux really doesn’t like a degraded pool as rootfs, and chances are that the BIOS/EFI went looking for a bootloader on the new, empty disk.
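
To see whether – and which – pool member actually dropped out, a quick look with zpool is usually enough; a minimal sketch:

zpool status -x      # prints "all pools are healthy" or lists only the unhealthy ones
zpool status rpool   # shows every vdev member and its state (ONLINE/DEGRADED/FAULTED)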

The necessary steps are explained in detail in the “Proxmox VE Administration Guide” under “Changing a failed bootable device”, but since I’ll forget this again in 10 minutes I’m writing it up here now so I can find it later via search engines (this happened!)

So here is the check-list:

  • [ ] Boot a rescue system (out of scope, depends on data centre)
  • [ ] Get ZFS support working (out of scope, depends on data centre / rescue system (yes, a Proxmox Install ISO can be used too!))
  • [ ] Copy partition table from working disk to new disk so we get the same partition layout
  • [ ] Randomize GUIDs for the copied partition layout (having the same partition IDs will confuse the system _a lot_)
  • [ ] Remove degraded disk partition from the rpool
  • [ ] Add new disk _partition_ to the rpool (default is partition 3 for Proxmox)
  • [ ] Reinstall grub / bootloader and/or EFI stuff (default is partition 1+2 for Proxmox)
  • [ ] Don’t bitch to Beko because copying everything here blindly, without using your own brain and adjusting it to your own situation, didn’t work and all data was lost – you break it: you keep the pieces.

sgdisk can be used to replicate the partition table and to randomize the GUIDs of the copy afterwards:

# copy the partition table FROM the good disk TO the new one – mind the argument order!
sgdisk /dev/oldbutgooddisk_n1 -R /dev/shinynewdisk_n1
# randomize all GUIDs on the new disk so they don't collide with the source
sgdisk -G /dev/shinynewdisk_n1
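
To double-check that the copy worked, both partition tables can be printed and compared; a quick sketch:

sgdisk -p /dev/oldbutgooddisk_n1
sgdisk -p /dev/shinynewdisk_n1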

Next is replacing the degraded disk in the pool. This can be done the easy way or the hard way. Chances are that the pool has to be imported first, though, so changes can be made. This probably needs the “force” parameter because the pool was last mounted from another system:

# without a pool name this only lists importable pools – add "rpool" to actually import it
zpool import -f -d /dev/oldbutgooddisk_n1p3 rpool
zpool status

With some luck this worked, and the device identifiers ZFS uses can now be read from the NAME column of the status output. This info is needed to replace the broken|degraded disk partition with the newly created one.
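
If a stable identifier is preferred over the raw device name, the matching /dev/disk/by-id/ link for the new partition can be looked up first (purely optional, names are placeholders):

ls -l /dev/disk/by-id/ | grep shinynewdisk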

zpool replace -f rpool oldandbrokendisk_n1p3 /dev/shinynewdisk_n1p3
zpool status

This should now show the new disk where the old and broken disk used to be, and the pool state should report a resilver in progress. For some reason this sometimes fails, so there is also a hard way. YMMV:

zpool offline rpool oldandbrokendisk_n1p3
zpool detach rpool oldandbrokendisk_n1p3
zpool status -P rpool
zpool attach rpool /dev/oldbutgooddisk_n1p3 /dev/shinynewdisk_n1p3
zpool status
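
Either way the resilver takes a while; progress can be followed with something like:

watch -n 60 zpool status rpool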

Are we there yet? No. The bootloader has to be installed on shinynewdisk too, and the boot partition has to be mirrored as well (it’s outside of rpool). Luckily Proxmox comes with a neat tool for this, so it doesn’t have to be done manually. Alas, that tool is only available on a Proxmox system and not from a generic rescue system. Time to chroot. With ZFS the pool has to be imported first (see above!) and the rootfs dataset mounted below a temporary mountpoint:

mkdir /mnt/rpool
# !! Do not forget to change mountpoint back to "/" later!!
zfs set mountpoint=/mnt/rpool rpool/ROOT/pve-1
# mount the rootfs dataset in case it was not remounted automatically (harmless error if it already is)
zfs mount rpool/ROOT/pve-1
mount -t proc proc /mnt/rpool/proc
mount -t sysfs sys /mnt/rpool/sys
mount -o bind /dev /mnt/rpool/dev
mount -o bind /run /mnt/rpool/run
chroot /mnt/rpool

proxmox-boot-tool can now be run inside the chrooted environment to write the bootloader and boot partition again, but its invocation depends on whether its status output reports GRUB or UEFI. The boot|EFI partition is number 2 on a default Proxmox install:

proxmox-boot-tool status
proxmox-boot-tool format /dev/shinynewdisk_n1p2
# if status reports grub:
proxmox-boot-tool init /dev/shinynewdisk_n1p2 grub
# if status reports uefi (systemd-boot):
proxmox-boot-tool init /dev/shinynewdisk_n1p2
exit

It may make sense to check the “Proxmox VE Administration Guide” on this when unsure; the important chapter is “Setting up a new partition for use as synced ESP”. Status will also complain about a partition ID that is configured but missing. That’s the one from the failed disk that was removed. The offending line may be removed from the configuration file the warning points to, or the warning may simply be ignored. blkid may be used to check which IDs actually exist.
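
A sketch of that check and cleanup, run from inside the chroot before the exit above (proxmox-boot-tool clean drops the UUIDs of partitions that no longer exist):

blkid | grep -i vfat       # which EFI system partitions actually exist right now
proxmox-boot-tool clean    # remove stale UUIDs from the synced ESP list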

Are we there yet? NO! After exiting the chroot environment the ZFS mountpoint has to be set back, or the next boot will fail. Everything has to be unmounted in reverse order and the pool exported:

# unmount the chroot bind mounts in reverse order first
umount /mnt/rpool/{run,dev,sys,proc}
zfs set mountpoint=/ rpool/ROOT/pve-1
zpool export -a

Now it’s time for ~~thoughts and prayers~~ a reboot. Good luck future me!
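
If it comes back up, a quick sanity check of both the pool and the synced ESPs doesn’t hurt:

zpool status rpool
proxmox-boot-tool status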

Some time ago I needed an ARM64 virtual machine and, while I’m not entirely sure any more why that was, I did seem to have an inspirational moment and made a template of it. Here is what the config for such a VM _may_ look like:

agent: 1
arch: aarch64
bios: ovmf
boot: cdn
bootdisk: scsi0
cores: 2
efidisk0: misfits-btrfs:501/vm-501-disk-0.raw,size=64M
ipconfig0: ip=192.168.2.251/32,gw=192.168.2.1
memory: 1024
name: arm-test2
nameserver: 192.168.2.1
net0: virtio=96:79:F4:02:A1:6B,bridge=vmbr2
numa: 0
ostype: l26
scsi0: misfits-btrfs:501/vm-501-disk-1.raw,size=8G
scsi1: local:iso/debian-10.6.0-arm64-netinst.iso,media=cdrom
scsi2: misfits-btrfs:501/vm-501-cloudinit.raw,media=cdrom,size=4M
scsihw: virtio-scsi-pci
serial0: socket
smbios1: uuid=63fe535c-1507-4528-8dee-2bd2d59b57f8
sockets: 2
vga: serial0
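
Since it is a template, a new machine can be spun up from it; a minimal sketch (IDs and name are made up):

qm clone 501 502 --name arm-test3 --full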

It makes sense to install the cloud-init package in the guest so some stuff can be set from outside of the machine.
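
The cloud-init bits like the IP config above can then be changed from the host with qm set; a minimal sketch (values are made up, --ciuser sets the default cloud-init user):

qm set 501 --ipconfig0 ip=192.168.2.250/24,gw=192.168.2.1 --nameserver 192.168.2.1
qm set 501 --ciuser debian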

…and yes, it’s just as slow as expected from an ARM 🤓

I’m also not entirely sure if this is really officially featured by Proxmox (just like btrfs 🤷) but the machine was doing its job without an issue for years and I did just replay the template on VE 7.4 so I guess it’s fine 🤷

Sometimes a backup or snapshot process is killed, blocking any further backups. An LXC (or QM) may be left in a locked state and cannot be unlocked. An IO error like this is thrown:

$ pct unlock $VMID
unable to open file '/etc/pve/nodes/$NODEID/lxc/$VMID.conf.tmp.32442' - Input/output error

Neither the file nor the process exists. The file can also not simply be created, because it lives on the protected fuse mount /etc/pve. What usually helps is restarting the pve-cluster service (even on a single cluster node):

$ systemctl restart pve-cluster
$ pct unlock $VMID
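
Afterwards it can be verified that the lock is really gone; if something like this prints nothing, all is fine again:

$ pct config $VMID | grep -i lock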