Once in a while I get a degraded #ZFS pool. Replacing such a disk and starting the resilver is usually easy, but there are some caveats when it comes to the #rpool of #Proxmox, which is usually the rootfs of the Proxmox system itself. So when a disk dies and gets swapped by a helping hand in the data centre I usually end up with a system that doesn’t boot: Linux really doesn’t like a degraded #raid as rootfs, and chances are the BIOS/EFI went looking for a bootloader on the new, empty disk.
The necessary steps are explained in detail in the “Proxmox VE Administration Guide” under “Changing a failed bootable device”, but since I’ll have forgotten this again in 10 minutes, I’m writing it up here so I can find it later via search engines (this has happened before!).
So here is the check-list:
- [ ] Boot a rescue system (out of scope, depends on data centre)
- [ ] Get ZFS support working (out of scope, depends on data centre / rescue system (yes, a Proxmox Install ISO can be used too!))
- [ ] Copy partition table from working disk to new disk so we get the same partition layout
- [ ] Randomize GUIDs for the copied partition layout (having the same partition IDs will confuse the system _a lot_)
- [ ] Remove degraded disk partition from the rpool
- [ ] Add new disk _partition_ to the rpool (default is partition 3 for Proxmox)
- [ ] Reinstall GRUB / the bootloader and/or the EFI bits (partitions 1 and 2 on a default Proxmox install)
- [ ] Don’t bitch to Beko because copying everything here blindly, without using your own brain and adjusting it to your own situation, didn’t work and all data was lost – you break it, you keep the pieces.
sgdisk can be used to replicate the partition table and to get some new GUIDs:
# replicate the partition table from the healthy disk onto the new disk
sgdisk /dev/oldbutgooddisk_n1 -R /dev/shinynewdisk_n1
# randomize all GUIDs on the new disk so the two disks don't share the same IDs
sgdisk -G /dev/shinynewdisk_n1
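To double-check, the resulting partition tables can be printed and compared – plain sgdisk, nothing Proxmox specific:
sgdisk -p /dev/oldbutgooddisk_n1
sgdisk -p /dev/shinynewdisk_n1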
Next is replacing the degraded disk in the pool. This can be done the easy way or the hard way. Chances are that the pool has to be imported first though, so changes can be made. This probably needs the “force” parameter, because the pool was last mounted from another system:
# -f because the pool was last in use on another system; without the pool name this would only list importable pools
zpool import -f -d /dev/oldbutgooddisk_n1p3 rpool
zpool status
With some luck this worked, and the identifiers ZFS uses can now be noted from the NAME column. This info is needed to replace the broken or degraded disk partition with the newly created one.
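Purely for illustration – on a degraded mirror the relevant part of zpool status looks roughly like this (device names made up to match the examples in this post; real output differs on every system):
        NAME                       STATE     READ WRITE CKSUM
        rpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            oldandbrokendisk_n1p3  UNAVAIL      0     0     0
            oldbutgooddisk_n1p3    ONLINE       0     0     0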
# tell ZFS to replace the dead device with the partition on the new disk
zpool replace -f rpool oldandbrokendisk_n1p3 /dev/shinynewdisk_n1p3
zpool status
This should now show the new disk where the old and broken one used to be, with a resilver in progress as state. For some reason this sometimes fails, so there is also the hard way. YMMV:
# the hard way: drop the dead device entirely, then attach the new one as a mirror
zpool offline rpool oldandbrokendisk_n1p3
zpool detach rpool oldandbrokendisk_n1p3
# note the full path of the remaining good device
zpool status -P rpool
zpool attach rpool /dev/oldbutgooddisk_n1p3 /dev/shinynewdisk_n1p3
zpool status
Are we there yet? No. The bootloader has to be installed on shinynewdisk too, and the boot partition has to be mirrored as well (it sits outside of the rpool). Luckily Proxmox comes with a neat tool for this, so it doesn’t have to be done manually – alas, it is only available on a Proxmox system and not from a generic rescue system. Time to chroot. With ZFS that goes like this (the pool has to be imported first – see above!):
mkdir /mnt/rpool
# !! Do not forget to change the mountpoint back to "/" later !!
zfs set mountpoint=/mnt/rpool rpool/ROOT/pve-1
# if the import didn't mount the root dataset already, mount it now
zfs mount rpool/ROOT/pve-1
# bring the pseudo and runtime filesystems along for the chroot
mount -t proc proc /mnt/rpool/proc
mount -t sysfs sys /mnt/rpool/sys
mount -o bind /dev /mnt/rpool/dev
mount -o bind /run /mnt/rpool/run
chroot /mnt/rpool
The proxmox-boot-tool can now be used inside the chrooted environment to write the bootloader and the boot partition again, but the exact command depends on whether its status reports GRUB or EFI. The boot/EFI partition is number 2 on a default Proxmox install:
proxmox-boot-tool status
proxmox-boot-tool format /dev/shinynewdisk_n1p2
# With GRUB:
proxmox-boot-tool init /dev/shinynewdisk_n1p2 grub
# Without GRUB (systemd-boot):
proxmox-boot-tool init /dev/shinynewdisk_n1p2
exit
It may make sense to check the “Proxmox VE Administration Guide” on this when unsure. The important chapter is “Setting up a new partition for use as synced ESP”. The status command will also complain about a configured partition ID that is missing. That’s from the failed disk that was removed. The offending line may be removed from the configuration file it points at, but that warning may as well be ignored. blkid may be used to check which IDs actually exist.
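A quick cross-check, assuming the usual location of the proxmox-boot-tool UUID list (the warning from status names the exact file it reads anyway):
# list the ESPs (vfat partitions) that currently exist
blkid -t TYPE=vfat
# compare with the UUIDs proxmox-boot-tool keeps in sync (default location)
cat /etc/kernel/proxmox-boot-uuids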
Are we there yet? NO! After exiting the chroot environment the ZFS mountpoint has to be adjusted back, or the next boot will fail. For this everything has to be unmounted in reverse order and the pool exported.
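The unmounts simply mirror the mount commands from above, in reverse order, followed by the mountpoint reset and the export:
umount /mnt/rpool/run
umount /mnt/rpool/dev
umount /mnt/rpool/sys
umount /mnt/rpool/proc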
zfs set mountpoint=/ rpool/ROOT/pve-1
zpool export -a
Now it’s time for ~~thoughts and prayers~~ a reboot. Good luck future me!