Image: A person holding a bunch of SSDs like a hand of cards in a poker game.

Once in a while I get a degraded pool. Replacing such a disk and starting the resilver process is usually very easy, but there are some caveats when it comes to the ZFS mirror of the rpool, which is usually the rootfs of the Proxmox system itself. So when a disk dies and gets swapped by a helping hand in the data centre, I usually end up with a system that doesn’t boot. Linux really doesn’t like a degraded ZFS mirror as rootfs, and chances are that the new, empty disk was the one the BIOS/EFI searched for a bootloader.

The necessary steps are explained in detail in the “Proxmox VE Administration Guide” under “Changing a failed bootable device”, but since I’ll have forgotten this again in 10 minutes, I’m writing it up here so I can find it later via search engines (this has happened!).

So here is the check-list:

  • [ ] Boot a rescue system (out of scope, depends on data centre)
  • [ ] Get ZFS support working (out of scope, depends on data centre / rescue system; yes, a Proxmox install ISO can be used too!)
  • [ ] Copy partition table from working disk to new disk so we get the same partition layout
  • [ ] Randomize GUIDs for the copied partition layout (having the same partition IDs will confuse the system _a lot_)
  • [ ] Remove degraded disk partition from the rpool
  • [ ] Add new disk _partition_ to the rpool (default is partition 3 for Proxmox)
  • [ ] Reinstall grub / bootloader and/or EFI stuff (default is partition 1+2 for Proxmox)
  • [ ] Don’t bitch to Beko because copying everything here blindly, without using your own brain and adjusting it to your own situation, didn’t work and all data was lost – you break it: you keep the pieces.

sgdisk can be used to replicate the partition table and to get some new IDs:

sgdisk /dev/oldbutgooddisk_n1 -R /dev/shinynewdisk_n1
sgdisk -G /dev/shinynewdisk_n1
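Swapping the two disks in the replicate step would copy the empty disk’s table onto the good one. A minimal sketch of a guard against that: `clone_gpt` and the `DRY_RUN` switch are hypothetical names, not part of sgdisk, and the wrapper just runs the same two sgdisk calls as above.

```shell
# Hypothetical wrapper around the two sgdisk calls above.
# sgdisk -R copies the partition table of the main (positional) device onto the
# device named after -R, so swapping the two arguments would overwrite the good
# disk with the empty one. Set DRY_RUN=1 to only print the commands.
clone_gpt() {
  local src="$1" dst="$2" run=""
  if [ -z "$src" ] || [ -z "$dst" ] || [ "$src" = "$dst" ]; then
    echo "usage: clone_gpt <good-disk> <new-disk> (two different devices)" >&2
    return 1
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    run="echo"
  fi
  $run sgdisk -R "$dst" "$src" || return 1  # replicate the good disk's GPT onto the new one
  $run sgdisk -G "$dst"                     # randomize the new disk's GUIDs
}
```

`DRY_RUN=1 clone_gpt /dev/oldbutgooddisk_n1 /dev/shinynewdisk_n1` prints both commands for a last sanity check; without `DRY_RUN` it runs them.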

Next is replacing the degraded disk in the pool. This can be done the easy way or the hard way. Chances are that the pool has to be imported first, though, so changes can be made. This probably needs the -f (force) parameter, because the pool was last imported by another system:

zpool import -f -d /dev/oldbutgooddisk_n1p3 rpool
zpool status

With some luck this worked, and the identifiers used by ZFS can now be noted from the NAME column. They are needed to replace the broken/degraded disk partition with the newly created one.
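Picking those names out of the NAME column can also be scripted. A minimal sketch, assuming the usual `zpool status` layout where the second column of each config line is the STATE; `failed_vdevs` is a hypothetical helper name.

```shell
# Hypothetical helper: print the NAME of every vdev whose STATE column is
# UNAVAIL, FAULTED, REMOVED or OFFLINE in `zpool status` (default pool: rpool).
failed_vdevs() {
  zpool status "${1:-rpool}" |
    awk '$2 ~ /^(UNAVAIL|FAULTED|REMOVED|OFFLINE)$/ { print $1 }'
}
```

A vanished disk usually shows up as a long numeric GUID with a “was /dev/…” hint; that GUID is what `zpool replace` wants as the old device.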

zpool replace -f rpool oldandbrokendisk_n1p3 /dev/shinynewdisk_n1p3
zpool status

This should now show the new disk where the old and broken disk used to be, and a resilver in progress. For some reason this sometimes fails, so there is also a hard way. YMMV:

zpool offline rpool oldandbrokendisk_n1p3
zpool detach rpool oldandbrokendisk_n1p3
zpool status -P rpool
zpool attach rpool /dev/oldbutgooddisk_n1p3 /dev/shinynewdisk_n1p3
zpool status
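Either way, the mirror is only redundant again once the resilver has finished. A minimal sketch of a helper that blocks until then; `wait_for_resilver` is a hypothetical name, `zpool wait` assumes OpenZFS 2.0 or newer, and on older versions the sketch falls back to polling `zpool status` for the “resilver in progress” line.

```shell
# Hypothetical helper: block until the resilver of a pool (default: rpool) is done.
wait_for_resilver() {
  local pool="${1:-rpool}"
  # OpenZFS >= 2.0 can wait for the activity itself.
  if zpool wait -t resilver "$pool" 2>/dev/null; then
    return 0
  fi
  # Fallback for older versions: poll the scan line once a minute.
  while zpool status "$pool" | grep -q 'resilver in progress'; do
    sleep 60
  done
}
```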

Are we there yet? No. The bootloader has to be installed on shinynewdisk too, and the boot partition has to be mirrored as well (it’s outside of rpool). Luckily Proxmox comes with a neat tool for this, so it doesn’t have to be done manually, but alas it is only available on a Proxmox system and not from a generic rescue system. Time to chroot. With ZFS though (the pool has to be imported first – see above!):

mkdir /mnt/rpool
# !! Do not forget to change mountpoint back to "/" later!!
zfs set mountpoint=/mnt/rpool rpool/ROOT/pve-1
mount -t proc proc /mnt/rpool/proc
mount -t sysfs sys /mnt/rpool/sys
mount -o bind /dev /mnt/rpool/dev
mount -o bind /run /mnt/rpool/run
chroot /mnt/rpool

Inside the chroot environment, proxmox-boot-tool can now be used to write the bootloader and boot partition again, but the exact command depends on whether its status reports GRUB or EFI. The boot/EFI partition is number 2 on a default Proxmox install:

proxmox-boot-tool status
proxmox-boot-tool format /dev/shinynewdisk_n1p2
# With GRUB:
proxmox-boot-tool init /dev/shinynewdisk_n1p2 grub
# Without GRUB:
proxmox-boot-tool init /dev/shinynewdisk_n1p2
exit

When unsure, it makes sense to check the “Proxmox VE Administration Guide” on this; the relevant chapter is “Setting up a new partition for use as synced ESP”. proxmox-boot-tool status will also complain about a missing configured partition ID. That’s the one from the failed disk that was removed. The offending line may be removed from the configuration file the warning points to, but the warning may as well be ignored. blkid can be used to check which IDs actually exist.
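To see which configured IDs are stale before deleting anything, a minimal sketch: `stale_esp_uuids` is a hypothetical helper, and the config path `/etc/kernel/proxmox-boot-uuids` (one ESP UUID per line) is an assumption, so check the path in your own warning.

```shell
# Hypothetical helper: list ESP UUIDs configured for proxmox-boot-tool that no
# longer belong to any block device. The default config path is an assumption.
stale_esp_uuids() {
  local cfg="${1:-/etc/kernel/proxmox-boot-uuids}" uuid
  while IFS= read -r uuid || [ -n "$uuid" ]; do
    [ -z "$uuid" ] && continue
    # blkid -U prints the device for a filesystem UUID and fails if there is none
    blkid -U "$uuid" >/dev/null 2>&1 || echo "$uuid"
  done < "$cfg"
}
```

Newer proxmox-boot-tool versions also ship a `clean` subcommand meant for dropping such entries; `proxmox-boot-tool help` shows whether yours has it.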

Are we there yet? NO! After exiting the chroot environment, the ZFS mountpoint has to be set back, or the next boot will fail. For this, everything mounted for the chroot has to be unmounted in reverse order, the mountpoint reset and the pool exported:

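# unmount in reverse order of the chroot mounts
umount /mnt/rpool/run /mnt/rpool/dev /mnt/rpool/sys /mnt/rpool/proc
zfs set mountpoint=/ rpool/ROOT/pve-1
zpool export -a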

Now it’s time for ~~thoughts and prayers~~ a reboot. Good luck future me!

So BOINC didn’t show my GPU when started via systemd. I have an AMD card and all the ROCm (ROCr etc.) stuff installed. It only listed the Intel iGPU on startup:

[---] OpenCL: Intel GPU 0: Intel(R) UHD Graphics 630 (driver version 23.35.27191.9, device version OpenCL 3.0 NEO, 25561MB, 2556>
[---] libc:  version 2.37

This works fine, however, when I run boinc manually as a user (or clinfo, for that matter) instead of via systemctl start boinc-client, so I guessed it’s some permission issue. journalctl had the context I was looking for and threw this in the middle of the boinc-client startup:

audit[305157]: AVC avc:  denied  { read write } for  pid=305157 comm="boinc" name="kfd" dev="devtmpfs" ino=532 scontext=system_u:system_r:boinc_t:s0 tcontext=system_u:object_r:hsa_device_t:s0 tclass=chr_file permissive=0

This is SELinux’s charming way of telling me that it blocked read and write access to /dev/kfd (the main compute interface shared by all GPUs, according to the ROCm manual) for the boinc process. Nice. So what most users do now is grumble and disable SELinux, which is kinda a bad idea. The more advanced user does this and calls it a day:

sudo ausearch -c 'boinc' --raw | audit2allow -M boinc
sudo semodule -i boinc.pp

This builds a local policy module from all denied boinc activity, which in my case looks like this:

module boinc 1.0;

require {
	type hsa_device_t;
	type random_device_t;
	type boinc_t;
	class chr_file { ioctl map open read write };
}

#============= boinc_t ==============
allow boinc_t hsa_device_t:chr_file { ioctl map open read write };
allow boinc_t random_device_t:chr_file write;

Not today though. It left me befuddled with the following output:

libsemanage.semanage_direct_install_info: Overriding boinc module at lower priority 100 with module at priority 400.
Failed to resolve typeattributeset statement at /var/lib/selinux/targeted/tmp/modules/400/boinc/cil:3
Failed to resolve AST
semodule:  Failed!

…and I have no idea why. I also found nothing on Google Search. So to not be DenverCoder9 (https://xkcd.com/979/) in the future here is what I found out so far:

sudo cat /var/lib/selinux/targeted/tmp/modules/400/boinc/cil | bunzip2 
(typeattributeset cil_gen_require hsa_device_t)
(typeattributeset cil_gen_require random_device_t)
(typeattributeset cil_gen_require boinc_t)
(allow boinc_t hsa_device_t (chr_file (ioctl map open read write)))
(allow boinc_t random_device_t (chr_file (write)))

Apparently it can’t resolve the required typeattributeset boinc_t – which is kinda odd, as it exists (see sudo semodule -X 100 --cil -E boinc and the resulting cil file). Frankly, this is where SELinux lost me too. I found the man page for boinc_selinux, which my Fedora system doesn’t know about, so I may be missing something. It suggests enabling permissive mode for boinc_t (instead of dropping SELinux altogether):

Note: semanage permissive -a boinc_t can be used to make the process type boinc_t permissive. Permissive process types are not denied access by SELinux. AVC messages will still be generated.

https://linux.die.net/man/8/boinc_selinux

And sure enough on the next restart my AMD GPU became available:

[---] OpenCL: AMD/ATI GPU 0: AMD Radeon RX 6700 XT (driver version 3558.0 (HSA1.1,LC), device version OpenCL 2.0, 12272MB, 12272>
[---] OpenCL: Intel GPU 0: Intel(R) UHD Graphics 630 (driver version 23.35.27191.9, device version OpenCL 3.0 NEO, 25561MB, 2556>
[---] libc:  version 2.37

Happy number crunching. Mebbe some fix for SELinux crosses my path in the future, so I can update this with the proper solution.
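A possible lead, untested here: the failed `semodule -i` says it is overriding the existing `boinc` module at priority 100 with the new one at priority 400. If that priority-100 `boinc` module is the one declaring `boinc_t`, the override hides the declaration, which would fit the failure at line 3 of the generated CIL (`typeattributeset cil_gen_require boinc_t`). If the guess holds, a local module under a different name should avoid the collision. A sketch with a hypothetical helper `allow_denials` that refuses to reuse an existing module name:

```shell
# Hypothetical helper: build and load an audit2allow module from the AVC
# denials of one command, refusing module names that already exist.
allow_denials() {
  local comm="$1" module="$2"
  # semodule -l lists one module per line; older releases add a version column.
  if semodule -l | awk '{print $1}' | grep -qx "$module"; then
    echo "module '$module' already exists, pick another name" >&2
    return 1
  fi
  ausearch -c "$comm" --raw | audit2allow -M "$module" &&
    semodule -i "$module.pp"
}
```

Run as root, e.g. `allow_denials boinc boinc_local`. If that loads, `semanage permissive -d boinc_t` would take boinc_t out of permissive mode again.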

It’s that time of the year again. It’s _cold_ outside and I’m sitting at my PC anyway, so let’s crunch some numbers to heat up the place:

> Dez 04 11:31:35 morpheus boinc[20347]: 04-Dec-2023 11:31:35 [World Community Grid] Requesting new tasks for CPU
> Dez 04 11:31:37 morpheus boinc[20347]: 04-Dec-2023 11:31:37 [World Community Grid] Scheduler request failed: HTTP service unavailable

Or not. AGAIN? FFS, how long has this migration been going on?

Feels like nothing changed compared to last year.

* DNS works: dig +short scheduler.worldcommunitygrid.org returns 199.241.167.118
* HTTP works: curl https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi
* The WCG server? Not so much:

<html><body><h1>503 Service Unavailable</h1>
No server is available to handle this request.
</body></html>
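The HTTP part of these checks fits into a small helper for the next attempt. A minimal sketch: `wcg_scheduler_up` is a hypothetical name, the URL is the one curl’ed above, and it only looks at the HTTP status code, printing it and failing on anything but 2xx/3xx.

```shell
# Hypothetical check: print the HTTP status code of the WCG scheduler and
# fail unless it is 2xx or 3xx. curl reports 000 if no connection was made.
wcg_scheduler_up() {
  local url="${1:-https://scheduler.worldcommunitygrid.org/boinc/wcg_cgi/fcgi}"
  local code
  code="$(curl -s -o /dev/null -w '%{http_code}' "$url")"
  echo "$code"
  case "$code" in
    2??|3??) return 0 ;;
    *) return 1 ;;
  esac
}
```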

Guess I have to look for another project. My patience ran dry like the task list.

USB Hotel

USB hotel almost completely occupied. Something something nature is healing.

Image: A cardboard box filled with empty toilet paper rolls up-cycled as USB / cable holders.

Trying to eradicate waste from the household is hard! I mean, besides obvious stuff like soap instead of shampoo, it’s literally everywhere. I was just introduced to a more unexpected solution by ‘Dimicator’, whose work I’ve been following closely for some years now. Roland suggests waxed cloth even in the fridge and not just for or – a very solution:

https://exciting-pioneer-6049.ck.page/posts/historical-crafts-natural-raw-materials

He also helpfully explains how to make waxed cloth: https://www.patreon.com/posts/90314554

No first-hand experience on this yet, but tbf we’re already _not_ wrapping food in _single use_ plastic anyway. It is intriguing though. I mean, people did fine without plastic for food supplies for centuries, no? 🤷

Activities that bring me joy II (kajo76.de)

Implementing ideas for the blog – yesterday, for example, I installed Simple Lightbox, since the Meow-Lightbox (for whatever reason 🤔) didn’t work right away. Last night I also looked up CSS selectors to adapt the webmentions and the webmention form to the theme. This morning I added extra CSS to the theme and voilà 😊 Ahh 🥹 Finally again! That […]

Yes, that is really hard :-/ Anyway, good luck with the search!

Another night in the verse 🚀

Made some progress on the HUD (I think I need a name for that). It provides me with some additional information depending on what I’m doing. The Route Plan, e.g., disappears automatically when the destination is reached (yeah, yeah, the Jump count is off, will fix that eventually).

Same for scan targets, which also reveal bounties (with rewards in Cr, so I know if it’s worth the hassle :D).

Really like where this is going.