Please use kernel-modules-hook instead of kernel-alive!

Feel free to move this post to another section if it belongs there!

I found a problem (potentially a bug) while upgrading kernels in Manjaro. (I did use trizen, but I think this may be a general problem.)

But I am at a loss as to where to post this.

When you update and the currently running kernel has to be updated too, the process is as follows:

  • the kernel(s) are replaced with the new kernel(s)
  • the modules directories in /lib/modules are replaced by the new modules directories
  • the initramfs image(s) are replaced with newly generated ones
  • because the running kernel needs access to its own (old) modules while it is running, its modules directory is not removed until after the next boot
  • after the next successful boot, this (old) modules directory is deleted

To achieve this there is a file /lib/modules/.old that keeps the name of the old module-dir that has to be removed.
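
To see what the marker would cause on the next boot, a small check along these lines may help. This is only a sketch: describe_old_marker is a hypothetical helper name, not part of kernel-alive, and it assumes the marker holds a single kernel release string as described above.

```shell
#!/bin/bash
# Sketch only: report what the .old marker would lead to on the next boot.
# describe_old_marker is a hypothetical helper, not part of kernel-alive.
#   $1: modules root (defaults to /usr/lib/modules)
#   $2: running kernel release (defaults to the output of `uname -r`)
describe_old_marker() {
	local root="${1:-/usr/lib/modules}"
	local running="${2:-$(uname -r)}"
	local marker="$root/.old"
	local oldkern=""

	# Read the marker if it exists; trailing newlines are stripped.
	if [[ -r "$marker" ]]; then
		oldkern=$(<"$marker")
	fi

	if [[ -z "$oldkern" ]]; then
		echo "no marker"
	elif [[ "$oldkern" == "$running" ]]; then
		echo "marker names the running kernel: $oldkern (directory is kept)"
	elif [[ -d "$root/$oldkern" ]]; then
		echo "marker names $oldkern: directory would be removed"
	else
		echo "marker names $oldkern: directory does not exist"
	fi
}
```

Calling describe_old_marker without arguments inspects /usr/lib/modules for the running kernel; passing a different release as the second argument shows what would happen when booting another installed kernel.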

The problem:

In some situations an .old file seems to be created when it should not be. Then at the next boot a directory such as /lib/modules/5.14.10-1-MANJARO is deleted. After that you can't boot with this kernel any more.
This has happened to me at least twice in the last year. Each time I was at a loss as to why the dir /lib/modules/x.x.x-x-MANJARO was suddenly missing. But every time:

  • the snapshot taken before the problem contained this .old file
  • and in the next snapshot the dir /lib/modules/x.x.x-x-MANJARO was missing

I would like to discuss this with someone who knows the code. This can't be upstream? But where is the right place? Development? mhwd?
This is update-related (trizen / pacman), but it may be some script of Manjaro rather than of pacman.

Andreas :sunglasses:

Probably:

This indeed is the right place.

I found the test that leads to the deletion of a modules directory. The directory is deleted:

  • when it is named in the .old file
  • when it is not needed by the currently running kernel
#!/bin/bash

#systemd service to cleanup old kernel

if [[ $(cat /usr/lib/modules/.old) ]]; then

	oldkern=$(cat /usr/lib/modules/.old)
	currentkern=$(uname -r)
	
	#check if old is no current and in this case remove the modules
	if [[ "$oldkern" != "$currentkern" ]]; then
		rm -r /usr/lib/modules/"$oldkern"
	fi
	#remove only the hidden file
	rm /usr/lib/modules/.old
fi

#backup old kernel

#rsync -AHXal "$${i}" /usr/lib/modules/.old/
#rm -rf "$${i}"
#done

This seems sufficient, but it fails if

  • you have more than one kernel installed
  • you run the update with, let's say, kernel 5.10 running
  • your next boot after the update is with, let's say, kernel 5.14

Because then there may be an .old file naming the modules of 5.10. And when booting with 5.14, the name of the running kernel is not 5.10, so the modules of 5.10 are removed :frowning:

The next time you try to boot into 5.10, you wonder why you can't boot into this kernel any more.
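
One way to guard against this case might be an additional ownership check before anything is deleted. The following is a sketch, not the kernel-alive script: old_modules_to_remove and is_owned_by_package are hypothetical names, and it rests on the assumption that a modules directory still belonging to an installed kernel is owned by a package (so `pacman -Qqo` succeeds for it), while a directory that was only kept alive for the previous running kernel is not.

```shell
#!/bin/bash
# Sketch of a more defensive cleanup check; NOT the kernel-alive script.

# Assumption: a modules directory that still belongs to an installed kernel
# is owned by a package, so `pacman -Qqo` succeeds for it.
is_owned_by_package() {
	pacman -Qqo "$1" >/dev/null 2>&1
}

# Prints the directory that looks safe to remove, or prints nothing.
#   $1: modules root (defaults to /usr/lib/modules)
#   $2: running kernel release (defaults to the output of `uname -r`)
old_modules_to_remove() {
	local root="${1:-/usr/lib/modules}"
	local running="${2:-$(uname -r)}"
	local oldkern=""

	if [[ -r "$root/.old" ]]; then
		oldkern=$(<"$root/.old")
	fi

	# No marker, or the marker names the running kernel: keep everything.
	[[ -n "$oldkern" && "$oldkern" != "$running" ]] || return 0
	# Refuse anything that is not a plain directory name.
	[[ "$oldkern" != */* && "$oldkern" != .* ]] || return 0
	# Directory already gone: nothing to do.
	[[ -d "$root/$oldkern" ]] || return 0
	# Extra guard: an installed kernel still uses this directory.
	if is_owned_by_package "$root/$oldkern"; then
		return 0
	fi

	echo "$root/$oldkern"
}
```

The function only prints a candidate and deletes nothing itself, so the result can be reviewed before any `rm -r` is run.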

That “feature” is also wrong because it violates an important principle of the UNIX filesystem hierarchy: it performs a write operation on /usr, which is supposed to be a read-only segment of the directory hierarchy.

Barring manual and deliberate intervention by the sysadmin, the only thing that may write to /usr is the update process. Once the machine has rebooted, the update process is deemed to have finished, and therefore nothing may write to /usr anymore.

Furthermore, doing so would not only violate the read-only integrity of /usr, but it would also fail on systems, such as mine, where /usr is on its own, read-only-mounted filesystem, e.g. if /usr is to be exported over the network.

That script is not my suggestion. It is the script that is in place at the moment. :man_shrugging: And it failed on my machine and removed modules that I needed. :sob:

The kernel-alive package is purely a convenience package.

You can safely remove it as it is not required.

We knows, Preciousss, we knows. :stuck_out_tongue:


Yes, but the package is in the Core repository, which means that it’s installed by default ─ or at least, in new installations, because I don’t have it here on my system ─ and something that comes installed by default should not violate the UNIX convention of allowing for a read-only /usr out-of-the-box.

Being able to have a read-only /usr is why /bin, /sbin and /lib (including /lib64 and /lib32) were made into symlinks that point to their equivalent directories under /usr in the first place.

The right thing to do would be to make this package optional, i.e. to move it over to the Extra repository. :thinking:

I will remove it because it caused me a lot of trouble.

  • But I do not remember having installed it.

Removing it produced an error, but the package did get removed:

trizen -R kernel-alive                                                                                                                      
:: Pacman command: /usr/bin/sudo /usr/bin/pacman -R kernel-alive
[sudo] Passwort für andreas: 
Abhängigkeiten werden geprüft …

Pakete (1) kernel-alive-0.5-1

Gesamtgröße der entfernten Pakete:  0,00 MiB

:: Möchten Sie diese Pakete entfernen? [J/n] 
:: Pre-transaction-Hooks werden gestartet …
(1/2) Performing snapper pre snapshots for the following configurations...
==> root: 19454
(2/2) Remove systemd service to restore linux kernel modules
Failed to disable unit: Unit file linux-modules-cleanup.service does not exist.
Fehler: Befehl konnte nicht korrekt ausgeführt werden
:: Paketänderungen werden verarbeitet …
(1/1) Entfernung läuft kernel-alive                                                            [-------------------------------------------------------] 100%
:: Post-transaction-Hooks werden gestartet …
(1/3) Arming ConditionNeedsUpdate...
(2/3) Removing unnecessary cached files (keeping the latest two)…
==> no candidate packages found for pruning
(3/3) Performing snapper post snapshots for the following configurations...
==> root: 19455

It’s in Core now, so it comes preinstalled. I’ll ping @Ste74 ─ he’s the developer of that package. :wink:

I do think it is a nice feature to keep the modules of the running kernel available while it is running.
Is it possible to move these modules into a tmpfs (where they would be lost when the shutdown takes place)?
Or is it possible to create a shutdown hook to remove this dir?

But there may be another problem. Why does the .old file even suggest removing a dir that is in use by an installed kernel? This problem has to be addressed first.

Technically, it shouldn’t be needed, because the chances of you needing to load a kernel module in an old kernel that is to be replaced because you have just updated your system are very, very small.

Kernel modules are normally only loaded at boot time, and unlike userspace libraries, they are not opened and read from on an as-needed basis, because the kernel is monolithic ─ meaning that the kernel and all of its modules are running in the same address space, in ring 0 of the processor (cores) ─ and the modules are atomic; they are read into the kernel’s address space as a whole, not in chunks.

In theory, yes, but that would make the procedure even more complicated.

The modules would have to be moved to /tmp and then symlinked back to /usr ─ because that’s where the kernel expects to find them, given that /lib itself is a symlink to /usr/lib ─ which then boils down to the same thing, namely that you’ve got stuff in /usr that needs to be removed again after booting. And that’s a runtime write operation on /usr, which is a definitive no-no.

That would be possible too, but then this hook must only be triggered right after an update, and only if the running kernel was updated.

The main point is and remains that only the update process ─ or, through deliberate action, the sysadmin ─ should be allowed to write to /usr. Neither the boot process nor the shutdown process should be allowed to do that.

Technically, it doesn’t matter, because when it comes to kernel modules, they are never “in use” the way a shared library is.

The kernel modules are loaded into the kernel’s address space in their entirety, and so the kernel already has permanent access to the code in those modules ─ they’re in RAM already. By consequence, the on-disk copy of those modules can safely be removed while the outgoing kernel is running, because nothing’s reading from them.


P.S.: I’ve modified the closing timer on this thread to prevent it from being closed in three days. :wink:

I may have failed to word this understandably.
By “in use” I meant that there was a kernel at /boot/vmlinuz* that depended on the modules at /usr/lib/modules/X.XX.XX-X-MANJARO, and that this kernel would not be able to boot afterwards.

  • I had 3 kernels installed
  • During an update (while running kernel A), some kernels were updated
  • Then there was a wrong .old file
  • After a reboot with kernel B, the directory with the modules of one of the 3 kernels was removed
    Therefore the .old file must have had the wrong content

This happened only twice in one year (so the chance is also very small).

Not that small, unfortunately. I have had enough occurrences of trying to mount my external drive after having upgraded the running kernel and forgetting about that fact (I started the upgrade and came back an hour or two later, having totally forgotten that I did so, of course). Also, remember what we had recently in Unstable, when cryptsetup and lvm2 had been updated but not systemd? Add a kernel upgrade to this equation and you have a perfect example of a situation where you cannot reboot because the system would be unbootable, and you can no longer use the modules either. OK, this falls into the category of a very small chance, as you correctly called it, but I can still imagine how uncomfortable such a situation would be.

This has been a grievance of mine with anything Arch-based. I call it “pulling the rug from under your feet.”

It has to do with a mismatch between the package naming convention and the actual installed directories.


In regards to a distro like Ubuntu or Mint, this is never an issue. Here is why:

  • the packages for the kernels are individually named (and managed) per the exact kernel version.
  • package for kernel-5.14.15 provides kernel 5.14.15
  • there is no “overlap” between the major version and the actual kernel releases involved
  • the directory for the kernel’s modules is modules/5.14.15/
  • subsequent kernel updates leave kernel 5.14.15 alone, since it is currently running
  • if an update pulls 5.14.16, nothing happens in regards to the running kernel’s modules of modules/5.14.15/
  • all directories remain intact, until the user decides to clean up older kernels
  • this allows the user to use their system for longer without having to reboot in order to use the “correct directory” for the new modules

In regards to a distro like Arch or Manjaro, the rug is pulled from underneath your feet:

  • the packages for the kernels share the same base name (e.g., linux514)
  • package for linux514 provides kernel 5.14.15, then 5.14.16, then 5.14.20, etc…
  • there is “overlap” between the base version and the provided/managed kernel on the system
  • the directory for the kernel’s modules is modules/5.14.15/
  • subsequent kernel updates (for linux514) remove modules/5.14.15/ while the system is running (unless using a pacman hook, explained below)
  • if an update pulls 5.14.16, it’s still considered an outright update of linux514
  • the rug is pulled from underneath your feet: attempting certain actions, as hinted at by @openminded and as I have experienced first-hand myself, fails
  • there is an unspoken emphasis of forcing arbitrary reboots every time you update the kernel (when in fact it should be perfectly fine to keep using your system and reboot at a later time, without losing the folder that contains the modules for the real currently running kernel)
  • it’s not always feasible nor practical to reboot every single time you update the kernel; you’re currently doing stuff, have a setup going, and maybe would rather not lose the cache’d-to-RAM data which improves performance

This wouldn’t be an issue if linux514 were simply a metapackage that placed you on the "linux514 update train", while such updates simply pulled in the packages linux-5.14.15, linux-5.14.16, linux-5.14.20, linux-5.14.21, etc., as they become available as updates, for whichever happens to be the latest version at the time…

It would also remove the need for hooks and other attempted solutions, such as kernel-alive and kernel-modules-hook.

Well, there you have it: there was a wrong .old file, but the point is that this file shouldn’t even have to be there. It’s the script that puts it there, so as to know what to remove upon the next boot, and therefore, the script itself is violating the integrity of /usr as a read-only hierarchy.


Why would mounting an external drive require the loading of a kernel module? All of the modules required for mounting any filesystem would normally already reside in the kernel’s memory address space since the machine was booted. Among other things, that’s what the initramfs is for: it holds all of the required kernel modules.

Um, no, I don’t monitor Unstable update threads beyond the first two posts, and I religiously avoid anything to do with encrypted volumes and lvm2. :laughing:

You would think so. For some reason I have experienced all too often that I can no longer mount a network share, USB drive, etc, until I reboot. The system behaves strangely overall.
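
One plausible explanation, assuming the relevant filesystem or network driver had not been loaded before the update: such modules are loaded on demand from the modules directory of the running kernel, and that directory may be gone after its package has been upgraded. A rough check is sketched below; running_kernel_modules_present is a hypothetical helper name, and testing for the kernel subdirectory is an assumption about the usual layout under /usr/lib/modules.

```shell
#!/bin/bash
# Sketch only: does the on-disk modules directory of the running kernel
# still exist? running_kernel_modules_present is a hypothetical helper.
#   $1: modules root (defaults to /usr/lib/modules)
#   $2: running kernel release (defaults to the output of `uname -r`)
running_kernel_modules_present() {
	local root="${1:-/usr/lib/modules}"
	local running="${2:-$(uname -r)}"
	# Assumption: in-tree modules live in the "kernel" subdirectory.
	[[ -d "$root/$running/kernel" ]]
}
```

If running_kernel_modules_present returns a non-zero status, modules that were not already loaded can probably not be loaded until the next reboot.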

After installing kernel-modules-hook, the issues were completely resolved. It just seems like a needless way to address the core issue, which I explained in my earlier post.

Well at least kernel-modules-hook is a Community package, which means that it doesn’t come installed by default.

kernel-alive is in the Core repo (now), which means that it does come installed by default in new installations, and that’s where the problem sits. A package that comes installed by default should not expect /usr to be writable.

This is why I proposed the following, which makes any hacky workarounds a moot point. It’s something Debian/Ubuntu/Mint do properly, in my opinion:

Yes, and that’s also how PCLinuxOS does it, and that’s also a rolling-release distribution. But with Manjaro’s upstream being Arch and Arch doing it the way it does, chances are huge that the Manjaro developers aren’t going to step away from the Arch method of updating kernels. :slightly_frowning_face:

Did not know that! I wonder if they made that decision early on in their distro’s history? It’s a smart choice. :slight_smile:

So then we put pressure on upstream. I’m already drafting a formal letter to the Arch Linux package maintainers, which I took the liberty of signing on behalf of both of us. I’m just as committed as you are, @Aragorn. :muscle: We’re in this together!

I’m going to end the letter like so:


And that’s why you’re DOING IT WRONG.

With all due respect to everyone at the arrogant and short-sighted Arch Linux team,

Sincerely,
~Aragorn (Come and find me! I dare you.)
Let me spell it for you: A-R-A-G-O-R-N (The hooded, pipe-smoking, penguin wizard with a nasty attitude.)
And winnie too. But this was mostly Aragorn’s idea.