Booting to black after graphics driver update. Recurrent problem fixed with mhwd. Why does this happen?

I’ve been meaning to post this for a while.

Several times, after installing updates through the GUI update manager that included a graphics driver update, the system has booted to a black screen.

I have fixed this by booting into single user mode and then using mhwd to remove and then reinstall the graphics driver.

This has happened probably 4-5 times on multiple machines in the house, so it is pretty annoying, especially since one of them is a tablet attached to a wall and it is a hassle to get a keyboard attached. Even worse is when it happens on a PC hundreds of miles away at my parents’ house, after they simply run an update.

What I’d like to understand is why this is happening, and how the Manjaro team can fix it so that it stops happening. I am now hesitant to recommend Manjaro to non-technical people, as I expect they will simply run into this problem. The update procedure should never result in a system which, to a non-technical person, looks completely broken.

Do you guys know what I’m talking about? Anyone else get this issue periodically?

Ashley

Not much to say without knowing the affected hardware.


It seems to be on any machine I install on. I’ve had the same problem on:

  1. A Samsung NP900X4C which has Intel HD 3000 graphics
  2. This machine, a Ryzen 7 with an Nvidia GTX 650
  3. A Lenovo tablet with an Intel Atom Z3795 processor and integrated Intel HD graphics
  4. An i7 laptop with some Nvidia chip (this is the one hundreds of miles away, so I need to check to get the exact HW)

I think this is a general issue with the way graphics updates are done: in some situations they can break the system completely.

I’m just going to follow up here because I managed to log in to the remote machine (machine 4 in the above list) by getting my sister to set up a remote tunnel via a server, and I fixed it in the following manner:

There was no package for linux513-nvidia but the machine was running linux513

So I installed linux515 and then used mhwd -a pci nonfree 0300 to install the nvidia drivers.
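
For reference, the recovery described above can be sketched as a small shell script. This is a hedged sketch, not an official procedure: the function names kernel_pkg_from_release and reinstall_graphics are made up for illustration, the linuxXY naming is inferred from the package names in this thread (linux513, linux515), and the bus argument is written as pci because mhwd's help text lists usb or pci in that position. Nothing is installed until you call reinstall_graphics yourself, as root, from a text console.

```shell
#!/bin/sh
# Hypothetical helper: map a kernel release string (as printed by `uname -r`)
# to the Manjaro kernel package name, e.g. 5.13.19-2-MANJARO -> linux513.
kernel_pkg_from_release() {
    release=$1
    major=${release%%.*}        # text before the first dot
    rest=${release#*.}          # text after the first dot
    minor=${rest%%[!0-9]*}      # leading digits of what remains
    printf 'linux%s%s\n' "$major" "$minor"
}

# Hypothetical wrapper around the two steps from the post.
# Call it by hand, as root, for example: reinstall_graphics linux515
reinstall_graphics() {
    # Install the kernel package named in $1; stop if that fails.
    mhwd-kernel -i "$1" || return 1
    # Auto-detect the video device (PCI class 0300) and install the non-free driver.
    mhwd -a pci nonfree 0300
}
```

Running kernel_pkg_from_release "$(uname -r)" shows which kernel package the machine is currently booted into, which is the first thing to compare against the kernels still available in the repositories.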

All my sister did was follow the standard update dialog. And this is the same issue I’ve had before.

Can someone explain how this happens? Why doesn’t the update process prompt to change kernels if upgrading the system is going to break the graphics driver? Or is it because you want to decouple these two things? It’s not very intuitive for a non-technical user.

As I’ve said, this keeps happening to me, once or twice a year on at least one of the machines. I believe update should just work. My parents should just be able to click update and not expect this kind of issue. Well that’s what I think would be good for Manjaro anyway.

Ashley


Some tips:

  • Always look for the announcements of updates before updating!
  • Always have 2 kernels installed
  • One of them LTS
  • Remove kernels when they become EOL and replace them before something breaks
  • Back up!

Install 2 kernels (one of them LTS)

https://wiki.manjaro.org/index.php/Manjaro_Kernels
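
To check the "two kernels" tip on a given machine, one can count the installed kernel packages. A minimal sketch, assuming plain kernel packages are named linux followed only by digits (as with linux513 and linux515 in this thread); installed_kernels is a hypothetical helper name, and it reads package names on standard input so that it can be fed from pacman -Qq.

```shell
#!/bin/sh
# Hypothetical helper: keep only names that look like plain Manjaro kernel
# packages (linux + digits), dropping modules such as linux515-nvidia.
installed_kernels() {
    grep -E '^linux[0-9]+$'
}

# Example with a fixed list of package names.
# On a real system use: pacman -Qq | installed_kernels
printf '%s\n' linux513 linux513-nvidia linux515 linux-firmware nvidia-utils |
    installed_kernels
```

If this prints only one line, the machine has no fallback kernel to boot into when the other one breaks. mhwd-kernel -li gives a similar overview directly.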



Regular kernel releases are only maintained for about 3 months, so they need to be replaced regularly. You can reduce how often you have to switch by using LTS kernels, which are maintained for years (up to around 6 for some releases).


I almost forgot. MSM (Manjaro Settings Manager) should be installed by default, and AFAIK its default configuration warns you whenever you are running an EOL kernel.


You are mismanaging your system (running end-of-life kernels) … that’s why.

When it comes to ‘issues’ like this … I guess the only option is mass mind-control, cuz apparently all the documentation in the world doesn’t help.

Such a glib response. I hope it titillates you as much as it looks as though it does. 5+ cool points for you.

The marketing material (Manjaro - Linux Beginners) says it is suitable for beginners.

I’m not a beginner, but the computers I’m installing it on belong to people that are.

I’ve tried to highlight an issue here, but your feelings have got hurt because you think I’m attacking your favourite OS.

If the system is designed so that “update” can break the system then it’s a broken system. If you can’t see that, then I think we’re done.

My feelings aren’t hurt …

It’s just a simple thing - you cannot run out-of-date and unsupported software and expect things to keep working.

And this issue in particular? You aren’t alone … but that means that the little search button up at the top would return 20 or more threads with these exact symptoms … and lo and behold … the same answers.

Okey doke.

I’ll admit, I was annoyed before by Mr cscs and rage quit the thread. You got me bro, good trolling. But I’ll go ahead and try and think how this can be solved for the broader user base, since that was my initial motivation, not solving the problem for myself which is trivial.

I need to go back to why the system breaks in the first place. Please try and help me understand this, without stupid sarcastic comments, if you can just refrain from that for a while that would be useful.

I need to understand this in detail, and I don’t want to be confused, so I’ll start with some simple statements which I believe to be true:

  1. Only patch updates are applied automatically to the kernel according to the documentation.
  2. The graphics kernel module is specific to the kernel version.
  3. The graphics kernel module is installed via a package rather than dynamically generated via DKMS

Are these statements correct?

So I’m asking myself, what sequence of updates leads to a broken system, and the following questions present themselves:

  1. If the kernel is automatically patched, is it possible that a graphics module is not available for that patched version?
  2. If a user updates the kernel manually to a new version, on the next system update will the corresponding graphics module be installed?

Ashley

To your three statements: by default, yes.


To your first question: in theory, a matching module is available as long as both the kernel version and the graphics module are maintained.
But if the GPU manufacturer stops maintaining an old proprietary driver, that driver no longer receives updates while the kernel still does, which can lead to issues. Such cases are usually announced so that users can switch to another driver.
Likewise, when a kernel reaches EOL, the associated drivers are dropped as well. When you then try to update everything else, including kernels that are still maintained, the update may fail because of broken dependency requirements.


To your second question: if you install only the kernel, through the package manager, then no.
It is recommended to use msm or mhwd-kernel to install a kernel, as those will install the drivers at the same time.
https://wiki.manjaro.org/index.php/Manjaro_Kernels/en

Thanks, that’s very helpful, I feel we are getting somewhere. Please continue to help me through this.

If a kernel reaches EOL, you say the drivers are dropped. But the user can carry on using that kernel.

So in the case above the computer was running linux513, and when this became EOL, presumably linux513-nvidia dropped out of the extra repository?

If both linux513 and linux513-nvidia are installed however, the system graphics should not break, right?

So which system component removed linux513-nvidia? Was it the automatic update process?

Ashley

If you check linux*-nvidia’s dependencies, you can see that it needs not only the associated kernel version but also packages that keep being updated “independently” of the kernel. Since dependency requirements must stay valid, the package manager may need to remove the package in order to update those dependencies.
Besides this, since those dependencies can still be updated, they may move beyond what the dropped packages are compatible with, and those packages may then stop working correctly.

Thanks, this still isn’t clear to me.

Looking at the depends of linux515-nvidia as an example (is there a way to look at the old repos?), the following are listed:

linux515
nvidia-utils=495.46

Presumably the package manager isn’t going to automatically remove linux515 if that’s the running kernel, so then nvidia-utils=495.46 would be the only other package.

Are you saying that nvidia-utils being updated could somehow result in the removal of linux515-nvidia?

Exactly.
Updating nvidia-utils may break the dependency of linux513-nvidia, and since the latter is no longer in the repository and so cannot be updated to match, the package manager will logically suggest removing linux513-nvidia.
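
The mechanism can be illustrated with a simplified check. This is a toy model of an exact-version dependency such as nvidia-utils=495.46, not pacman's actual resolver: pin_satisfied is a hypothetical function, it ignores epochs, and it treats any package release suffix (the part after the last hyphen) as matching. The version 510.47.03 is only used as an example of a newer driver release.

```shell
#!/bin/sh
# Toy model: does an installed version satisfy an exact pin such as
# "nvidia-utils=495.46"?
#   $1 = dependency string (name=version)
#   $2 = installed version (version-pkgrel)
pin_satisfied() {
    wanted=${1#*=}      # version after the "="
    have=${2%-*}        # installed version without the trailing -pkgrel
    [ "$wanted" = "$have" ]
}

# Compare the pinned version against the old and a newer nvidia-utils.
for installed in 495.46-1 510.47.03-1; do
    if pin_satisfied "nvidia-utils=495.46" "$installed"; then
        echo "nvidia-utils $installed: pin satisfied, module package can stay"
    else
        echo "nvidia-utils $installed: pin broken, module package must be updated or removed"
    fi
done
```

On a real system, pacman -Qi linux515-nvidia shows the actual pins in its "Depends On" field. For a module package that has been dropped from the repository, "updated" is no longer an option, which leaves removal.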

The best you can do is to ensure the remote system is running on an LTS kernel.

For the time being this is linux515.

The issues you have faced are due to the 5.13 kernel and its kernel modules having been decommissioned entirely.

A kernel upgrade such as this one, 5.13 to 5.15, on an Nvidia-based system has to be done hands-on. There is no easy way. My personal recommendation is to do such updates from the console rather than the GUI, because Xorg depends on the graphics drivers that are being replaced, which usually presents a challenge.

Ok so I think we have arrived at the root cause of the issue.

I’m happy to do these upgrades manually but it doesn’t solve the issue for “random relative I’ve recommended Manjaro to”

A kernel reaching EOL should perhaps be treated as a special case and the system automatically updated to the latest LTS variant?

There have been experiments to automatically install newer kernels. They didn’t work as well as expected.

I guess there is always a chance something will go wrong. If mhwd-kernel was instrumented through the GUI this could work.

It only needs to be triggered in the specific circumstance that a kernel reaches EOL and then the user can be guided through it. Maybe it fails, but if it is going to fail anyway due to the graphics module getting bumped, then it’s probably worth it.

I’m thinking from the perspective of your naive user who just downloads the ISO, installs, and expects the system to keep itself up to date. Eventually that user is going to encounter the EOL kernel issue if they do nothing but use the GUI update procedure. It would be nice if this case was handled. It happens a lot according to a forum search.
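
One possible trigger for such a guided step, sketched under assumptions: once a kernel is dropped from the repositories, its installed package is no longer found in the sync databases, which is what pacman -Qm reports as a foreign package. The helper name dropped_kernel_pkgs and the linux + digits pattern are illustrative assumptions, and nothing like this is claimed to exist in Manjaro's updater.

```shell
#!/bin/sh
# Hypothetical helper: from a list of foreign package names (one per line),
# keep kernels and their extra modules, e.g. linux513 and linux513-nvidia.
dropped_kernel_pkgs() {
    grep -E '^linux[0-9]+(-|$)'
}

# Example with a fixed list of package names.
# On a real system use: pacman -Qmq | dropped_kernel_pkgs
printf '%s\n' linux513 linux513-nvidia yay linux-firmware |
    dropped_kernel_pkgs
```

Any output here would mean the machine still has a kernel, or modules for one, that the repositories no longer carry, which is the situation described in this thread. The result is only as current as the last database refresh, and AUR packages also show up as foreign, hence the name filter.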


Manjaro prides itself on being user-friendly, not foolproof.
It takes care of quite a lot of things (kernels, drivers, branches, installation…), but it is still a cutting-edge, rolling-release distribution. From a user’s perspective it is rather easy to maintain, since one mostly just needs to follow the update announcements, but it is definitely not an install-and-forget OS.
