System restart, black screen, etc. on AMD

Hello,

For several weeks now, more precisely since the update from 6.6.40-1-MANJARO to 6.6.44-1-MANJARO, I have had random system resets and black screens on my AMD system, several times a day. This is a fairly common problem:

gitlab.freedesktop.org/drm/amd/-/issues/3576
gitlab.freedesktop.org/drm/amd/-/issues/3556
www.linuxquestions.org/questions/showthread.php?s=e36793a52a86725e565158946fdc4ec8&p=6522951#post6522951
forum.manjaro.org/t/fastest-way-to-downgrade-6-6-46-to-6-6-40/167116

(I cannot post links, so I am posting the URLs as plain text.)

A fix is already in the “master” branch of Linus’s repo. I have been updating my Manjaro system regularly, hoping for a kernel with the fix, but so far, no luck. So yesterday I downgraded my kernel to 6.6.40-1-MANJARO by manually downloading and installing the package, and now I have a stable system again.

Having an unstable desktop system for so long is quite a big deal, and it probably affects many people, so I wanted to draw attention to this.
I don’t know the process and policies in the Manjaro project with regard to kernel updates. Is there any estimate of when the fix could be rolled out?

Thanks!

I am using AMD systems - both my desktop and one of my laptops are AMD-based - and I don’t recognize what you describe, but I don’t use 6.6.

Current Linux 6.6 versions as of 2024-09-14T22:00:00Z

 $ mbn info linux66 -q
Branch         : unstable
Name           : linux66
Version        : 6.6.51-1
Repository     : core
Build Date     : Thu 12 Sep 2024 13:36:42 
Packager       : Manjaro Build Server <build@manjaro.org>

Branch         : testing
Name           : linux66
Version        : 6.6.51-1
Repository     : core
Build Date     : Thu 12 Sep 2024 13:36:42 
Packager       : Manjaro Build Server <build@manjaro.org>

Branch         : stable
Name           : linux66
Version        : 6.6.47-1
Repository     : core
Build Date     : Mon 19 Aug 2024 10:25:48 
Packager       : Manjaro Build Server <build@manjaro.org>

Apart from it being LTS - is there any other reason to use Linux 6.6?

Manjaro uses the kernel sources from kernel.org.

But not every addition or improvement to the mainline kernel gets backported to LTS.

What gets backported is not decided by the Manjaro team.

Therefore - if you have issues with the current LTS kernel - i.e. a specific ‘fix’ in the mainline kernel is not backported - your only choice is to move to the latest stable kernel - which is Linux 6.10.

Linux 6.10 as of 2024-09-14T22:00:00Z

 $ mbn info linux610 -q
Branch         : unstable
Name           : linux610
Version        : 6.10.10-2
Repository     : core
Build Date     : Sat 14 Sep 2024 11:44:32 
Packager       : Manjaro Build Server <build@manjaro.org>

Branch         : testing
Name           : linux610
Version        : 6.10.10-2
Repository     : core
Build Date     : Sat 14 Sep 2024 11:44:32 
Packager       : Manjaro Build Server <build@manjaro.org>

Branch         : stable
Name           : linux610
Version        : 6.10.6-10
Repository     : core
Build Date     : Sat 24 Aug 2024 16:39:12 
Packager       : Manjaro Build Server <build@manjaro.org>

If you are getting a black screen - it could be related to system configuration.

Please see → Black Screen with Plymouth - Cause and Solution - #2 by linux-aarhus

I’d suggest using the linux61 series for now. I too have had issues with certain versions of linux66 (albeit I don’t have AMD), although the current version seems fine.

Or linux611? I’ve read about issues with the current linux610.

I agree with @linux-aarhus. I have multiple AMD systems, laptops and desktops, with no issues, even when I was using 6.6.

Here are my four things to verify:

  1. Don’t assume hardware is not the issue. Remove the memory modules, clean the terminals and pins, reinstall them and run a memory check. (This one has bitten me in the arse many times.)
  2. Make sure the amdgpu module is loaded via mkinitcpio. It should not normally be necessary, but when issues arise it ensures that the module is loaded correctly.
  3. Create another user and verify whether the issue also occurs for the new user.
  4. Try a different DE ISO.
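
For item 2, here is a minimal sketch of how one could check whether amdgpu is already listed for early loading. It assumes the usual Arch/Manjaro layout, where /etc/mkinitcpio.conf contains a `MODULES=(...)` array; the function name `has_early_amdgpu` is made up for this example.

```shell
# has_early_amdgpu FILE: succeed if an uncommented MODULES=(...) line
# in FILE lists amdgpu. (Hypothetical helper name, read-only check.)
has_early_amdgpu() {
    grep -Eq '^[[:space:]]*MODULES=\(([^)]*[[:space:]])?amdgpu([[:space:]][^)]*)?\)' "$1"
}

# Typical use (assumes the default config path):
#   has_early_amdgpu /etc/mkinitcpio.conf && echo "amdgpu is loaded early"
#
# If it is missing, add amdgpu to the array, e.g. MODULES=(amdgpu),
# and then rebuild the initramfs images for all installed kernels:
#   sudo mkinitcpio -P
```

The check only reads the file; editing the config and rebuilding the initramfs remain manual steps.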

Thanks for the replies. I am certain from first-hand experience that “6.6.40-1-MANJARO” is not impacted and “6.6.44-1-MANJARO” is. It is 100% reproducible. I also tried 6.11, specifically 6.11.0-rc4-7-MANJARO, and it was impacted as well.

Alex Deucher wrote the following comment in

gitlab.freedesktop.org/drm/amd/-/issues/3556

only affects a subset of gfx10 based GPUs. It has no effect on gfx11 parts like phoenix or gfx9 parts in older APUs.

So, yes, you are right that it does not impact all AMD users. Only the ones using a “subset of gfx10 based GPUs”.

The problem was introduced by:

github.com/torvalds/linux/commit/09a67694edd1f787c206cd2fd066b0ca37debe9f

This change made its way into many stable branches. Alex Deucher provided the following fix:

git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/patch/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c?id=9d2b75e27660d768d1088e3ad905dff8ac771417

But unfortunately, the fix is not yet available in Manjaro, hence my original message. For the moment, I am sticking with 6.6.40-1-MANJARO, because it was the last kernel that worked for me before the trouble started, while I wait for a fixed kernel through the stable channels.

For the record (if it’s worth anything) I had issues with 6.6.26-1 whereas the previous 6.6.25 was fine.

Subsequent versions also have been fine for me, having used 6.1 for a while in the interim.

So, hopefully, this will be fixed soon. Especially as Torvalds himself is apparently directly involved.

If it is - if I understand what you are saying - a part of the kernel source for Linux 6.6 - then it is included in Manjaro’s linux66, as Manjaro builds from the upstream kernel source (not Arch).

While I am a coder - I am no kernel maintainer, I have next to no experience with C and C++ - I have a huge respect for those able to maintain the kernel code. My coding skills are more boring.

//EDIT
I mentioned this to Philip - he dug into it - and found that the following kernels have the patch

  • Linux 6.10.7
  • Linux 6.11rc5

The testing branch has Linux 6.10.10 and Linux 6.11.0, so either wait for the next stable snapshot or switch to the testing branch.

I did an analysis of all v6 tags to this date:

Tags                      Bug  Fix  Verdict  Begin of time window
v6.11                      Y    Y   OK       Sun Sep 15 16:57:56 2024
v6.11-rc5 - v6.11-rc7      Y    Y   OK       Sun Aug 25 19:07:11 2024
v6.11-rc1 - v6.11-rc4      Y    n   FAIL     Sun Jul 28 14:19:55 2024

v6.10.7   - v6.10.10       Y    Y   OK       Thu Aug 29 17:36:13 2024
v6.10.3   - v6.10.6        Y    n   FAIL     Sat Aug 3 09:01:09 2024
v6.10     - v6.10.2        n    n   OK
v6.10-rc1 - v6.10-rc7      n    n   OK

v6.9 *all*                 n    n   OK

v6.8 *all*                 n    n   OK

v6.7 *all*                 n    n   OK

[LTS]
v6.6.44   - v6.6.51        Y    n   FAIL     Sat Aug 3 08:54:42 2024
v6.6      - v6.6.43        n    n   OK

v6.2 *all*                 n    n   OK

[LTS]
v6.1.103  - v6.1.110       Y    n   FAIL     Sat Aug 3 08:49:53 2024
v6.1      - v6.1.102       n    n   OK

v6.0 *all*                 n    n   OK

Command used as basis:

git tag | grep '^v6' | sort -rV | while read -r t ; do echo "$t" ; git log --pretty=oneline "$t" | head -10000 | grep -E "(Update wptr registers as well as doorbell|limit wptr workaround to)" ; echo ; done
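
To check a given kernel against the table above, the ranges can be encoded in a small shell helper. This is only a sketch: the function names `ver_ge` and `kernel_verdict` are made up, the ranges are copied by hand from the table, release candidates are not handled, and anything newer than the last analysed tag is reported as UNKNOWN rather than guessed.

```shell
# Hypothetical helper (not part of any Manjaro tool): classify a kernel
# version using the ranges from the table above.

# ver_ge A B: succeed when A >= B in GNU `sort -V` version order.
ver_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$1" ]
}

kernel_verdict() {
    case "$1" in
        *rc*) echo UNKNOWN; return ;;   # release candidates are not handled here
    esac
    kv=${1%%-*}                         # 6.6.44-1-MANJARO -> 6.6.44
    case "$kv" in
        6.0|6.0.*|6.2|6.2.*|6.7|6.7.*|6.8|6.8.*|6.9|6.9.*)
            echo OK ;;                  # bug never landed in these series
        6.1|6.1.*)
            if ! ver_ge 6.1.110 "$kv"; then echo UNKNOWN
            elif ver_ge "$kv" 6.1.103; then echo FAIL
            else echo OK; fi ;;
        6.6|6.6.*)
            if ! ver_ge 6.6.51 "$kv"; then echo UNKNOWN
            elif ver_ge "$kv" 6.6.44; then echo FAIL
            else echo OK; fi ;;
        6.10|6.10.*)
            if ! ver_ge 6.10.10 "$kv"; then echo UNKNOWN
            elif ver_ge "$kv" 6.10.7; then echo OK
            elif ver_ge "$kv" 6.10.3; then echo FAIL
            else echo OK; fi ;;
        6.11|6.11.0)
            echo OK ;;                  # v6.11 carries both the bug and the fix
        *)  echo UNKNOWN ;;
    esac
}
```

For the running kernel, this would be called as `kernel_verdict "$(uname -r)"`.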

Apparently the fix cannot be applied to 6.6 LTS.

I guess the next step is now to check which valid upstream kernels are available in Manjaro, e.g. v6.11+, v6.11-rc5+ or v6.10.7+. What is the easiest way to do so?

@linux-aarhus What is this “mbn” command you used above?

It comes from the package manjaro-check-repos.

See → patrick / manjaro-check-repos · GitLab

12:05:38 ○ [fh@tiger] ~
 $ mbn info linux66 -q
Branch         : unstable
Name           : linux66
Version        : 6.6.51-1
Repository     : core
Build Date     : Thu 12 Sep 2024 13:36:42 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : testing
Name           : linux66
Version        : 6.6.51-1
Repository     : core
Build Date     : Thu 12 Sep 2024 13:36:42 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : stable
Name           : linux66
Version        : 6.6.47-1
Repository     : core
Build Date     : Mon 19 Aug 2024 10:25:48 
Packager       : Manjaro Build Server <build@manjaro.org>


12:05:46 ○ [fh@tiger] ~
 $ mbn info linux610 -q
Branch         : unstable
Name           : linux610
Version        : 6.10.10-3
Repository     : core
Build Date     : Sun 15 Sep 2024 16:53:57 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : testing
Name           : linux610
Version        : 6.10.10-3
Repository     : core
Build Date     : Sun 15 Sep 2024 16:53:57 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : stable
Name           : linux610
Version        : 6.10.6-10
Repository     : core
Build Date     : Sat 24 Aug 2024 16:39:12 
Packager       : Manjaro Build Server <build@manjaro.org>


12:05:54 ○ [fh@tiger] ~
 $ mbn info linux611 -q
Branch         : unstable
Name           : linux611
Version        : 6.11.0-1
Repository     : core
Build Date     : Mon 16 Sep 2024 01:33:41 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : testing
Name           : linux611
Version        : 6.11.0-1
Repository     : core
Build Date     : Mon 16 Sep 2024 01:33:41 
Packager       : Manjaro Build Server <build@manjaro.org>
Branch         : stable
Name           : linux611
Version        : 6.11.0rc4-7
Repository     : core
Build Date     : Sat 24 Aug 2024 10:40:58 
Packager       : Manjaro Build Server <build@manjaro.org>
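
To compare branches at a glance, the output can be reduced to one line per branch. This is a small sketch that only assumes the `Branch` / `Version` field layout shown above; the function name `branch_versions` is made up.

```shell
# branch_versions: read `mbn info <pkg> -q` output on stdin and print
# "<branch> <version>" for each branch block. (Hypothetical helper name.)
branch_versions() {
    awk -F' *: *' '
        $1 == "Branch"  { b = $2 }        # remember the current branch
        $1 == "Version" { print b, $2 }   # emit it with its version
    '
}

# Example:
#   mbn info linux610 -q | branch_versions
```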

Thanks a lot! So it means that as of today, all kernels from the Manjaro stable branch are impacted. If I understand the Wiki correctly, switching to testing would be a global change. That might introduce other issues, so I prefer to just stick to 6.6.40-1-MANJARO for the kernel.

It is a bit ironic that the stable branch is impacted for such a long time with regard to this issue, but I guess that we can only wait.

That is correct - do realize that while the wiki raises awareness - it is not dangerous - and in the past 10 years I have had fewer issues with Linux than I had in the previous 20 years with Microsoft.

Regressions happen - there is really nothing to do about it.

Well - that is a choice only you can make.

I know - I wouldn’t hesitate to switch branches if it would solve an ongoing issue - the rolling release model is one of the reasons I began using Arch, and since then Manjaro.