GPU blackout, crash

Hi everyone, I think this issue is very bizarre. In the beginning I had no problems: I had a Radeon 6800 XT in my build and ran my Python ROCm-based apps for more than a year without any issues. Only some months ago did I start to get issues. So I started to experiment, and after a headless Ubuntu Server installation the problems were eliminated. Since then I have built a new PC to work with and chose Manjaro by default, with the same hardware config but a W7800 AMD GPU, and the same issues started to pop up again. I had Steam running in the background this time, but every time it is a different process or app that gets named. I will describe what I have seen as directly as possible:

Without any previous red flags, I got this:

kernel: amdgpu 0000:08:00.0: amdgpu: Dumping IP State
kernel: amdgpu 0000:08:00.0: amdgpu: Dumping IP State Completed
kernel: amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179
amdgpu: Process information: process steamwebhelper pid 42056 thread steamwebhe:cs0 pid 42059

And this log record is the beginning of the end:

kernel: [drm] VRAM is lost due to GPU reset!

I don't think I need to post the whole log, but this one repeats in a loop:

kwin_wayland[2748]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost

After a while this dump is thrown and nothing on screen responds. There is also a stack trace, which I think is irrelevant.

kernel: amdgpu 0000:08:00.0: amdgpu: Failed to start rlc autoload
kernel: amdgpu 0000:08:00.0: amdgpu: PSP resume failed
kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset(2) failed
systemd-coredump[50108]: Process 2859 (Xwayland) of user 1000 terminated abnormally with signal 6/ABRT, processing...
systemd[1]: Started Process Core Dump (PID 50108/UID 0).
kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset end with ret = -22
kernel: amdgpu 0000:08:00.0: amdgpu: GPU Recovery Failed: -22
systemd-coredump[50109]: [🡕] Process 2859 (Xwayland) of user 1000 dumped core.
Module [dso] without build-id.
                                                        Stack trace of thread 2862:
                                                        #0  0x0000752446eb35e7 n/a (n/a + 0x0)
                                                        #1  0x00007524443e20e3 n/a (n/a + 0x0)
                                                        #2  0x00007524443e55b3 n/a (n/a + 0x0)
                                                        #3  0x0000752443edd8a4 n/a (n/a + 0x0)
                                                        #4  0x0000752443f1271d n/a (n/a + 0x0)
                                                        #5  0x0000752446f2370a n/a (n/a + 0x0)
                                                        #6  0x0000752446fa7aac n/a (n/a + 0x0)
                                                        ELF object binary architecture: AMD x86-64
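
Side note for anyone reading along: the trailing numbers in lines such as `GPU Recovery Failed: -22` look like negated errno values, which is the usual kernel convention. The following is only a small sketch for decoding them, assuming it runs on a Linux machine; the helper name `describe_kernel_ret` is made up for illustration and is not part of any tool mentioned in this thread.

```python
import errno
import os


def describe_kernel_ret(ret):
    """Map a negative kernel return code such as -22 to its errno name and message."""
    code = abs(ret)
    # errno.errorcode maps numeric codes to symbolic names on this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{ret}: {name} ({os.strerror(code)})"


if __name__ == "__main__":
    # -22 appears in the reset failure above; -125 appears in a later log in this thread.
    for ret in (-22, -125):
        print(describe_kernel_ret(ret))
```

On Linux, 22 is `EINVAL` (invalid argument) and 125 is `ECANCELED`. That only says how the driver reported the failure, not why the reset failed.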

And this was the last message looping in the log until I pulled the power plug:

kwin_wayland[2748]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug.

A little update. Today I got the same issue again with a full screen of artifacts, but without a kernel panic; music kept playing in the background the whole time, so I assume it was some kind of soft crash. I was able to recover using Ctrl+Alt+F3. Here are the kernel logs:

|Time|Source|Message|
|---|---|---|
|11.3.2025 11.38|wlp4s0|associated|
|11.3.2025 11.38|wlp4s0|Limiting TX power to 23 (23 - 0) dBm as advertised by 78:9a:18:dd:da:78|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9024024, emitted seq=9024026|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Process information: process plasma-systemmo pid 52135 thread plasma-sys:cs0 pid 52139|
|11.3.2025 23.49|EXT4-fs (sdc1)|mounted filesystem 6a13fd63-94c0-4a37-9c21-b967ebf1c047 r/w with ordered data mode. Quota mode: none.|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9035996, emitted seq=9035998|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Process information: process plasma-systemmo pid 52676 thread plasma-sys:cs0 pid 52680|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: MES failed to respond to msg=RESET|
|11.3.2025 23.50|[drm:amdgpu_mes_reset_legacy_queue [amdgpu]]|*ERROR* failed to reset legacy queue|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset begin!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Failed to evict queue 3|
|11.3.2025 23.50|amdgpu|Failed to suspend process 0x801a|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: remove_all_queues_mes: Failed to remove queue 2 for dev 34581|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: MODE1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU mode1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU smu mode1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset succeeded, trying to resume|
|11.3.2025 23.50||[drm] PCIE GART of 512M enabled (table at 0x0000008002000000).|
|11.3.2025 23.50||[drm] VRAM is lost due to GPU reset!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: PSP is resuming...|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: reserve 0x1300000 from 0x877c000000 for PSP TMR|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GECC is enabled|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: RAP: optional rap ta ucode is not available|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU is resuming...|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8000 (78.128.0)|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU driver if version not matched|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU is resumed successfully!|
|11.3.2025 23.50||[drm] DMUB hardware initialized: version=0x07002A00|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring sdma0 uses VM inv eng 12 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring sdma1 uses VM inv eng 13 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset(4) succeeded!|
|11.3.2025 23.50|[drm:amdgpu_cs_ioctl [amdgpu]]|*ERROR* Failed to initialize parser -125!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9036011, emitted seq=9036014|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Process information: process kwin_wayland pid 1716 thread kwin_wayla:cs0 pid 1767|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
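
Since the process named in each hard timeout keeps changing, it may help to tally them over a longer stretch of log. Below is a minimal sketch, assuming the log is available as plain text in the same shape as the kernel lines quoted in this post (for example from `journalctl -k`). The function name `summarize_timeouts` and the `SAMPLE` string are illustrative only; the sample lines are copied from the logs above.

```python
import re
from collections import Counter

# Hard ring timeouts, e.g.
# "ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179"
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# The follow-up line that names the process owning the hung job.
PROCESS_RE = re.compile(
    r"Process information: process (?P<name>\S+) pid (?P<pid>\d+)"
)


def summarize_timeouts(log_text):
    """Return (timeouts, process_counts) extracted from amdgpu kernel log text."""
    timeouts = []
    processes = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
            continue
        match = PROCESS_RE.search(line)
        if match:
            processes[match["name"]] += 1
    return timeouts, processes


# Sample lines copied from the logs quoted in this post.
SAMPLE = """\
kernel: amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179
amdgpu: Process information: process steamwebhelper pid 42056 thread steamwebhe:cs0 pid 42059
kernel: amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=9024024, emitted seq=9024026
kernel: amdgpu 0000:08:00.0: amdgpu: Process information: process plasma-systemmo pid 52135 thread plasma-sys:cs0 pid 52139
kernel: amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, but soft recovered
"""

if __name__ == "__main__":
    timeouts, processes = summarize_timeouts(SAMPLE)
    for ring, signaled, emitted in timeouts:
        print(f"{ring}: {emitted - signaled} fence(s) not yet signaled")
    print(dict(processes))
```

If the tally shows many unrelated processes on the same `gfx_0.0.0` ring, that would at least be consistent with the observation that no single application is to blame, though it does not prove a cause.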

Mod Edit: Fixed formatting

Hi @krisx, and welcome!

Which kernel are you using? Also, since this is your first post:

In order for us, or anyone for that matter, to be able to provide assistance, more information is necessary. To that end, please see:

Please also note and heed: Forum Rules - Manjaro

Those with privacy concerns: note that when -z, or --filter is used, all personally identifiable information is filtered out from the resulting inxi output. :eyes:

Hope you manage!


:bangbang: Tip for legibility: :bangbang:

To provide terminal output, copy the text you wish to share and paste it here, surrounded by three (3) backticks, a.k.a. grave accents. Like this:

```
pasted text
```

Or three (3) tilde signs, like this:

~~~
pasted text
~~~

This will just cause it to be rendered like this:

Portaest sed
elementum
cursus nisl nisi
hendrerit ac quis
sit
adipiscing
tortor sit leo commodo.

Instead of like this:

Portaest sed elementum cursus nisl nisi hendrerit ac quis sit adipiscing tortor sit leo commodo.

Alternatively, paste the text you wish to format as terminal output, select all pasted text, and click the </> button on the taskbar. This will indent the whole pasted section with one TAB, causing it to render the same way as described above.

This improves legibility and makes it much easier for those trying to be of assistance.
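
If you prepare log excerpts with a script before pasting, the fencing can be automated. The sketch below is only an illustration, assuming the forum follows the CommonMark rule that a fence is closed only by a run of the same character at least as long as the opening one; the helper name `fence` is hypothetical.

```python
import re


def fence(text, char="`"):
    """Wrap text in a fenced block whose fence is longer than any run of char inside it."""
    runs = re.findall(re.escape(char) + "+", text)
    longest = max((len(run) for run in runs), default=0)
    # At least three fence characters, and one more than the longest run in the text.
    marker = char * max(3, longest + 1)
    body = text.rstrip("\n")
    return marker + "\n" + body + "\n" + marker


if __name__ == "__main__":
    print(fence("pasted text"))
    # Use tildes instead of backticks:
    print(fence("pasted text", char="~"))
```

Picking a fence longer than any backtick run in the text keeps pasted output that itself contains backticks from closing the block early.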


:bangbang::bangbang: Additionally

If your language isn’t English, please prepend any and all terminal commands with LC_ALL=C. For example:

LC_ALL=C bluetoothctl

This will just cause the terminal output to be in English, making it easier to understand and debug.
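
The same idea applies when output is collected from a script rather than typed by hand. This is a minimal sketch, assuming Python 3.7 or newer; the helper name `run_in_c_locale` and the path in the demo are made up for illustration.

```python
import os
import subprocess


def run_in_c_locale(cmd):
    """Run cmd with LC_ALL=C so its messages are not translated, and return its output."""
    # Copy the current environment and override only LC_ALL.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical path, used only to provoke an error message in the C locale.
    print(run_in_c_locale(["ls", "/nonexistent-path-for-demo"]))
```

Setting the variable only for the child process leaves the locale of the rest of the session untouched, just as prefixing a single command with `LC_ALL=C` does.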

Please edit your post accordingly.

Note that the above text is partially pre-prepared as a general introduction for new forum users. Please take the time to follow links given and learn how to create effective support requests and encourage quality responses.