Hi everyone, I think this issue is very bizarre. At the beginning I had no problems: I had a Radeon 6800 XT in my build and ran my Python ROCm-based apps for more than a year without any issues. Only some months ago did I start to get problems. I started to experiment, and after a headless Ubuntu Server installation the problems went away. In the meantime I built a new PC to work with and chose Manjaro by default: same hardware config, but with an AMD W7800 GPU, and the same issues started to pop up again. I had Steam running in the background this time, but somehow a different process or app is involved every time. I will describe what I have seen as directly as possible:
Without any prior warning signs, I got this:
kernel: amdgpu 0000:08:00.0: amdgpu: Dumping IP State
kernel: amdgpu 0000:08:00.0: amdgpu: Dumping IP State Completed
kernel: amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179
amdgpu: Process information: process steamwebhelper pid 42056 thread steamwebhe:cs0 pid 42059
And this log record is the beginning of the end:
kernel: [drm] VRAM is lost due to GPU reset!
I don’t think I need to post the whole log, but this line repeats in a loop:
kwin_wayland[2748]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost
After a while the dump below is thrown and nothing on screen responds. There is also a stack trace, which I think is irrelevant.
kernel: amdgpu 0000:08:00.0: amdgpu: Failed to start rlc autoload
kernel: amdgpu 0000:08:00.0: amdgpu: PSP resume failed
kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset(2) failed
systemd-coredump[50108]: Process 2859 (Xwayland) of user 1000 terminated abnormally with signal 6/ABRT, processing...
systemd[1]: Started Process Core Dump (PID 50108/UID 0).
kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset end with ret = -22
kernel: amdgpu 0000:08:00.0: amdgpu: GPU Recovery Failed: -22
systemd-coredump[50109]: [🡕] Process 2859 (Xwayland) of user 1000 dumped core.
Module [dso] without build-id.
Stack trace of thread 2862:
#0 0x0000752446eb35e7 n/a (n/a + 0x0)
#1 0x00007524443e20e3 n/a (n/a + 0x0)
#2 0x00007524443e55b3 n/a (n/a + 0x0)
#3 0x0000752443edd8a4 n/a (n/a + 0x0)
#4 0x0000752443f1271d n/a (n/a + 0x0)
#5 0x0000752446f2370a n/a (n/a + 0x0)
#6 0x0000752446fa7aac n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
After that, this was the last message looping in the log until I pulled the power plug:
kwin_wayland[2748]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug.
A small update. Today I got the same issue again with a screen full of artifacts, but without a kernel panic; music kept playing in the background the whole time, so I assume it was some kind of soft crash. I was able to recover using Ctrl+Alt+F3. Here are the kernel logs:
|Time|Source|Message|
|---|---|---|
|11.3.2025 11.38|wlp4s0|associated|
|11.3.2025 11.38|wlp4s0|Limiting TX power to 23 (23 - 0) dBm as advertised by 78:9a:18:dd:da:78|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9024024, emitted seq=9024026|
|11.3.2025 23.47|amdgpu 0000:08:00.0|amdgpu: Process information: process plasma-systemmo pid 52135 thread plasma-sys:cs0 pid 52139|
|11.3.2025 23.49|EXT4-fs (sdc1)|mounted filesystem 6a13fd63-94c0-4a37-9c21-b967ebf1c047 r/w with ordered data mode. Quota mode: none.|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9035996, emitted seq=9035998|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Process information: process plasma-systemmo pid 52676 thread plasma-sys:cs0 pid 52680|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: MES failed to respond to msg=RESET|
|11.3.2025 23.50||[drm:amdgpu_mes_reset_legacy_queue [amdgpu]] *ERROR* failed to reset legacy queue|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset begin!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Failed to evict queue 3|
|11.3.2025 23.50|amdgpu|Failed to suspend process 0x801a|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: remove_all_queues_mes: Failed to remove queue 2 for dev 34581|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: MODE1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU mode1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU smu mode1 reset|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset succeeded, trying to resume|
|11.3.2025 23.50||[drm] PCIE GART of 512M enabled (table at 0x0000008002000000).|
|11.3.2025 23.50||[drm] VRAM is lost due to GPU reset!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: PSP is resuming...|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: reserve 0x1300000 from 0x877c000000 for PSP TMR|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GECC is enabled|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: RAP: optional rap ta ucode is not available|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU is resuming...|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: smu driver if version = 0x0000003d, smu fw if version = 0x00000040, smu fw program = 0, smu fw version = 0x004e8000 (78.128.0)|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU driver if version not matched|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: SMU is resumed successfully!|
|11.3.2025 23.50||[drm] DMUB hardware initialized: version=0x07002A00|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring sdma0 uses VM inv eng 12 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring sdma1 uses VM inv eng 13 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring vcn_unified_1 uses VM inv eng 1 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring jpeg_dec uses VM inv eng 4 on hub 8|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 14 on hub 0|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: GPU reset(4) succeeded!|
|11.3.2025 23.50||[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.50|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, signaled seq=9036011, emitted seq=9036014|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Process information: process kwin_wayland pid 1716 thread kwin_wayla:cs0 pid 1767|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.51|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring comp_1.2.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.52|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.53|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: Dumping IP State Completed|
|11.3.2025 23.54|amdgpu 0000:08:00.0|amdgpu: ring gfx_0.0.0 timeout, but soft recovered|
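
Since a different process shows up in the "Process information" line each time, it may help to tally the ring timeouts and the processes named next to them instead of reading the log by eye. Below is a minimal Python sketch of that idea, not something I actually run: the function name `summarize` and both regular expressions are my own assumptions, based only on the message formats shown in the logs above.

```python
import re
from collections import Counter

# Assumed format, taken from the log lines in this post, e.g.
# "ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179"
# "ring gfx_0.0.0 timeout, but soft recovered"
RING_RE = re.compile(r"ring (?P<ring>\S+) timeout, (?P<rest>.*)")
# e.g. "Process information: process steamwebhelper pid 42056 thread ..."
PROC_RE = re.compile(r"Process information: process (?P<name>\S+) pid (?P<pid>\d+)")


def summarize(lines):
    """Count timeouts per ring, soft recoveries, and processes named in the log."""
    rings = Counter()   # ring name -> number of timeouts
    procs = Counter()   # process name -> number of times it was reported
    soft = 0            # timeouts that ended with "soft recovered"
    for line in lines:
        m = RING_RE.search(line)
        if m:
            rings[m.group("ring")] += 1
            if "soft recovered" in m.group("rest"):
                soft += 1
            continue
        m = PROC_RE.search(line)
        if m:
            procs[m.group("name")] += 1
    return rings, soft, procs


# A few lines copied from the logs in this post, used as sample input.
sample = [
    "amdgpu 0000:08:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=1178177, emitted seq=1178179",
    "amdgpu: Process information: process steamwebhelper pid 42056 thread steamwebhe:cs0 pid 42059",
    "amdgpu 0000:08:00.0: amdgpu: ring comp_1.2.0 timeout, but soft recovered",
]
rings, soft, procs = summarize(sample)
print("timeouts per ring:", dict(rings))
print("soft recovered:", soft)
print("processes named:", dict(procs))
```

To run it on a real log, the `sample` list could be replaced with lines read from a file saved with something like `journalctl -k -b -1 > kernel.log` (kernel messages from the previous boot).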
Mod Edit: Fixed formatting