amdgpu ring timeouts/resets with GpuTest (FurMark) on Manjaro kernel 6.18 (extra)
Summary
Running GpuTest (FurMark) on kernel 6.18 from extra causes amdgpu ring timeouts and device resets. The FurMark window appears but stays black, GPU load stays near idle, and the process is aborted (core dump). Mouse/desktop lag spikes occur during/after the resets. The same FurMark run in the “normal benchmark” path previously worked.
Environment
- Distro: Manjaro (kernel 6.18 from extra)
- Kernel: 6.18.x (extra)
- GPU: AMD (amdgpu driver); exact model not shown in logs; system hostname sharkoon
- Mesa/AMDGPU stack: (please fill exact versions: mesa --version, glxinfo -B)
- Display server: X11
- GpuTest: /test=fur (FurMark), windowed
- Note: Kernel 6.17 had an unrelated BT stack issue; this report is about GPU resets on 6.18.
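To fill in the missing stack versions, something along these lines may help. It is only a sketch: the function name gl_stack_summary is made up for illustration, and it assumes glxinfo is installed and prints its usual "OpenGL ... string:" lines.

```shell
# Reduce `glxinfo -B` output (read from stdin) to the lines that name
# the renderer and the Mesa/OpenGL versions.
# gl_stack_summary is a hypothetical helper name, not part of any package.
gl_stack_summary() {
    grep -E '^[[:space:]]*OpenGL (renderer|core profile version|version) string:' \
        | sed -E 's/^[[:space:]]+//'
}

# Usage (not executed here):
#   uname -r                       # running kernel
#   pacman -Q mesa linux-firmware  # installed package versions
#   glxinfo -B | gl_stack_summary  # renderer and Mesa version
```

The three commands together cover kernel, user-space driver and firmware package, which is usually what gets asked for first in amdgpu reports.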
Steps to Reproduce
1. Install GpuTest (FurMark) and the amdgpu stack (default Manjaro packages).
2. Run:
   gputest /test=fur /width=1920 /height=1080 /msaa=2 /gpumon_terminal
   (also reproducible at 1280x720; happens both standalone and when invoked via a stress script).
3. Observe the window: it opens but remains black; after a few seconds the process aborts, and the kernel log shows ring timeouts and resets. GPU power stays low (~20–30 W), fans at 0 RPM (idle).
Expected Result
FurMark renders and drives the GPU to high load without amdgpu resets.
Actual Result
- Window stays black, minimal GPU load, then amdgpu ring timeouts and device resets.
- Process aborts (core dumped).
- System input lag spikes (mouse stalls briefly).
Kernel Log Excerpts (journalctl -k)
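A minimal sketch for pulling the relevant lines out of the kernel log. The function name amdgpu_reset_lines is hypothetical, and the match strings are taken from the amdgpu messages quoted in this thread, so other kernel versions may word them differently.

```shell
# Keep only amdgpu kernel messages about ring timeouts, ring resets,
# wedged-device notices, page faults and coredumps (input on stdin).
# amdgpu_reset_lines is a hypothetical helper name.
amdgpu_reset_lines() {
    grep -E 'amdgpu.*(ring [[:alnum:]_.]+ timeout|ring reset|reset succeeded|device wedged|page fault|coredump)'
}

# Usage (not executed here):
#   journalctl -k -b --no-pager    | amdgpu_reset_lines   # current boot
#   journalctl -k -b -1 --no-pager | amdgpu_reset_lines   # previous boot
```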
[...] or 6.17 to check for regression.
- Try without /gpumon_terminal and with lower resolution (1280x720); still seeing resets on my system.
- Validate with current Mesa/AMDGPU stack versions.
I will switch back to 6.17 (which has the Bluetooth initialisation problem; 6.16 works fine).
Nope, not in official repo
$ pamac search gputest
gputest 0.7.0-1 AUR
cross-platform GPU stress test and OpenGL benchmark. Contains
FurMark, TessMark
$ inxi -CG
CPU:
Info: 12-core model: AMD Ryzen Threadripper PRO 5945WX s bits: 64
type: MT MCP cache: L2: 6 MiB
Speed (MHz): avg: 1790 min/max: 413/4101 cores: 1: 1790 2: 1790 3: 1790
4: 1790 5: 1790 6: 1790 7: 1790 8: 1790 9: 1790 10: 1790 11: 1790 12: 1790
13: 1790 14: 1790 15: 1790 16: 1790 17: 1790 18: 1790 19: 1790 20: 1790
21: 1790 22: 1790 23: 1790 24: 1790
Graphics:
Device-1: Advanced Micro Devices [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900
XTX/7900 GRE/7900M] driver: amdgpu v: kernel
Display: wayland server: X.org v: 1.21.1.21 with: Xwayland v: 24.1.9
compositor: kwin_wayland driver: X: loaded: modesetting unloaded: radeon
dri: radeonsi gpu: amdgpu resolution: 5120x1440~120Hz
API: EGL v: 1.5 drivers: kms_swrast,radeonsi,swrast
platforms: gbm,wayland,x11,surfaceless,device
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 25.3.1-arch1.2
renderer: AMD Radeon RX 7900 XTX (radeonsi navi31 LLVM 21.1.6 DRM 3.64
6.18.1-1-MANJARO)
API: Vulkan v: 1.4.335 drivers: radv surfaces: N/A
Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
de: kscreen-console,kscreen-doctor gpu: amdgpu_top,corectrl wl: wayland-info
x11: xdpyinfo, xprop, xrandr
That is the result I get
No issues running it on my Plasma Wayland system:
kinfo
Operating System: Manjaro Linux
KDE Plasma Version: 6.5.4
KDE Frameworks Version: 6.20.0
Qt Version: 6.10.1
Kernel Version: 6.18.1-1-MANJARO (64-bit)
Graphics Platform: Wayland
Processors: 16 × AMD Ryzen 7 5800H with Radeon Graphics
Memory: 32 GiB of RAM (28.3 GiB usable)
Graphics Processor: AMD Radeon Graphics
I switched back to:
[steffen@sharkoon GHULbenchmark]$ uname -a
Linux sharkoon 6.17.11-1-MANJARO #1 SMP PREEMPT_DYNAMIC Sun, 07 Dec 2025 07:13:59 +0000 x86_64 GNU/Linux
Everything worked fine again (apart from my USB BT adapter, but that’s another topic), so I ran my benchmark tests here.
I saw you released a new 6.17.11.2 kernel; I will test that one out.
THX
amdgpu / kernel 6.18.x – GpuTest/FurMark instability
Description
I’ve narrowed the issue down and can now reproduce a clear difference between kernel 6.17.x and 6.18.x, specifically in the interaction between amdgpu and GpuTest/FurMark under heavy load (GHUL “Cooler Hellfire” full‑system stress).
- Hardware:
  - CPU: AMD Ryzen 5 2600X
  - Mainboard: MSI B450M PRO‑VDH MAX (MS‑7A38, Rev. 8.0)
  - GPU: Radeon RX 9060 XT (amdgpu)
  - RAM: 2×8 GiB DDR4 @ 3200 MT/s (dual‑channel)
- Software:
  - Distro: Manjaro (current)
  - Kernels: 6.18.x (problem), 6.17.11‑1 and 6.17.12‑1 (stable)
  - Driver: amdgpu (Manjaro default stack)
  - Xorg (standard configuration, no exotic tweaks)
  - GpuTest/FurMark from AUR (unchanged)
  - GHULbenchmark (open‑source benchmark/stress suite; just calls GpuTest via CLI)
Reproduction scenario
- Boot the system with kernel 6.18.x.
- Run GHULbenchmark with Hellfire/Cooler enabled (or run GpuTest/FurMark directly with high duration / full GPU load).
- During the “Cooler Hellfire” test (simultaneous heavy load on CPU, RAM, GPU, and storage), a FurMark window is started.
- After a short time:
  - the screen goes black,
  - Xorg freezes,
  - the kernel repeatedly tries to reset the GPU,
  - you see “device wedged, but recovered through reset”.
With kernel 6.17.11‑1 and 6.17.12‑1, the exact same GHUL runs including all Hellfire tests (especially the Cooler test with GpuTest) are fully stable – there are no amdgpu errors in the logs.
Log excerpt (kernel 6.18, journalctl -b -1, shortened)
amdgpu 0000:2b:00.0: amdgpu: Dumping IP State
amdgpu 0000:2b:00.0: amdgpu: Dumping IP State Completed
amdgpu 0000:2b:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
amdgpu 0000:2b:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
amdgpu 0000:2b:00.0: amdgpu: ring comp_1.1.1 timeout, signaled seq=72, emitted seq=73
amdgpu 0000:2b:00.0: amdgpu: Process Xorg pid 86265 thread Xorg:cs0 pid 86268
amdgpu 0000:2b:00.0: amdgpu: Starting comp_1.1.1 ring reset
amdgpu 0000:2b:00.0: amdgpu: reset compute queue (1:1:1)
amdgpu 0000:2b:00.0: amdgpu: Ring comp_1.1.1 reset succeeded
amdgpu 0000:2b:00.0: [drm] device wedged, but recovered through reset
amdgpu 0000:2b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:160 vmid:0 pasid:0)
amdgpu 0000:2b:00.0: amdgpu: in page starting at address 0x0000000000000000 from client 10
amdgpu 0000:2b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00040B40
amdgpu 0000:2b:00.0: amdgpu: Faulty UTCL2 client ID: CPC (0x5)
amdgpu 0000:2b:00.0: amdgpu: MORE_FAULTS: 0x0
amdgpu 0000:2b:00.0: amdgpu: WALKER_ERROR: 0x0
amdgpu 0000:2b:00.0: amdgpu: PERMISSION_FAULTS: 0x4
amdgpu 0000:2b:00.0: amdgpu: MAPPING_ERROR: 0x1
amdgpu 0000:2b:00.0: amdgpu: RW: 0x1
and shortly afterwards again, this time on the GFX ring:
amdgpu 0000:2b:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=8022560, emitted seq=8022561
amdgpu 0000:2b:00.0: amdgpu: Process Xorg pid 86265 thread Xorg:cs0 pid 86268
amdgpu 0000:2b:00.0: amdgpu: Starting gfx_0.0.0 ring reset
amdgpu 0000:2b:00.0: amdgpu: Ring gfx_0.0.0 reset succeeded
amdgpu 0000:2b:00.0: [drm] device wedged, but recovered through reset
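The ring timeout lines above can be reduced to ring name and fence sequence numbers with a small filter. This is a sketch only: ring_timeouts is a made-up helper name, and the pattern assumes the message wording shown in this excerpt. The "pending" column is simply emitted minus signaled, i.e. how many submissions had not completed when the timeout fired.

```shell
# Extract ring name and fence sequence numbers from amdgpu
# "ring ... timeout" messages (input on stdin).
# ring_timeouts is a hypothetical helper name.
ring_timeouts() {
    sed -nE 's/.*ring ([[:alnum:]_.]+) timeout, signaled seq=([0-9]+), emitted seq=([0-9]+).*/\1 \2 \3/p' \
        | awk '{ print $1, "signaled=" $2, "emitted=" $3, "pending=" ($3 - $2) }'
}

# Usage (not executed here):
#   journalctl -k -b -1 --no-pager | ring_timeouts
```

For the two timeouts in this log, that difference is 1 on both the compute and the GFX ring.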
User‑visible symptoms
- Black screen while the GpuTest/FurMark window is (or should be) visible during Cooler Hellfire.
- Xorg becomes unresponsive, desktop cannot be used.
- Sometimes the system partially recovers after one of the amdgpu resets, sometimes a relogin or reboot is required.
Expected behavior
- Under heavy combined load (compute + graphics), the GPU should not constantly trigger ring timeouts or GCVM_L2 protection faults.
- GpuTest/FurMark should behave like on kernel 6.17.x: run normally and exit cleanly after the test duration, without amdgpu coredumps or “device wedged” states.
Actual behavior
- With kernel 6.18.x, running GpuTest/FurMark under full load reliably produces:
  - ring comp_1.1.x timeout and ring gfx_0.0.0 timeout
  - multiple “Ring reset succeeded” attempts
  - [drm] device wedged, but recovered through reset
  - GCVM_L2_PROTECTION_FAULT_STATUS + mapping/permission faults
  - AMDGPU coredumps under /sys/class/drm/card1/device/devcoredump/data
- With kernels 6.17.11‑1 and 6.17.12‑1 the identical scenario is fully stable (same GpuTest build, same workload, same user configuration).
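To make the 6.17 vs 6.18 comparison concrete, the events listed above can be tallied per boot. A rough sketch, assuming the message wording from the log excerpt in this post; amdgpu_event_counts and its category labels are invented for illustration.

```shell
# Count amdgpu failure events by category (kernel log on stdin).
# amdgpu_event_counts and the category labels are hypothetical.
amdgpu_event_counts() {
    awk '
        /ring [^ ]+ timeout/              { n["ring_timeout"]++ }
        /[Rr]ing [^ ]+ reset succeeded/   { n["ring_reset_ok"]++ }
        /device wedged/                   { n["device_wedged"]++ }
        /page fault/                      { n["page_fault"]++ }
        /coredump file has been created/  { n["coredump"]++ }
        END {
            split("ring_timeout ring_reset_ok device_wedged page_fault coredump", keys, " ")
            # Categories that never matched print as 0.
            for (i = 1; i <= 5; i++) printf "%s=%d\n", keys[i], n[keys[i]]
        }'
}

# Usage (not executed here): run once per boot/kernel and compare.
#   journalctl -k -b -1 --no-pager | amdgpu_event_counts
```

On a boot without amdgpu errors every line should read 0, which gives a quick yes/no answer when testing further kernels.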
Question for maintainers
- Can you reproduce this on a similar RDNA card and kernel 6.18.x by running GpuTest/FurMark under full load (e.g. 60–180 s at max, ideally as part of a parallel CPU/RAM/IO stress scenario)?
- If yes, does this look like an upstream amdgpu / drm/amd regression, or something Manjaro‑specific (patch set / config)?
If you need more data (full journalctl logs or the devcoredump blob from /sys/class/drm/...), I can provide that as well.
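For reference, this is roughly how I would save the devcoredump blob for attaching. It is a sketch under assumptions: save_devcoredump and the output file name are made up, the default path is the one printed in my log (the card index may differ on other systems), and reading it may require root. The dump should be copied soon after the reset, since as far as I know the kernel does not keep it around indefinitely.

```shell
# Copy the amdgpu devcoredump blob to a regular file and print its name.
# save_devcoredump is a hypothetical helper; the default source path is the
# one from the kernel log above and may differ per system.
save_devcoredump() {
    src=${1:-/sys/class/drm/card1/device/devcoredump/data}
    dest=${2:-amdgpu-devcoredump-$(date +%Y%m%d-%H%M%S).bin}
    if [ ! -r "$src" ]; then
        echo "no readable devcoredump at $src" >&2
        return 1
    fi
    cat "$src" > "$dest" && echo "$dest"
}

# Usage (not executed here):
#   sudo sh -c '. ./save_devcoredump.sh; save_devcoredump'
```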
Moderator edit: In the future, please use proper formatting: [HowTo] Post command output and file content as formatted text