Amdgpu driver crashing regularly on Vega56

Hi everyone,

Whenever I play some games for a long time, my graphics driver crashes.
The first symptom is that the computer suddenly stutters and hangs. After that the screen goes black and then shows this (sorry, it seems I am not allowed to post links or images):

Here is my inxi -Fzxxxi:

[Kraut@Kraut ~]$ inxi -Fzxxxi
System: Kernel: 5.8.11-1-MANJARO x86_64 bits: 64 compiler: N/A Desktop: Xfce 4.14.2 tk: Gtk 3.24.20 info: xfce4-panel
wm: xfwm4 dm: LightDM 1.30.0 Distro: Manjaro Linux
Machine: Type: Desktop System: Gigabyte product: AX370-Gaming K5 v: N/A serial:
Mobo: Gigabyte model: AX370-Gaming K5-CF v: x.x serial: UEFI [Legacy]: American Megatrends v: F50d
date: 07/02/2020
CPU: Topology: 8-Core model: AMD Ryzen 7 1800X bits: 64 type: MT MCP arch: Zen rev: 1 L2 cache: 4096 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 115239
Speed: 1942 MHz min/max: 2200/3600 MHz boost: enabled Core speeds (MHz): 1: 2286 2: 1865 3: 1872 4: 1834 5: 1886
6: 1886 7: 1889 8: 1892 9: 1871 10: 1871 11: 1890 12: 1891 13: 1886 14: 1889 15: 2380 16: 1871
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] vendor: ASUSTeK driver: amdgpu
v: kernel bus ID: 0c:00.0 chip ID: 1002:687f
Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati unloaded: modesetting alternate: fbdev,vesa resolution:
1: 1920x1080~60Hz 2: 1920x1080~60Hz s-dpi: 96
OpenGL: renderer: Radeon RX Vega (VEGA10 DRM 3.38.0 5.8.11-1-MANJARO LLVM 10.0.1) v: 4.6 Mesa 20.1.8
direct render: Yes
Audio: Device-1: Advanced Micro Devices [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] driver: snd_hda_intel v: kernel
bus ID: 0c:00.1 chip ID: 1002:aaf8
Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: Gigabyte driver: snd_hda_intel v: kernel
bus ID: 0e:00.3 chip ID: 1022:1457
Sound Server: ALSA v: k5.8.11-1-MANJARO
Network: Device-1: Intel I211 Gigabit Network vendor: Gigabyte driver: igb v: 5.6.0-k port: f000 bus ID: 05:00.0
chip ID: 8086:1539
IF: enp5s0 state: down mac:
Device-2: Broadcom and subsidiaries BCM4360 802.11ac Wireless Network Adapter vendor: ASUSTeK driver: wl v: kernel
port: f000 bus ID: 09:00.0 chip ID: 14e4:43a0
IF: wlp9s0 state: up mac:
IP v4: type: dynamic noprefixroute scope: global broadcast:
IP v6: type: dynamic noprefixroute scope: global
IP v6: type: dynamic noprefixroute scope: global
IP v6: type: noprefixroute scope: link
WAN IP: No WAN IP found. Connected to web? SSL issues? Try --no-dig
Drives: Local Storage: total: 4.38 TiB used: 619.55 GiB (13.8%)
ID-1: /dev/nvme0n1 vendor: SanDisk model: Extreme Pro 500GB size: 465.76 GiB speed: 31.6 Gb/s lanes: 4
serial: rev: 101200RL scheme: MBR
ID-2: /dev/sda vendor: Western Digital model: WD10EZEX-00BN5A0 size: 931.51 GiB speed: 6.0 Gb/s rotation: 7200 rpm
serial: rev: 1A01 scheme: MBR
ID-3: /dev/sdb vendor: Western Digital model: WD3200BPVT-00JJ5T0 size: 298.09 GiB speed: 3.0 Gb/s
rotation: 5400 rpm serial: rev: 1A01 scheme: MBR
ID-4: /dev/sdc type: USB vendor: Seagate model: ST3000DM001-1ER166 size: 2.73 TiB rotation: 7200 rpm
serial: rev: 0209 scheme: MBR
Partition: ID-1: / size: 457.45 GiB used: 259.27 GiB (56.7%) fs: ext4 dev: /dev/nvme0n1p1
ID-2: /opt size: 622.49 GiB used: 54.41 GiB (8.7%) fs: ext4 dev: /dev/sda1
Swap: Alert: No Swap data was found.
Sensors: System Temperatures: cpu: 44.6 C mobo: 28.0 C gpu: amdgpu temp: 41 C
Fan Speeds (RPM): cpu: 0 fan-1: 0 fan-3: 0 gpu: amdgpu fan: 1354
Voltages: 12v: N/A 5v: N/A 3.3v: 1.69 vbat: 1.65
Info: Processes: 396 Uptime: 47m Memory: 31.37 GiB used: 3.11 GiB (9.9%) Init: systemd v: 246 Compilers: gcc: 10.2.0
Packages: 1559 pacman: 1534 flatpak: 8 snap: 17 Shell: Bash v: 5.0.18 running in: server inxi: 3.1.05

Where can I find more logs on this issue?
Thank you!

EDIT: here is my journalctl -b -1 (at least the last minute before it crashed), and I was able to link the picture now.

Ouch…

In general, you can check the journal for the current boot:

journalctl -b -0

or in follow mode:

journalctl -f
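
To narrow a long journal down to the lines that concern the GPU, a small filter can help. This is only a sketch: the helper name gpu_lines is made up for this example, and the search pattern is an assumption about which lines are relevant, so adjust it as needed.

```shell
# Keep only lines that mention the amdgpu driver or the DRM subsystem.
# gpu_lines is a hypothetical helper name, not an existing tool.
gpu_lines() {
    grep -iE 'amdgpu|\[drm'
}

# Usage (kernel messages from the previous boot, last 200 matching lines):
#   journalctl -k -b -1 --no-pager | gpu_lines | tail -n 200
```

Using -b -1 selects the boot before the current one, which is usually the boot that ended in the crash.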

As for the game log, it depends on whether you run the game in Wine, in Proton, or natively.

Hi,

Thank you for linking the pic!

As for the journalctl output: is there a place where I can put it? It is 4800 lines long.

The game in question is Albion Online, and the crash happens in all 3 installations (pacman, Lutris, Steam), but it is not the only game affected.
Most of the time I run it with Feral GameMode in Lutris, but the crash also happens without it.
EDIT: I put a pastebin into the main post.

I really have no idea, but my guess is that it might be a reset of the amdgpu driver while gaming.

Here is a possible bug report: https://bugzilla.kernel.org/show_bug.cgi?id=206017

A possible workaround would be adding this:

amdgpu.noretry=0

to the kernel parameters in /etc/default/grub.

modinfo amdgpu

will display more information.

Just in case, have a Manjaro install disk ready so you can boot from it and revert the changes.
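
For reference, here is roughly how the change could be applied and checked. Treat it as a sketch under assumptions: has_param is a made-up helper for this post, update-grub is the wrapper Manjaro ships, and the sysfs path assumes the amdgpu module exposes the noretry parameter on your kernel.

```shell
# Return success if a kernel command line contains the given parameter.
# has_param is a hypothetical helper: $1 = parameter, $2 = command line.
has_param() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# After editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub,
# regenerate the GRUB config and reboot:
#   sudo update-grub   # elsewhere: sudo grub-mkconfig -o /boot/grub/grub.cfg
#
# Then check that the running kernel actually received the parameter:
#   has_param amdgpu.noretry=0 "$(cat /proc/cmdline)" && echo "parameter active"
#   cat /sys/module/amdgpu/parameters/noretry
```

If the parameter does not show up in /proc/cmdline after the reboot, the GRUB config was most likely not regenerated.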

Thank you for this!

I changed this line in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="quiet apparmor=1 security=apparmor udev.log_priority=3 amdgpu.gttsize=8192 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 amdgpu.noretry=0 amdgpu.ppfeaturemask=0xfffd3fff iommu=pt amdgpu.deep_color=1"

I will report back in a few days on whether I still get the issue.
