Graphical interface randomly freezes

Hello!

The graphical interface often freezes, but the system itself is not frozen.
More precisely: I can still use every program, and the mouse cursor still moves and even keeps changing shape to match what is under it (text pointer, mouse pointer, link pointer, etc.). But everything else on screen is frozen. I have to reboot every time it happens.

It also happens on live ISOs: it happened on the EndeavourOS live ISO (using Xfce), and it still happens on my Manjaro install (using KDE).

When it happens, the screen goes black for a moment, then the display comes back frozen, sometimes with minor graphical artifacts.

Here’s the output of inxi -Fxxc0z --no-host:

System:
  Kernel: 5.15.12-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
    Desktop: KDE Plasma 5.23.4 tk: Qt 5.15.2 wm: kwin_x11 dm: SDDM
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Desktop System: CSL- & KG product: 5816 v: N/A
    serial: <superuser required>
  Mobo: ASUSTeK model: TUF GAMING B450-PLUS II v: Rev X.0x
    serial: <superuser required> UEFI: American Megatrends v: 3002
    date: 03/11/2021
CPU:
  Info: 6-core model: AMD Ryzen 5 3600 bits: 64 type: MT MCP arch: Zen 2
    rev: 0 cache: L1: 384 KiB L2: 3 MiB L3: 32 MiB
  Speed (MHz): avg: 2367 high: 3599 min/max: 2200/4208 boost: enabled
    cores: 1: 3052 2: 2055 3: 2056 4: 2200 5: 2193 6: 2199 7: 3599 8: 2055
    9: 2396 10: 2199 11: 2200 12: 2200 bogomips: 86430
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Graphics:
  Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT / 6800M] vendor: ASUSTeK
    driver: amdgpu v: kernel bus-ID: 09:00.0 chip-ID: 1002:73df
  Device-2: Sunplus Innovation Full HD webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 1-6:3 chip-ID: 1bcf:2284
  Display: x11 server: X.org 1.21.1.2 compositor: kwin_x11 driver:
    loaded: amdgpu,ati unloaded: modesetting,radeon alternate: fbdev,vesa
    resolution: <missing: xdpyinfo>
  Message: Unable to show advanced data. Required tool glxinfo missing.
Audio:
  Device-1: AMD Navi 21 HDMI Audio [Radeon RX 6800/6800 XT / 6900 XT]
    driver: snd_hda_intel v: kernel bus-ID: 09:00.1 chip-ID: 1002:ab28
  Device-2: AMD Starship/Matisse HD Audio vendor: ASUSTeK
    driver: snd_hda_intel v: kernel bus-ID: 0b:00.4 chip-ID: 1022:1487
  Device-3: Elgato Systems Elgato Wave:3 type: USB driver: snd-usb-audio
    bus-ID: 1-1:2 chip-ID: 0fd9:0070
  Device-4: Sunplus Innovation Full HD webcam type: USB
    driver: snd-usb-audio,uvcvideo bus-ID: 1-6:3 chip-ID: 1bcf:2284
  Sound Server-1: ALSA v: k5.15.12-1-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.19 running: no
  Sound Server-3: PulseAudio v: 15.0 running: yes
  Sound Server-4: PipeWire v: 0.3.42 running: yes
Network:
  Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    vendor: ASUSTeK driver: r8169 v: kernel port: f000 bus-ID: 04:00.0
    chip-ID: 10ec:8168
  IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives:
  Local Storage: total: 3.18 TiB used: 36.46 GiB (1.1%)
  ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 980 1TB size: 931.51 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 35.9 C
  ID-2: /dev/nvme1n1 vendor: Kingston model: SA2000M8500G size: 465.76 GiB
    speed: 31.6 Gb/s lanes: 4 serial: <filter> temp: 34.9 C
  ID-3: /dev/sda vendor: Seagate model: ST2000DM001-9YN164 size: 1.82 TiB
    speed: 6.0 Gb/s serial: <filter>
Partition:
  ID-1: / size: 457.09 GiB used: 36.46 GiB (8.0%) fs: ext4
    dev: /dev/nvme1n1p2
  ID-2: /boot/efi size: 299.4 MiB used: 288 KiB (0.1%) fs: vfat
    dev: /dev/nvme1n1p1
Swap:
  ID-1: swap-1 type: file size: 512 MiB used: 0 KiB (0.0%) priority: -2
    file: /swapfile
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 52.0 C
    mem: 54.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
  Processes: 339 Uptime: 16m Memory: 15.6 GiB used: 4.63 GiB (29.7%)
  Init: systemd v: 250 Compilers: gcc: 11.1.0 Packages: pacman: 1278
  Shell: Zsh v: 5.8 running-in: konsole inxi: 3.3.11

Thanks for your help guys!

Without a log, there’s only so much to hypothesize.

However, if this also happens in other distros and live environments, I would guess it’s a hardware problem, most likely faulty RAM.


Indeed, but I don’t know where to find such logs.
I’ll run a memtest tonight to check whether the RAM is the cause.
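
On a systemd-based distro such as Manjaro, the kernel messages from the session that froze normally end up in the journal. Since the machine has to be rebooted after each freeze, the interesting entries belong to the previous boot. Below is a minimal sketch, assuming the journal is stored persistently so that the previous boot is still available; the helper names `journalctl_command` and `read_journal` are hypothetical and only wrap standard `journalctl` options.

```python
import subprocess


def journalctl_command(boot="-1", kernel_only=True, priority=None):
    """Build a journalctl argument list; nothing is executed here.

    boot="-1" selects the previous boot, boot="0" the current one.
    """
    cmd = ["journalctl", "--no-pager", f"--boot={boot}"]
    if kernel_only:
        # --dmesg (same as -k) restricts the output to kernel messages.
        cmd.append("--dmesg")
    if priority is not None:
        cmd.append(f"--priority={priority}")
    return cmd


def read_journal(cmd):
    """Run the command and return its output lines (requires journalctl)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Kernel messages of the previous boot that mention the GPU driver.
    for line in read_journal(journalctl_command()):
        if "amdgpu" in line:
            print(line)
```

If the freeze happened during the current session (for example when reading the log over SSH before rebooting), `boot="0"` would be the value to pass instead.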

It seems the problem comes from my GPU. Here is what I found in the journalctl logs:

janv. 10 10:33:13 guilhem-manjaro kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1040827, emitted seq=1040829
janv. 10 10:33:13 guilhem-manjaro kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 943 thread Xorg:cs0 pid 1028
janv. 10 10:33:13 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!
janv. 10 10:33:17 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: failed to suspend display audio
janv. 10 10:33:17 guilhem-manjaro kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
janv. 10 10:33:17 guilhem-manjaro kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
janv. 10 10:33:18 guilhem-manjaro kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
janv. 10 10:33:18 guilhem-manjaro kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
janv. 10 10:33:18 guilhem-manjaro kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
janv. 10 10:33:18 guilhem-manjaro kernel: [drm] free PSP TMR buffer
janv. 10 10:33:18 guilhem-manjaro kernel: amdgpu 0000:09:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0013 address=0xf7d00200100 flags=0x0030]
janv. 10 10:33:18 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: MODE1 reset
janv. 10 10:33:18 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU mode1 reset
janv. 10 10:33:18 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU smu mode1 reset
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:18 guilhem-manjaro kernel: snd_hda_intel 0000:09:00.1: spurious response 0x0:0x0, last cmd=0x1f0500
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset succeeded, trying to resume
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] PCIE GART of 512M enabled (table at 0x00000080007E9000).
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] VRAM is lost due to GPU reset!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] PSP is resuming...
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] reserve 0xa00000 from 0x82fe000000 for PSP TMR
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: RAS: optional ras ta ucode is not available
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: SMU is resuming...
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: SMU is resumed successfully!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] DMUB hardware initialized: version=0x02020003
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] kiq ring mec 2 pipe 1 q 0
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] JPEG decode initialized successfully.
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 1
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 1
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 1
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: recover vram bo from shadow start
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: recover vram bo from shadow done
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset(2) succeeded!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm] Skip scheduling IBs!
janv. 10 10:33:19 guilhem-manjaro kernel: amdgpu_cs_ioctl: 42 callbacks suppressed
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
janv. 10 10:33:19 guilhem-manjaro kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
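
The two lines that matter in this dump are the first ones: the `ring gfx_0.0.0 timeout` error and the process it is attributed to (Xorg). Everything after that is the driver resetting the GPU. To pull just those events out of a long journal, something like the following Python sketch could be used. The function name `find_gpu_timeouts` and both patterns are assumptions derived only from the lines quoted above; other kernel versions may word these messages differently.

```python
import re

# Matches the amdgpu ring timeout line as it is worded in the log above.
TIMEOUT_RE = re.compile(
    r"\*ERROR\* ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)
# Matches the follow-up line naming the process that submitted the job.
PROCESS_RE = re.compile(
    r"\*ERROR\* Process information: process (?P<process>\S+) pid (?P<pid>\d+)"
)


def find_gpu_timeouts(lines):
    """Return one dict per ring timeout, with the offending process if logged."""
    events = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            events.append({
                "ring": match["ring"],
                "signaled": int(match["signaled"]),
                "emitted": int(match["emitted"]),
                "process": None,
            })
            continue
        match = PROCESS_RE.search(line)
        # Attach the process to the most recent timeout that has none yet.
        if match and events and events[-1]["process"] is None:
            events[-1]["process"] = match["process"]
    return events


sample = [
    "janv. 10 10:33:13 guilhem-manjaro kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx_0.0.0 timeout, signaled seq=1040827, emitted seq=1040829",
    "janv. 10 10:33:13 guilhem-manjaro kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* Process information: process Xorg pid 943 thread Xorg:cs0 pid 1028",
    "janv. 10 10:33:13 guilhem-manjaro kernel: amdgpu 0000:09:00.0: amdgpu: GPU reset begin!",
]
events = find_gpu_timeouts(sample)
print(events)
```

Counting how many such events occur per day, and which process they name, can help tell a single misbehaving application apart from a problem that hits whatever happens to be drawing at the time.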

My GPU works fine on Windows 10; I’ve never encountered such a freeze there.
So I guess it comes from the drivers? Do you think I should try the AMDGPU-PRO drivers?

Thanks for your help ;_;

What is the output of journalctl -p 3?
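
For reference, `-p 3` filters by syslog priority: with a single level, journalctl shows that level and every more severe one, so `-p 3` means "err and worse". A small Python sketch of that rule follows; the names `PRIORITIES` and `levels_shown` are hypothetical and only illustrate the mapping.

```python
# syslog priority names, most severe first (0 = emerg ... 7 = debug).
PRIORITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


def levels_shown(priority):
    """Return the level names journalctl -p <priority> would include."""
    if isinstance(priority, str):
        priority = PRIORITIES.index(priority)
    return PRIORITIES[: priority + 1]


print(levels_shown(3))
```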


Try disabling the C6 C-state or PBO in the motherboard’s BIOS settings.


Hey @Zesko,
Here’s the output of my journalctl -p 3:
https://pastebin.com/UBXBEjSF

It happens every day, so you can just check the logs from January 10th.

Try disabling the C6 C-state or PBO in the motherboard’s BIOS settings.

All right, I’ll try to find those settings and keep you informed.

Thanks guys :)
