System Freezes randomly

Hey guys! Hope y’all are doing well.

After installing the 04/09 update, my system started crashing randomly. There is no real pattern: I can be playing, scrolling through Facebook, coding, etc., and it just freezes. No commands work, and I have to hard reset or power down the whole computer.

At first, I thought the problem was related to some IOMMU errors (like this one):

pci 0000:00:00.2: AMD-Vi: Unable to read/write to IOMMU perf counter.

But then I realized those are boot-time errors, so they’re probably not related to the actual problem.

I’ve tried a lot of things, like:

  • undoing the update
  • reinstalling the whole system
  • changing GRUB’s IOMMU-related values (GRUB_CMDLINE_LINUX="iommu=pt"), before realizing it wasn’t related
  • using the Linux 5.4 kernel instead of 5.10
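To confirm which IOMMU-related parameters the running kernel was actually booted with, something like the following could help. It is only a minimal sketch: the function name `iommu_params` is made up for illustration, and the sample string is copied from the `parameters:` line of the inxi output in this post. On a live system the same string can be read from `/proc/cmdline`.

```python
# Kernel parameter names treated as IOMMU-related in this sketch.
IOMMU_KEYS = {"iommu", "amd_iommu"}

def iommu_params(cmdline):
    """Return the IOMMU-related parameters found in a kernel command line."""
    return [p for p in cmdline.split() if p.split("=", 1)[0] in IOMMU_KEYS]

# Sample taken from the inxi output in this thread; on a running system,
# use open("/proc/cmdline").read() instead.
sample = (
    "BOOT_IMAGE=/boot/vmlinuz-5.4-x86_64 "
    "root=UUID=1991e2ed-da17-4c3e-823e-37c85340ed96 rw iommu=pt quiet splash"
)
print(iommu_params(sample))
```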

And still, it randomly freezes. Using the 5.4 kernel helps, I don’t know why. I’d really appreciate it if someone could take some time to help me out. I don’t usually open threads like these, but I’m losing my mind with this problem.

Thanks a lot!

Here are the logs for my last freeze:

abr 15 13:40:40 the-machine dbus-daemon[591]: [system] Failed to activate service 'org.bluez': timed out (service_start_timeout=25000ms)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc01000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc00000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc07000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc03000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc02000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc05000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          RW: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32774, for process firefox pid 94303 thread firefox:cs0 pid 94347)
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:   in page starting at address 0x000080010fc04000 from client 27
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MORE_FAULTS: 0x1
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          WALKER_ERROR: 0x0
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          PERMISSION_FAULTS: 0x5
abr 15 13:41:03 the-machine kernel: amdgpu 0000:06:00.0:          MAPPING_ERROR: 0x0

Output of inxi -Fza:
System:    Kernel: 5.4.108-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
           parameters: BOOT_IMAGE=/boot/vmlinuz-5.4-x86_64 root=UUID=1991e2ed-da17-4c3e-823e-37c85340ed96 rw iommu=pt quiet 
           splash apparmor=1 security=apparmor resume=UUID=97e5678e-9e32-48a7-8118-af3a599297a6 udev.log_priority=3 
           Desktop: GNOME 3.38.4 tk: GTK 3.24.28 wm: gnome-shell dm: GDM 3.38.2.1 Distro: Manjaro Linux base: Arch Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: EX-A320M-GAMING v: Rev X.0x serial: <filter> UEFI: American Megatrends v: 5220 
           date: 09/12/2019 
CPU:       Info: Quad Core model: AMD Ryzen 3 3200G with Radeon Vega Graphics bits: 64 type: MCP arch: Zen/Zen+ note: check 
           family: 17 (23) model-id: 18 (24) stepping: 1 microcode: 8108109 cache: L2: 2 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 28756 
           Speed: 1233 MHz min/max: 1400/3600 MHz boost: enabled Core speeds (MHz): 1: 1233 2: 2850 3: 2820 4: 2956 
           Vulnerabilities: Type: itlb_multihit status: Not affected 
           Type: l1tf status: Not affected 
           Type: mds status: Not affected 
           Type: meltdown status: Not affected 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: disabled, RSB filling 
           Type: srbds status: Not affected 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Picasso vendor: ASUSTeK driver: amdgpu v: kernel bus-ID: 06:00.0 
           chip-ID: 1002:15d8 class-ID: 0300 
           Display: wayland server: X.org 1.20.10 compositor: gnome-shell driver: loaded: amdgpu 
           note: n/a (using device driver) - try sudo/root display-ID: 0 resolution: <missing: xdpyinfo> 
           OpenGL: renderer: AMD Radeon Vega 8 Graphics (RAVEN DRM 3.35.0 5.4.108-1-MANJARO LLVM 11.1.0) v: 4.6 Mesa 21.0.1 
           direct render: Yes 
Audio:     Device-1: Advanced Micro Devices [AMD/ATI] Raven/Raven2/Fenghuang HDMI/DP Audio vendor: ASUSTeK 
           driver: snd_hda_intel v: kernel bus-ID: 06:00.1 chip-ID: 1002:15de class-ID: 0403 
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus-ID: 06:00.6 chip-ID: 1022:15e3 class-ID: 0403 
           Sound Server-1: ALSA v: k5.4.108-1-MANJARO running: yes 
           Sound Server-2: JACK v: 0.125.0 running: no 
           Sound Server-3: PulseAudio v: 14.2 running: yes 
           Sound Server-4: PipeWire v: 0.3.24 running: yes 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASUSTeK driver: r8169 v: kernel port: f000 
           bus-ID: 04:00.0 chip-ID: 10ec:8168 class-ID: 0200 
           IF: enp4s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 689.34 GiB used: 22.07 GiB (3.2%) 
           SMART Message: Required tool smartctl not installed. Check --recommends 
           ID-1: /dev/sda maj-min: 8:0 vendor: Kingston model: SA400S37120G size: 111.79 GiB block-size: physical: 512 B 
           logical: 512 B speed: 6.0 Gb/s rotation: SSD serial: <filter> rev: 0004 scheme: MBR 
           ID-2: /dev/sdb maj-min: 8:16 vendor: Kingston model: SV300S37A120G size: 111.79 GiB block-size: physical: 512 B 
           logical: 512 B speed: 6.0 Gb/s rotation: SSD serial: <filter> rev: BBF0 scheme: GPT 
           ID-3: /dev/sdc maj-min: 8:32 vendor: Seagate model: ST500LM012 HN-M500MBB size: 465.76 GiB block-size: 
           physical: 4096 B logical: 512 B speed: 3.0 Gb/s rotation: 5400 rpm serial: <filter> rev: 0002 scheme: MBR 
Partition: ID-1: / raw-size: 51.89 GiB size: 50.78 GiB (97.85%) used: 12.78 GiB (25.2%) fs: ext4 dev: /dev/sda2 maj-min: 8:2 
           ID-2: /boot/efi raw-size: 512 MiB size: 511 MiB (99.80%) used: 308 KiB (0.1%) fs: vfat dev: /dev/sda1 maj-min: 8:1 
           ID-3: /home raw-size: 51.89 GiB size: 50.78 GiB (97.85%) used: 9.29 GiB (18.3%) fs: ext4 dev: /dev/sda3 
           maj-min: 8:3 
Swap:      Kernel: swappiness: 60 (default) cache-pressure: 100 (default) 
           ID-1: swap-1 type: partition size: 7.5 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/sda4 maj-min: 8:4 
Sensors:   System Temperatures: cpu: 34.4 C mobo: N/A gpu: amdgpu temp: 34.0 C 
           Fan Speeds (RPM): N/A 
Info:      Processes: 255 Uptime: 42m wakeups: 0 Memory: 13.6 GiB used: 3.12 GiB (22.9%) Init: systemd v: 247 tool: systemctl 
           Compilers: gcc: 10.2.0 Packages: 1231 pacman: 1228 lib: 301 flatpak: 0 snap: 3 Shell: Zsh v: 5.8 
           running-in: gnome-terminal inxi: 3.3.03

GPU Data:

   description: VGA compatible controller
   product: Picasso
   vendor: Advanced Micro Devices, Inc. [AMD/ATI]
   physical id: 0
   bus info: pci@0000:06:00.0
   version: c9
   width: 64 bits
   clock: 33MHz
   capabilities: pm pciexpress msi msix vga_controller bus_master cap_list rom
   configuration: driver=amdgpu latency=0
   resources: irq:61 memory:e0000000-efffffff memory:f0000000-f01fffff ioport:e000(size=256) memory:fcc00000-fcc7ffff memory:c0000-dffff

Drivers:

06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Picasso (rev c9)
	Subsystem: ASUSTeK Computer Inc. Device 876b
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Try going to your BIOS → Power Supply Idle → Change “Auto” to “Typical”

Can confirm. I also have random freezes on a Manjaro GNOME Wayland session with Linux kernel 5.10, also since the 04/09 update. BTW, I also have an AMD Radeon 3. It’s not a BIOS issue (it didn’t happen before, and I have the latest BIOS for my device). Could the AMDVLK update be the cause? Anyway, have a nice day.

Hey there! I’m experiencing random freezes very often, and the culprit seems to be some AMD drivers as well. Rolling them back didn’t fix it for me :( Here is a post I wrote the other day about it; some other users and I have been updating it with our experiences and what did not solve it: System frequently crashing after GPU drivers update - #28 by HoneyBear52

I just did that, and I noticed an “IOMMU” option, which was set to ‘auto’.
Should I change it?

Did you try downgrading it? I can try it later today, but I’m afraid to make the system more unstable. I hope the Manjaro team reads your post in that upgrade thread.

Guys, I’m using Xorg instead of Wayland now, and it looks like the freezes are gone. 24+ hours of uptime, went through every possible scenario (hibernate, lock, etc.), and everything’s looking good. Also, I use GNOME.

I’ll keep you guys posted.

Edit: I’ve been running the system for 2+ days without problems. Wayland was definitely messing things up.

Edit2: Still getting freezes. I’ll try that kernel thing

If you’re still wondering how to fix this: I got the same error. Using the experimental 5.12 kernel fixed it for me. I’ve described it here (add dots to the URL, since the Manjaro forum doesn’t let me post links for whatever reason):

forum manjaro org/t/system-frequently-crashing-after-gpu-drivers-update/62139/51

From post above.

Will give that a go. I’m getting freezes as well.

Didn’t stop my system from freezing. I’m thinking it’s something to do with nvidia.

With mine, it will freeze, then after a few seconds the mouse starts moving again but it will not ‘click’ on anything. Left or right.

Same problem after the last big update.

  1. CPU: AMD Ryzen 5 3550H
  2. GPU: Radeon Vega Mobile Gfx (8)
  3. DE: gnome3
  4. Kernel: 5.10
  5. Driver: video-linux

Kernel bug or video-linux bug?

journal log:

4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process gnome-shell pid 39473 thread gnome-shel:cs0 pid 39512)
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800100a00000 from client 27
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
4月 26 14:51:26 happyxhw kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32774, for process gnome-shell pid 39473 thread gnome-shel:cs0 pid 39512)
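For anyone trying to read these status words: the decoded lines (MORE_FAULTS, PERMISSION_FAULTS, RW, ...) are just bit fields of VM_L2_PROTECTION_FAULT_STATUS. Below is a rough sketch of the decoding. The bit layout in `FIELDS` is an assumption based on my reading of the GFX9 (Vega/Raven) register definitions; it does reproduce the values the kernel printed in the logs in this thread, but treat it as illustrative rather than authoritative. The names `FIELDS` and `decode_fault_status` are made up for this example.

```python
# Assumed GFX9 bit layout: (field name, lowest bit, width in bits).
FIELDS = [
    ("MORE_FAULTS", 0, 1),
    ("WALKER_ERROR", 1, 3),
    ("PERMISSION_FAULTS", 4, 4),
    ("MAPPING_ERROR", 8, 1),
    ("CID", 9, 9),    # the "Faulty UTCL2 client ID" in newer kernel logs
    ("RW", 18, 1),
    ("VMID", 20, 4),
]

def decode_fault_status(status):
    """Split a VM_L2_PROTECTION_FAULT_STATUS value into its named fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, shift, width in FIELDS
    }

# Status value taken from the journal log above.
decoded = decode_fault_status(0x00441051)
for name, value in decoded.items():
    print(f"{name}: {value:#x}")
```

With the value from the log above, this gives PERMISSION_FAULTS 0x5, RW 0x1, client ID 0x8 and VMID 4, matching the kernel's own output (`vmid:4`, `TCP (0x8)`).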

I’ll try kernel 5.12rc for luck.

Same with kernel 5.12rc.

I am frustrated!

I need help!

I do not want to update Manjaro forever!

Are you sure that you’re using kernel 5.12? Because I am, and it stopped the freezing issues. You have to install it through this GUI and then choose it in GRUB (Advanced settings for Manjaro → use kernel 5.12)
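A quick way to double-check which kernel is really running after the reboot is to look at the release string (what `uname -r` or Python's `platform.release()` returns). Here is a small sketch; `kernel_major_minor` is a made-up helper name, and the sample string is the 5.4 release from the inxi output earlier in this thread.

```python
import re

def kernel_major_minor(release):
    """Extract (major, minor) from a kernel release string such as '5.4.108-1-MANJARO'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unexpected release string: {release!r}")
    return int(match.group(1)), int(match.group(2))

# On a live system, pass platform.release() instead of this sample.
release = "5.4.108-1-MANJARO"
version = kernel_major_minor(release)
print(version, "is at least 5.12:", version >= (5, 12))
```

If the second value printed is False, GRUB most likely booted one of the older installed kernels.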

Seems it did work. After I installed the newer kernel and booted, the laptop froze after a few minutes, which is why I thought it was a driver issue. Since rebooting again it has been running perfectly. :)

Update from my system:

I have been running kernel 5.12 for some time now with the latest Mesa version installed.
I had another round of the retry page fault error yesterday.

So, for me the problem occurs less frequently but is still there.

Although I might be talking to myself, here is another update:

My system has now been running stable for 9 days without any freeze.

The changes I made:

I removed the following boot parameters:

  • amd_iommu=on
  • iommu=pt

I had set them because of severe problems with horizontal glitches/flickering lines six or more months ago.
Those problems went away after setting the boot parameters, so I left them in until now. As the problems did not reappear after removing them, the underlying problem seems to be solved.

I added the following boot parameter:

  • amdgpu.noretry=0

Source: https://www.phoronix.com/scan.php?page=news_item&px=AMDGPU-APU-noretry
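In case it helps anyone applying the same change, here is a small sketch of the parameter edit on the GRUB command line string. The starting value `old` is hypothetical (your own line in /etc/default/grub will differ), and `update_cmdline` is just a helper name made up for this example. It only shows the string manipulation and does not touch any file.

```python
def update_cmdline(cmdline, remove, add):
    """Drop the parameters in `remove` and append those in `add` if missing."""
    kept = [p for p in cmdline.split() if p not in remove]
    kept += [p for p in add if p not in kept]
    return " ".join(kept)

# Hypothetical starting value; check your own /etc/default/grub.
old = "quiet amd_iommu=on iommu=pt apparmor=1 security=apparmor"
new = update_cmdline(
    old,
    remove={"amd_iommu=on", "iommu=pt"},
    add=["amdgpu.noretry=0"],
)
print(f'GRUB_CMDLINE_LINUX="{new}"')
```

After putting the resulting line into /etc/default/grub, the GRUB configuration has to be regenerated (on Manjaro, typically with `sudo update-grub`) and the system rebooted for the change to take effect.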

In addition, I reset the BIOS to default settings and explicitly disabled IOMMU afterwards (switched from AUTO to DISABLED).

Not sure if the BIOS reset is required.

Fingers crossed that the system remains stable.

Same problem in Manjaro GNOME and KDE using the 5.10 kernel. The 5.12 kernel fixed the problem. Using a Ryzen 5 3400G.

The same problem.

CPU: AMD R5 3400G
Kernel: 5.4 & 5.12

log:

6月 03 14:47:27 r-lc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x0
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x800126199000 from client 27
6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process chrome pid 15031 thread chrome:cs0 pid 15082)
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(4) succeeded!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: [drm] Skip scheduling IBs!
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow done
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: recover vram bo from shadow start
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
6月 04 11:44:02 r-lc kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
6月 04 11:44:02 r-lc kernel: [drm] kiq ring mec 2 pipe 1 q 0
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
6月 04 11:44:02 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
6月 04 11:44:01 r-lc kernel: [drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
6月 04 11:44:01 r-lc kernel: [drm] PSP is resuming...
6月 04 11:44:01 r-lc kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
6月 04 11:44:01 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
6月 04 11:44:01 r-lc kernel: [Hardware Error]: cache level: L3/GEN, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)
6月 04 11:44:01 r-lc kernel: [Hardware Error]: Coherent Slave Ext. Error Code: 1, Address Violation.
6月 04 11:44:01 r-lc kernel: [Hardware Error]: IPID: 0x0000002e00000000, Syndrome: 0x000000005b240203
6月 04 11:44:01 r-lc kernel: [Hardware Error]: Error Addr: 0x00007ffcffffff00
6月 04 11:44:01 r-lc kernel: [Hardware Error]: CPU:0 (17:18:1) MC20_STATUS[-|-|MiscV|AddrV|-|-|SyndV|UECC|Deferred|-|-]: 0x9c2030000001085b
6月 04 11:44:01 r-lc kernel: [Hardware Error]: Deferred error, no action required.
6月 04 11:44:01 r-lc kernel: mce: [Hardware Error]: Machine check events logged
6月 04 11:44:01 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
6月 04 11:44:01 r-lc kernel: [drm] free PSP TMR buffer
6月 04 11:44:01 r-lc kernel: [drm] psp command (0x2) failed and response status is (0x117)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c280 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c260 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c240 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c220 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c200 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c1e0 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c1c0 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c1a0 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c180 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11b80c160 flags=0x0070]
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
6月 04 11:44:00 r-lc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 754 thread Xorg:cs0 pid 782
6月 04 11:44:00 r-lc kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=2203937, emitted seq=2203939
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080010360a000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103608000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103600000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080010361a000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103610000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103612000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103602000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103618000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080010360a000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x001C0071
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x0000800103608000 from client 27
6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)
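With logs this long, it can be easier to count which processes triggered the retry page faults than to read every line. The following is a rough sketch; the regular expression is written against the log format seen in this thread, and `count_faults` is a made-up helper name. The sample lines are copied from the logs above.

```python
import re
from collections import Counter

# Matches the "retry page fault (... for process NAME pid N ...)" lines above.
FAULT_RE = re.compile(r"retry page fault \(.*?for process (\S+) pid (\d+)")

def count_faults(lines):
    """Count retry page faults per (process name, pid)."""
    counts = Counter()
    for line in lines:
        match = FAULT_RE.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts

sample = [
    "6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)",
    "6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x000080010360a000 from client 27",
    "6月 04 11:44:00 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32769, for process Xorg pid 754 thread Xorg:cs0 pid 782)",
    "6月 03 14:47:26 r-lc kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32772, for process chrome pid 15031 thread chrome:cs0 pid 15082)",
]

for (process, pid), n in count_faults(sample).most_common():
    print(f"{process} (pid {pid}): {n} fault(s)")
```

To run it on a real log, the lines could come from something like `journalctl -k -b -1` (kernel messages from the previous boot) saved to a file and read line by line.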

Same problem here. Kernel 5.12, GNOME with Wayland and amdgpu.

Having the same issue. Randomly freezing everything and the screen going dark. Only a hard reset helps.

Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:6 pasid:32773, for process firefox pid 54437 thread firefox:cs0 pid 54491)
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:   in page starting at address     0x0000800109201000 from client 27
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00641051
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 10 14:37:28 T495 kernel: amdgpu 0000:06:00.0: amdgpu:          RW: 0x1

There is also a post in Arch forums.

No solution.