Kernel 6.0.2 and AMD-GPU/Notion snap issue

Hi,

I recently installed the newly released Linux kernel 6.0.2 on my Manjaro laptop. After using my usual apps for a while, I noticed that the laptop froze, and only a forced power-off got it back. Below are the logs written just before the laptop stopped working.

It seems my Notion snap app is causing the problem, but it was working smoothly on the previous kernel (LTS 5.15.74-3). Has anyone else faced this problem with the new kernel?

Systemd log output

Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: [mmhub0] retry page fault (src_id:0 ring:0 vmid:1 pasid:32778, for process notion-snap pid 5993 thread no>
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800108620000 from IH client 0x12 (VMC)
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00140051
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: MP1 (0x0)
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Oct 17 12:53:08 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10149, emitted seq=10151
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232258c0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232258e0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225900 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225920 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225960 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225940 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123225980 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x1232259a0 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x123240000 flags=0x0070]
Oct 17 12:53:18 manjaro-bl kernel: [drm] free PSP TMR buffer
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Oct 17 12:53:18 manjaro-bl kernel: [drm] PCIE GART of 1024M enabled.
Oct 17 12:53:18 manjaro-bl kernel: [drm] PTB located at 0x000000F400A00000
Oct 17 12:53:18 manjaro-bl kernel: [drm] PSP is resuming...
Oct 17 12:53:18 manjaro-bl kernel: [drm] reserve 0x400000 from 0xf439000000 for PSP TMR
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
Oct 17 12:53:18 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
Oct 17 12:53:20 manjaro-bl kernel: [drm] kiq ring mec 2 pipe 1 q 0
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring sdma0 test failed (-110)
Oct 17 12:53:20 manjaro-bl kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <sdma_v4_0> failed -110
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Oct 17 12:53:20 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -110
Oct 17 12:53:20 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Oct 17 12:53:30 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10151, emitted seq=10151
Oct 17 12:53:30 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023
Oct 17 12:53:30 manjaro-bl kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Oct 17 12:53:30 manjaro-bl kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
Oct 17 12:53:30 manjaro-bl kernel: #PF: supervisor read access in kernel mode
Oct 17 12:53:30 manjaro-bl kernel: #PF: error_code(0x0000) - not-present page
Oct 17 12:53:30 manjaro-bl kernel: PGD 0 P4D 0 
Oct 17 12:53:30 manjaro-bl kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G           OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  amd_iommu_int_thread+0x61e/0x780
Oct 17 12:53:30 manjaro-bl kernel:  ? __wake_up_common_lock+0x88/0xc0
Oct 17 12:53:30 manjaro-bl kernel:  ? disable_irq_nosync+0x10/0x10
Oct 17 12:53:30 manjaro-bl kernel:  irq_thread_fn+0x23/0x60
Oct 17 12:53:30 manjaro-bl kernel:  irq_thread+0xfe/0x1c0
Oct 17 12:53:30 manjaro-bl kernel:  ? irq_thread_fn+0x60/0x60
Oct 17 12:53:30 manjaro-bl kernel:  ? irq_thread_check_affinity+0xd0/0xd0
Oct 17 12:53:30 manjaro-bl kernel:  kthread+0xde/0x110
Oct 17 12:53:30 manjaro-bl kernel:  ? kthread_complete_and_exit+0x20/0x20
Oct 17 12:53:30 manjaro-bl kernel:  ret_from_fork+0x22/0x30
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028
Oct 17 12:53:30 manjaro-bl kernel: ---[ end trace 0000000000000000 ]---
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000028 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: BUG: kernel NULL pointer dereference, address: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: #PF: supervisor write access in kernel mode
Oct 17 12:53:30 manjaro-bl kernel: #PF: error_code(0x0002) - not-present page
Oct 17 12:53:30 manjaro-bl kernel: PGD 0 P4D 0 
Oct 17 12:53:30 manjaro-bl kernel: Oops: 0002 [#2] PREEMPT SMP NOPTI
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G      D    OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:mutex_lock+0x1d/0x30
Oct 17 12:53:30 manjaro-bl kernel: Code: 00 00 be 02 00 00 00 e9 51 f8 ff ff 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb 2e 2e 2e 31 c0 31 c0 65 48 8b 14 25 c0 0b 02 00 <f>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493e58 EFLAGS: 00010246
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 0000000000000a91 RCX: 00000000000001b0
Oct 17 12:53:30 manjaro-bl kernel: RDX: ffff8b6e411b0000 RSI: 0000000000001cc9 RDI: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: RBP: ffff8b6e411b0000 R08: 0000000000000000 R09: ffffad20c0493aa8
Oct 17 12:53:30 manjaro-bl kernel: R10: 0000000000000003 R11: ffffffff88acb508 R12: 0000000000000009
Oct 17 12:53:30 manjaro-bl kernel: R13: 0000000000000001 R14: 0000000000000a91 R15: 0000000000000ab1
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  perf_event_exit_task+0x41/0x2b0
Oct 17 12:53:30 manjaro-bl kernel:  do_exit+0x342/0xad0
Oct 17 12:53:30 manjaro-bl kernel:  ? task_work_run+0x60/0x90
Oct 17 12:53:30 manjaro-bl kernel:  ? do_exit+0x332/0xad0
Oct 17 12:53:30 manjaro-bl kernel:  ? make_task_dead+0x55/0x60
Oct 17 12:53:30 manjaro-bl kernel:  ? rewind_stack_and_make_dead+0x17/0x20
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91
Oct 17 12:53:30 manjaro-bl kernel: ---[ end trace 0000000000000000 ]---
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0010:report_iommu_fault+0x15/0x90
Oct 17 12:53:30 manjaro-bl kernel: Code: ff ff ff 5b 5d 41 5c e9 0d d8 88 00 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 0f 1f 44 00 00 41 55 41 54 41 89 cc 55 48 89 d5 53 <4>
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0018:ffffad20c0493df8 EFLAGS: 00010202
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 00000000000000b0 RCX: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: RDX: 000000012320a5c0 RSI: ffff8b6e416df0d0 RDI: 0000000000000010
Oct 17 12:53:30 manjaro-bl kernel: RBP: 000000012320a5c0 R08: ffff8b6e41737280 R09: 0000000000000070
Oct 17 12:53:30 manjaro-bl kernel: R10: ffff8b6e402160b0 R11: 0000000000000003 R12: 0000000000000001
Oct 17 12:53:30 manjaro-bl kernel: R13: ffff8b6e40062800 R14: 0000000000000300 R15: 0000000000000003
Oct 17 12:53:30 manjaro-bl kernel: FS:  0000000000000000(0000) GS:ffff8b6ff4ac0000(0000) knlGS:0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Oct 17 12:53:30 manjaro-bl kernel: CR2: 0000000000000a91 CR3: 00000000900b2000 CR4: 00000000003506e0
Oct 17 12:53:30 manjaro-bl kernel: Fixing recursive fault but reboot is needed!
Oct 17 12:53:30 manjaro-bl kernel: BUG: scheduling while atomic: irq/25-AMD-Vi/84/0x00000000
Oct 17 12:53:30 manjaro-bl kernel: Modules linked in: tun rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device ccm cmac algif_hash algif_skcipher af_alg bnep qrtr btu>
Oct 17 12:53:30 manjaro-bl kernel:  acpi_cpufreq squashfs loop uinput vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) crypto_user fuse bpf_preload ip_tables x_tables usbhid bt>
Oct 17 12:53:30 manjaro-bl kernel: Unloaded tainted modules: amd64_edac():1 amd64_edac():1 amd64_edac():1 amd64_edac():1 fjes():1 amd64_edac():1 fjes():1 amd64_edac():1 >
Oct 17 12:53:30 manjaro-bl kernel: CPU: 3 PID: 84 Comm: irq/25-AMD-Vi Tainted: G      D    OE      6.0.2-2-MANJARO #1 3201429db279b991ed623dff5d7b05e56f9c1c48
Oct 17 12:53:30 manjaro-bl kernel: Hardware name: LENOVO 81V7/LNVNB161216, BIOS BUCN33WW 05/12/2022
Oct 17 12:53:30 manjaro-bl kernel: Call Trace:
Oct 17 12:53:30 manjaro-bl kernel:  <TASK>
Oct 17 12:53:30 manjaro-bl kernel:  dump_stack_lvl+0x48/0x60
Oct 17 12:53:30 manjaro-bl kernel:  __schedule_bug.cold+0x4b/0x57
Oct 17 12:53:30 manjaro-bl kernel:  __schedule+0xde8/0x11c0
Oct 17 12:53:30 manjaro-bl kernel:  do_task_dead+0x43/0x50
Oct 17 12:53:30 manjaro-bl kernel:  make_task_dead.cold+0x51/0xab
Oct 17 12:53:30 manjaro-bl kernel:  rewind_stack_and_make_dead+0x17/0x20
Oct 17 12:53:30 manjaro-bl kernel: RIP: 0000:0x0
Oct 17 12:53:30 manjaro-bl kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
Oct 17 12:53:30 manjaro-bl kernel: RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Oct 17 12:53:30 manjaro-bl kernel:  </TASK>
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:30 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out
Oct 17 12:53:31 manjaro-bl kernel: AMD-Vi: Completion-Wait loop timed out

The log message is too general; we cannot tell exactly what the cause is.

You can check Issues · drm / amd · GitLab to see whether a similar issue has already been reported.
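When searching the tracker, it can help to reduce a long journal dump to a few signature lines (which ring timed out, which process was involved, whether the GPU reset failed). Here is a minimal Python sketch of that idea. The function name `summarize_amdgpu_log` and the regular expressions are my own assumptions, modelled only on the lines quoted in this thread, so they are illustrative rather than a complete parser for amdgpu messages.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumption:
# other kernels or drivers may word these messages differently).
TIMEOUT_RE = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")
PROCESS_RE = re.compile(r"Process information: process (\S+) pid (\d+)")
RECOVERY_RE = re.compile(r"GPU Recovery Failed: (-?\d+)")
BUG_RE = re.compile(r"BUG: (.+)$")


def summarize_amdgpu_log(text):
    """Collect the key events from saved kernel journal text."""
    summary = {"timeouts": [], "processes": set(), "recovery_errors": [], "bugs": []}
    for line in text.splitlines():
        line = line.rstrip()
        match = TIMEOUT_RE.search(line)
        if match:
            # (ring name, last signaled sequence, last emitted sequence)
            summary["timeouts"].append(
                (match.group(1), int(match.group(2)), int(match.group(3)))
            )
        match = PROCESS_RE.search(line)
        if match:
            summary["processes"].add(match.group(1))
        match = RECOVERY_RE.search(line)
        if match:
            summary["recovery_errors"].append(int(match.group(1)))
        match = BUG_RE.search(line)
        if match:
            summary["bugs"].append(match.group(1))
    return summary


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=10149, emitted seq=10151
Oct 17 12:53:18 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process notion-snap pid 5993 thread notion-sna:cs0 pid 6023
Oct 17 12:53:20 manjaro-bl kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Oct 17 12:53:30 manjaro-bl kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028
"""

if __name__ == "__main__":
    for key, value in summarize_amdgpu_log(SAMPLE).items():
        print(key, value)
```

Terms such as the ring name (`sdma0`) or the function named in the oops are usually more useful search keywords than the application name alone.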

I have heard there are many reports of snapd issues, e.g. Search results for 'snapd' - Manjaro Linux Forum

I would recommend uninstalling snapd. The AUR is fine.

You can stay on the LTS kernel.

Kernels are for hardware support. Newer kernels support the newest hardware. Unless you need support for a specific piece of new hardware, I recommend you stick to the latest LTS kernel, because while newer kernels support newer hardware, they sometimes introduce new bugs.

Many thanks for your input, @Zesko. I’ll dig into it and see if I can find something related.

Thank you, @Jim.B. I had to roll back to the LTS kernel. It’s safe and has been running flawlessly so far.