Xserver/GPU crashes

OK, I will return to Dell’s website and dig a little deeper. It’s been 20 minutes and no crashes yet.

:crossed_fingers:

Will I be able to do the same thing with other updates?

https://www.dell.com/support/kbdoc/en-us/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment

Only the BIOS, from my understanding. I’m not sure there is a need to update firmware on anything else.

So I do not need to worry about this?

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=tr2dy&oscode=wt64a&productcode=inspiron-17-3785-laptop

For the section titled "Updating the BIOS on supported UEFI systems (2015 onwards)", I was able to just drop the file onto a FAT-formatted thumb drive without doing all the other stuff.

You shouldn’t need to worry about that. According to the security advisory, it only applies to:

“Dell Client platforms licensed for Microsoft Windows 10 restored using a Dell OS recovery image for Microsoft Windows 10 that was downloaded before December 20, 2019.”

and it was remediated on later recovery images anyway.

I understand. The BIOS update, version 1.4.0, was released in 2019. I ran dmidecode -s bios-version and found that mine is also 1.4.0.
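
For anyone comparing versions the same way: dmidecode usually needs root, while the kernel exposes the same DMI string through sysfs on most systems. A minimal sketch, assuming /sys/class/dmi/id/bios_version exists on your machine; the function name and the optional path argument are only for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print the BIOS version string from sysfs.
# The optional argument overrides the path, which makes it easy to
# try the function against a sample file.
bios_version() {
    dmi_file="${1:-/sys/class/dmi/id/bios_version}"
    if [ -r "$dmi_file" ]; then
        cat "$dmi_file"
    else
        echo "cannot read $dmi_file" >&2
        return 1
    fi
}
```

Calling bios_version with no argument should print the same string that dmidecode -s bios-version reports, which can then be compared against the version listed on the Dell support page.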

Yeah, it looks like they haven’t done much in the way of BIOS updates for that machine.

Been up for 18 hours without a crash. I think we may have fixed it! I will wait until Wednesday to mark this as solved.

Great to hear! For the record, I may have wrongly assumed you wouldn’t have any other firmware to update; look into fwupd.

fwupd says there are no updates.

Hey, it failed twice in a span of 5 minutes.

Log from journalctl --system --boot=-2 --priority=3 | tail --lines=35:

Jul 11 10:15:01 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 10:15:11 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 10:15:21 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 10:16:02 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 13:06:36 inspiron3785 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jul 11 13:06:36 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=14215028, emitted seq=14215030
Jul 11 13:06:36 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process glxinfo pid 3111155 thread glxinfo:cs0 pid 3111163
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c3347c0 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c3347e0 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c334800 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c334820 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x11c334840 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c334860 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c334880 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c3348a0 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c3348c0 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c340000 flags=0x0070]
Jul 11 13:06:36 inspiron3785 kernel: AMD-Vi: Event logged [IO_PAGE_FAULT device=03:00.0 domain=0x0000 address=0x11c3348e0 flags=0x0070]
Jul 11 13:06:38 inspiron3785 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ib_ring_tests [amdgpu]] *ERROR* IB test failed on gfx (-110).
Jul 11 13:06:38 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: ib ring test failed (-110).
Jul 11 13:06:39 inspiron3785 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Jul 11 13:06:39 inspiron3785 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Jul 11 13:06:49 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 13:06:59 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 13:07:17 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4801672, emitted seq=4801674
Jul 11 13:07:17 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
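
To get a quick overview of a log like the one above, the error lines can be counted by kind. This is only a rough sketch: the function name and the output labels are made up for illustration, and the patterns are copied from the messages in this log rather than from any amdgpu documentation.

```shell
#!/bin/sh
# Hypothetical helper: reads journal text on stdin and counts a few
# kinds of GPU-related errors, matching message text seen in this log.
summarize_gpu_errors() {
    awk '
        /ring [a-z0-9]+ timeout, but soft recovered/ { soft++ }
        /ring [a-z0-9]+ timeout, signaled seq=/      { other++ }
        /IO_PAGE_FAULT/                              { iommu++ }
        END {
            # Unset awk variables print as 0 with %d.
            printf "soft_recovered_timeouts=%d\n", soft
            printf "other_ring_timeouts=%d\n", other
            printf "io_page_faults=%d\n", iommu
        }'
}
```

Usage would be along the lines of journalctl --system --boot=-2 --priority=3 | summarize_gpu_errors. For the 35 lines posted above it should report 6 soft-recovered timeouts, 2 other ring timeouts, and 20 IO_PAGE_FAULT events.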

I have turned off all display power management and am going to see if that works.

My power manager is Xfce, and the settings were:

  • Laptop lid closed: switch off display
  • On battery: suspend when inactive for 15 minutes
  • Display power management: on
  • Put to sleep after 5 minutes

Now they are all off.

Odd. Out of curiosity, are you running a swap partition or file?

Yes, as a matter of fact I am running a swap partition. It is 15 GB.

Try adding amdgpu.noretry=0 to your boot parameters. After that I might be out of ideas. I think the big issue is homing in on when the crash happens and under what circumstances. Some of the stuff I read suggested adding amdgpu.dpm=0 as well. Has this been an ongoing issue since your initial install, or did it start after a system update? Someone else suggested downgrading linux-firmware, but I think that would only be relevant if the issue started recently. You could also boot a live image and see if the issue happens there.
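
Options such as amdgpu.noretry=0 and amdgpu.dpm=0 end up on the kernel command line via the bootloader configuration, and how that is edited depends on the distro (on GRUB-based installs it is typically GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by regenerating grub.cfg). Whatever the method, it helps to confirm after a reboot what the running kernel actually received. A small sketch, assuming a standard /proc/cmdline; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list the amdgpu.* options on a kernel command line.
# Reads /proc/cmdline by default; pass another file to check sample text.
amdgpu_cmdline_opts() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" \
        | grep '^amdgpu\.' \
        || echo "no amdgpu options set"
}
```

If amdgpu_cmdline_opts prints "no amdgpu options set" after the reboot, the bootloader change did not take effect and any test run would not be measuring the new setting.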

This has started recently.

I might try downgrading the linux-firmware package then, as a last resort, and see if that works.

Hey, I just caught my GPU lagging and wanted to show the log:

Jul 11 18:41:44 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 11 18:41:44 inspiron3785 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: IH ring buffer overflow (0x000A3D00, 0x00013D20, 0x00023D20)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c64000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c60000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c61000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c64000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c65000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c60000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c65000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c64000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c61000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00501031
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x800103c60000 from client 27
Jul 11 18:41:34 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32770, for process picom pid 1345 thread picom:cs0 pid 1379)
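
Since narrowing down the circumstances is the hard part, it may help to tally which processes the driver names in its fault messages (glxinfo in the earlier log, picom here). A rough sketch, assuming a grep that supports -o and -E (GNU and BSD grep do); the function name is made up:

```shell
#!/bin/sh
# Hypothetical helper: reads journal text on stdin and tallies the
# "process NAME pid N" fragments that amdgpu prints with its faults.
# Entries with an empty process name are skipped by the pattern.
faulting_processes() {
    grep -oE 'process [^ ]+ pid [0-9]+' | sort | uniq -c | sort -rn
}
```

It can be fed the same way as the logs in this thread, for example journalctl --system --boot=-2 --priority=3 | faulting_processes. A process that shows up across several boots is a candidate to disable or reconfigure for a test run, though it may simply be whatever was using the GPU at the time.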

I have downgraded linux-firmware, so fingers crossed it will work.

:crossed_fingers:
