Xserver/GPU crashes

Ok I will try that

Unfortunately, it still crashes.

I tried Kernel 5.10

Would wiping my system and reinstalling fix anything?

Probably not.

Next time this happens, please:

  • REISUB

  • post the output of:

    inxi --admin --verbosity=7 --filter --no-host --width
    journalctl --system --boot=-1 --priority=1 | tail --lines=35
    smartctl --all /dev/XdY
    

    where X and Y identify your boot disk (for example, /dev/sda).
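If you are unsure which device is the boot disk, its name can usually be derived from the partition mounted at /. The helper below is only a sketch: `parent_disk` is a hypothetical name, and it simply strips the partition suffix from the device name, which assumes plain partitions (no LVM, LUKS, or btrfs subvolume notation).

```shell
# Sketch: derive the parent disk from a partition device name.
# parent_disk is a hypothetical helper, not an existing tool.
parent_disk() {
    case $1 in
        # NVMe and eMMC/SD partitions look like nvme0n1p2 or mmcblk0p1.
        /dev/nvme*|/dev/mmcblk*) printf '%s\n' "$1" | sed 's/p[0-9]*$//' ;;
        # SATA/USB partitions look like sda2: drop the trailing digits.
        *) printf '%s\n' "$1" | sed 's/[0-9]*$//' ;;
    esac
}
```

Usage, assuming `findmnt` is available: `sudo smartctl --all "$(parent_disk "$(findmnt -no SOURCE /)")"`.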

:crossed_fingers:

What does REISUB mean?
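For reference, REISUB is the Magic SysRq key sequence: hold Alt+SysRq (usually Print Screen) and press R, E, I, S, U, B in turn, pausing a few seconds between keys, to reboot a frozen system more safely than cutting the power. The sketch below just prints what each key does; `sysrq_meaning` is a hypothetical helper name. It assumes SysRq is enabled, which the `sysrq_always_enabled=1` kernel parameter in the inxi output in this thread suggests.

```shell
# Meaning of each key in the Magic SysRq sequence R-E-I-S-U-B.
# sysrq_meaning is a hypothetical helper used only for illustration.
sysrq_meaning() {
    case $1 in
        R|r) echo "unRaw: take the keyboard back from the X server" ;;
        E|e) echo "tErminate: send SIGTERM to all processes except init" ;;
        I|i) echo "kIll: send SIGKILL to all processes except init" ;;
        S|s) echo "Sync: flush all mounted filesystems to disk" ;;
        U|u) echo "Unmount: remount all filesystems read-only" ;;
        B|b) echo "reBoot: reboot immediately" ;;
        *)   echo "not part of REISUB"; return 1 ;;
    esac
}

# Print the whole sequence in order.
for key in R E I S U B; do
    printf '%s  %s\n' "$key" "$(sysrq_meaning "$key")"
done
```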

:crossed_fingers:

Crashed again :frowning:
OUTPUT:

System:
  Kernel: 5.10.42-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.10-x86_64 
  root=UUID=e978d5bb-af99-49bd-9fb9-5e16e96ebdad rw apparmor=1 
  security=apparmor resume=UUID=c004ee4b-ffd3-4764-94cc-d5c34b95aebf 
  udev.log_priority=3 sysrq_always_enabled=1 
  Desktop: i3 4.19.1 info: i3bar vt: 7 dm: LightDM 1.30.0 
  Distro: Manjaro Linux base: Arch Linux 
Machine:
  Type: Laptop System: Dell product: Inspiron 3785 v: 1.4.0 serial: <filter> 
  Chassis: type: 10 v: 1.4.0 serial: <filter> 
  Mobo: Dell model: 0VY1RG v: X01 serial: <filter> UEFI: Dell v: 1.4.0 
  date: 05/29/2019 
Battery:
  ID-1: BAT1 charge: 33.7 Wh (100.0%) condition: 33.7/42.0 Wh (80.2%) 
  volts: 12.7 min: 11.4 model: Simplo 0x32,0x39,0x37,0x55,0x00,0x00,0x0006 
  type: Li-ion serial: <filter> status: Full 
Memory:
  RAM: total: 13.62 GiB used: 2.21 GiB (16.2%) 
  Array-1: capacity: 128 GiB note: check slots: 2 EC: None 
  max-module-size: 64 GiB note: est. 
  Device-1: DIMM A size: 8 GiB speed: spec: 2667 MT/s actual: 2400 MT/s 
  type: DDR4 detail: synchronous unbuffered (unregistered) bus-width: 64 bits 
  total: 64 bits manufacturer: 859B0000802C part-no: CT8G4SFS8266.C8FD1 
  serial: <filter> 
  Device-2: DIMM B size: 8 GiB speed: spec: 2667 MT/s actual: 2400 MT/s 
  type: DDR4 detail: synchronous unbuffered (unregistered) bus-width: 64 bits 
  total: 64 bits manufacturer: 859B0000802C part-no: CT8G4SFS8266.C8FD1 
  serial: <filter> 
CPU:
  Info: Quad Core model: AMD Ryzen 5 2500U with Radeon Vega Mobile Gfx 
  socket: FP5 bits: 64 type: MT MCP arch: Zen family: 17 (23) 
  model-id: 11 (17) stepping: 0 microcode: 810100B cache: L1: 384 KiB 
  L2: 2 MiB L3: 4 MiB bogomips: 31954 
  Speed: 1368 MHz min/max: 1600/2000 MHz base/boost: 2000/2000 boost: enabled 
  volts: 1.2 V ext-clock: 100 MHz Core speeds (MHz): 1: 1368 2: 1368 3: 1369 
  4: 1351 5: 1372 6: 1371 7: 1522 8: 1394 
  Flags: 3dnowprefetch abm adx aes aperfmperf apic arat avic avx avx2 bmi1 
  bmi2 bpext clflush clflushopt clzero cmov cmp_legacy constant_tsc cpb cpuid 
  cr8_legacy cx16 cx8 de decodeassists extapic extd_apicid f16c flushbyasid 
  fma fpu fsgsbase fxsr fxsr_opt ht hw_pstate ibpb irperf lahf_lm lbrv lm mca 
  mce misalignsse mmx mmxext monitor movbe msr mtrr mwaitx nonstop_tsc nopl 
  npt nrip_save nx osvw overflow_recov pae pat pausefilter pclmulqdq pdpe1gb 
  perfctr_core perfctr_llc perfctr_nb pfthreshold pge pni popcnt pse pse36 
  rdrand rdseed rdtscp rep_good sep sev sev_es sha_ni skinit smap smca sme 
  smep ssbd sse sse2 sse4_1 sse4_2 sse4a ssse3 succor svm svm_lock syscall tce 
  topoext tsc tsc_scale v_vmsave_vmload vgif vmcb_clean vme vmmcall wdt 
  xgetbv1 xsave xsavec xsaveerptr xsaveopt xsaves 
  Vulnerabilities: Type: itlb_multihit status: Not affected 
  Type: l1tf status: Not affected 
  Type: mds status: Not affected 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full AMD retpoline, IBPB: conditional, STIBP: 
  disabled, RSB filling 
  Type: srbds status: Not affected 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: AMD Raven Ridge [Radeon Vega Series / Radeon Vega Mobile Series] 
  vendor: Dell driver: amdgpu v: kernel bus-ID: 03:00.0 chip-ID: 1002:15dd 
  class-ID: 0300 
  Device-2: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo 
  bus-ID: 1-4:2 chip-ID: 0c45:671e class-ID: 0e02 
  Display: x11 server: X.Org 1.20.11 compositor: picom v: git-dac85 driver: 
  loaded: amdgpu,ati unloaded: modesetting alternate: fbdev,vesa 
  display-ID: :0 screens: 1 
  Screen-1: 0 s-res: 1600x900 s-dpi: 96 s-size: 423x238mm (16.7x9.4") 
  s-diag: 485mm (19.1") 
  Monitor-1: eDP res: 1600x900 hz: 60 dpi: 106 size: 382x214mm (15.0x8.4") 
  diag: 438mm (17.2") 
  OpenGL: renderer: AMD Radeon Vega 8 Graphics (RAVEN DRM 3.40.0 
  5.10.42-1-MANJARO LLVM 12.0.0) 
  v: 4.6 Mesa 21.1.2 direct render: Yes 
Audio:
  Device-1: AMD Raven/Raven2/Fenghuang HDMI/DP Audio vendor: Dell 
  driver: snd_hda_intel v: kernel bus-ID: 03:00.1 chip-ID: 1002:15de 
  class-ID: 0403 
  Device-2: AMD Family 17h HD Audio vendor: Dell driver: snd_hda_intel 
  v: kernel bus-ID: 03:00.6 chip-ID: 1022:15e3 class-ID: 0403 
  Sound Server-1: ALSA v: k5.10.42-1-MANJARO running: yes 
  Sound Server-2: JACK v: 0.125.0 running: no 
  Sound Server-3: PulseAudio v: 14.2 running: yes 
  Sound Server-4: PipeWire v: 0.3.30 running: no 
Network:
  Device-1: Realtek RTL810xE PCI Express Fast Ethernet vendor: Dell 
  driver: r8169 v: kernel port: 2000 bus-ID: 01:00.0 chip-ID: 10ec:8136 
  class-ID: 0200 
  IF: enp1s0 state: up speed: 100 Mbps duplex: full mac: <filter> 
  IP v4: <filter> type: dynamic noprefixroute scope: global 
  broadcast: <filter> 
  IP v6: <filter> type: noprefixroute scope: link 
  Device-2: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter 
  vendor: Dell driver: ath10k_pci v: kernel port: 2000 bus-ID: 02:00.0 
  chip-ID: 168c:0042 class-ID: 0280 
  IF: wlp2s0 state: down mac: <filter> 
  WAN IP: <filter> 
Bluetooth:
  Device-1: Qualcomm Atheros type: USB driver: btusb v: 0.8 bus-ID: 3-2.4:5 
  chip-ID: 0cf3:e009 class-ID: e001 
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends 
Logical:
  Message: No logical block device data found. 
RAID:
  Message: No RAID data found. 
Drives:
  Local Storage: total: 931.51 GiB used: 78.61 GiB (8.4%) 
  ID-1: /dev/sda maj-min: 8:0 vendor: Crucial model: CT1000BX500SSD1 
  family: Micron Client SSDs size: 931.51 GiB block-size: physical: 512 B 
  logical: 512 B sata: 3.3 speed: 6.0 Gb/s rotation: SSD serial: <filter> 
  rev: 030 temp: 24 C scheme: GPT 
  SMART: yes state: enabled health: PASSED on: 152d 19h cycles: 783 
  written: 4.95 TiB 
  Optical-1: /dev/sr0 vendor: PLDS model: DVD+-RW DU-8A5LH rev: 6D1M 
  dev-links: cdrom 
  Features: speed: 24 multisession: yes audio: yes dvd: yes 
  rw: cd-r,cd-rw,dvd-r state: running 
Partition:
  ID-1: / raw-size: 916.23 GiB size: 900.78 GiB (98.31%) 
  used: 78.61 GiB (8.7%) fs: ext4 block-size: 4096 B dev: /dev/sda2 
  maj-min: 8:2 label: N/A uuid: e978d5bb-af99-49bd-9fb9-5e16e96ebdad 
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%) 
  used: 296 KiB (0.1%) fs: vfat block-size: 512 B dev: /dev/sda1 maj-min: 8:1 
  label: NO_LABEL uuid: 1C05-6E76 
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) 
  ID-1: swap-1 type: partition size: 14.99 GiB used: 0 KiB (0.0%) priority: -2 
  dev: /dev/sda3 maj-min: 8:3 label: N/A 
  uuid: c004ee4b-ffd3-4764-94cc-d5c34b95aebf 
Unmounted:
  Message: No unmounted partitions found. 
USB:
  Hub-1: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 1-4:2 info: Microdia Integrated_Webcam_HD type: Video 
  driver: uvcvideo interfaces: 2 rev: 2.0 speed: 480 Mb/s power: 500mA 
  chip-ID: 0c45:671e class-ID: 0e02 
  Hub-2: 2-0:1 info: Full speed (or root) Hub ports: 4 rev: 3.1 speed: 10 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
  Hub-3: 3-0:1 info: Full speed (or root) Hub ports: 2 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 3-1:2 info: Logitech G502 SE HERO Gaming Mouse type: Mouse,HID 
  driver: hid-generic,usbhid interfaces: 2 rev: 2.0 speed: 12 Mb/s 
  power: 300mA chip-ID: 046d:c08b class-ID: 0300 serial: <filter> 
  Hub-4: 3-2:3 info: Terminus Hub ports: 4 rev: 2.0 speed: 480 Mb/s 
  power: 100mA chip-ID: 1a40:0101 class-ID: 0900 
  Device-1: 3-2.1:4 info: Realtek RTS5129 Card Reader Controller 
  type: <vendor specific> driver: rtsx_usb,rtsx_usb_ms,rtsx_usb_sdmmc 
  interfaces: 1 rev: 2.0 speed: 480 Mb/s power: 500mA chip-ID: 0bda:0129 
  class-ID: ff00 serial: <filter> 
  Device-2: 3-2.4:5 info: Qualcomm Atheros type: Bluetooth driver: btusb 
  interfaces: 2 rev: 2.0 speed: 12 Mb/s power: 100mA chip-ID: 0cf3:e009 
  class-ID: e001 
  Hub-5: 4-0:1 info: Full speed (or root) Hub ports: 1 rev: 3.1 speed: 10 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
Sensors:
  System Temperatures: cpu: 74.2 C mobo: 0 C gpu: amdgpu temp: 74.0 C 
  Fan Speeds (RPM): fan-1: 0 
Info:
  Processes: 300 Uptime: 2m wakeups: 1 Init: systemd v: 248 tool: systemctl 
  Compilers: gcc: 11.1.0 Packages: 1367 pacman: 1340 lib: 332 flatpak: 10 
  snap: 17 Shell: Bash (su) v: 5.1.8 running-in: xfce4-terminal inxi: 3.3.04 
-- Journal begins at Sun 2021-06-06 13:24:49 CDT, ends at Sat 2021-07-10 00:46:30 CDT. --
-- No entries --
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.42-1-MANJARO] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Crucial/Micron Client SSDs
Device Model:     CT1000BX500SSD1
Serial Number:    2012E295DEAD
LU WWN Device Id: 5 00a075 1e295dead
Firmware Version: M6CR030
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.3, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jul 10 00:46:31 2021 CDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(  120) seconds.
Offline data collection
capabilities: 			 (0x11) SMART execute Offline immediate.
					No Auto Offline data collection support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					No Selective Self-test supported.
SMART capabilities:            (0x0002)	Does not save SMART data before
					entering power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   000    Pre-fail  Always       -       0
  5 Reallocate_NAND_Blk_Cnt 0x0032   100   100   010    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       3667
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       783
171 Program_Fail_Count      0x0032   100   100   000    Old_age   Always       -       0
172 Erase_Fail_Count        0x0032   100   100   000    Old_age   Always       -       0
173 Ave_Block-Erase_Count   0x0032   099   099   000    Old_age   Always       -       26
174 Unexpect_Power_Loss_Ct  0x0032   100   100   000    Old_age   Always       -       709
180 Unused_Reserve_NAND_Blk 0x0033   100   100   000    Pre-fail  Always       -       9
183 SATA_Interfac_Downshift 0x0032   100   100   000    Old_age   Always       -       0
184 Error_Correction_Count  0x0032   100   100   000    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
194 Temperature_Celsius     0x0022   076   051   000    Old_age   Always       -       24 (Min/Max 11/49)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_ECC_Cnt 0x0032   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0030   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       -       0
202 Percent_Lifetime_Remain 0x0030   099   099   001    Old_age   Offline      -       1
206 Write_Error_Rate        0x000e   100   100   000    Old_age   Always       -       0
210 Success_RAIN_Recov_Cnt  0x0032   100   100   000    Old_age   Always       -       0
246 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       10627963360
247 Host_Program_Page_Count 0x0032   100   100   000    Old_age   Always       -       332123855
248 FTL_Program_Page_Count  0x0032   100   100   000    Old_age   Always       -       367783136
249 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
250 Read_Error_Retry_Rate   0x0032   100   100   000    Old_age   Always       -       96218
251 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
252 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       0
253 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       12
254 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       814
223 Unkn_CrucialMicron_Attr 0x0032   100   100   000    Old_age   Always       -       5

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2677         -
# 2  Extended offline    Completed without error       00%      2677         -
# 3  Extended offline    Completed without error       00%        14         -
# 4  Extended offline    Self-test routine in progress 90%        14         -
# 5  Short offline       Completed without error       00%        14         -
# 6  Short offline       Self-test routine in progress 70%        14         -
# 7  Short offline       Aborted by host               90%        14         -
# 8  Short offline       Aborted by host               90%        14         -
# 9  Short offline       Aborted by host               90%        11         -
#10  Short offline       Completed without error       00%        11         -
#11  Short offline       Self-test routine in progress 40%        11         -
#12  Short offline       Aborted by host               90%        11         -

Selective Self-tests/Logging not supported
  • Have you tried 5.4 already?

  • Have you looked for newer Dell Firmware yet?

  • What’s the output of:

     journalctl --system --boot=-1 --priority=3 | tail --lines=35
    

    (A higher priority number also includes less severe messages, so this shows more info.)
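As background on the two journalctl commands in this thread: `--priority=N` shows messages at syslog level N and everything more severe, which is why `--priority=1` returned no entries while `--priority=3` can include the driver errors. A small sketch of that mapping follows; the function names are hypothetical.

```shell
# Syslog level names as used by journalctl --priority.
journal_priority_name() {
    case $1 in
        0) echo emerg ;;   1) echo alert ;;  2) echo crit ;;  3) echo err ;;
        4) echo warning ;; 5) echo notice ;; 6) echo info ;;  7) echo debug ;;
        *) return 1 ;;
    esac
}

# List the levels that `journalctl --priority=N` would include (0 through N).
priorities_shown() {
    level=0
    while [ "$level" -le "$1" ]; do
        journal_priority_name "$level"
        level=$((level + 1))
    done
}
```

For example, `priorities_shown 3` lists emerg, alert, crit and err, one per line.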

:thinking:

Any updates from Dell require Windows :angry:
I will try 5.4
OUTPUT:

Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32769, for process Xorg pid 822 thread Xorg:cs0 pid 1001)
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000080010127b000 from client 27
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x005C0071
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:5 pasid:32769, for process Xorg pid 822 thread Xorg:cs0 pid 1001)
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x000080010127d000 from client 27
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x005C0071
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          Faulty UTCL2 client ID: CB (0x0)
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MORE_FAULTS: 0x1
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          WALKER_ERROR: 0x0
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          PERMISSION_FAULTS: 0x7
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jul 10 00:42:06 inspiron3785 kernel: amdgpu 0000:03:00.0: amdgpu:          RW: 0x1
Jul 10 00:42:10 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=15181772, emitted seq=15181775
Jul 10 00:42:10 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 822 thread Xorg:cs0 pid 1001
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a40000 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c340 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c380 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c3a0 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c3c0 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c3e0 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c400 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c420 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c440 flags=0x0070]
Jul 10 00:42:10 inspiron3785 kernel: amdgpu 0000:03:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x116a0c460 flags=0x0070]
Jul 10 00:42:12 inspiron3785 kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Jul 10 00:42:12 inspiron3785 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Jul 10 00:42:22 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Jul 10 00:42:32 inspiron3785 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

5.4 caused my web browser, Vivaldi, to crash, so I switched back to 5.12. Would the experimental 5.13 fix anything?

Try adding iommu=pt to your kernel parameters. According to this, it has solved the issue for others using similar hardware on the 5.4 and 5.10 kernels. The problem has likely not been addressed in the 5.12 and 5.13 kernels, but I’m not totally sure. On that note, I believe 5.13 is still not on Manjaro’s stable branch; at least it wasn’t a few days ago.
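In case it helps, here is a rough sketch of adding a parameter such as iommu=pt. It assumes GRUB is the bootloader and that the parameters live on a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line in /etc/default/grub; `add_kernel_param` is a hypothetical helper name, and `sed -i` here assumes GNU sed. Back up the file before trying it.

```shell
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB
# defaults file, unless it is already present.
# Usage: add_kernel_param PARAM FILE
# Note: PARAM must not contain "/" because it is spliced into a sed expression.
add_kernel_param() {
    param=$1
    file=$2
    # Do nothing if the parameter is already on the line.
    if grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file" | grep -qwF -- "$param"; then
        return 0
    fi
    # Insert the parameter just before the closing quote of the value.
    # If the file has no such line, sed leaves it unchanged.
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"/\1 ${param}\"/" "$file"
}
```

Usage would look like `sudo sh -c '. ./helper.sh; add_kernel_param iommu=pt /etc/default/grub'` (with `helper.sh` being wherever you saved the function), followed by `sudo grub-mkconfig -o /boot/grub/grub.cfg` and a reboot. Editing the line by hand in a text editor achieves the same result.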

EDIT: What are you doing when this happens? The log entry

Jul 10 00:42:12 inspiron3785 kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110

seems to indicate you are resuming from hibernation or suspend?
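To see at a glance which kinds of amdgpu errors a boot logged, and whether a failed resume is among them, the journal output can be tallied by message type. This is only a sketch: `summarize_amdgpu_errors` is a hypothetical name, and the patterns are taken from the log lines posted above rather than from any official list.

```shell
# Sketch: count a few amdgpu error types in journal text read from stdin.
# The patterns come from the log excerpt in this thread.
summarize_amdgpu_errors() {
    log=$(cat)
    for pattern in 'retry page fault' 'IO_PAGE_FAULT' 'ring gfx timeout' 'resume of IP block'; do
        # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
        count=$(printf '%s\n' "$log" | grep -cF -- "$pattern" || true)
        printf '%s: %s\n' "$pattern" "$count"
    done
}
```

For example: `journalctl --system --boot=-1 --priority=3 | summarize_amdgpu_errors`. A non-zero "resume of IP block" count means at least one failed GPU resume was logged in that boot.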

Ok I will try that
:crossed_fingers:

I do use the suspend feature to save battery life. Is this what is causing my GPU to crash?!

It is one possible reason that came up when I was googling around for solutions. Another idea that was floated was adding amdgpu.noretry=0 to the kernel parameters and rebooting, which may also help in this instance, but that would depend on whether your crash happens at about the time you come out of suspend. As a test, you could try disabling suspend and just shutting down (kind of a pain) to see if the crash continues to happen.
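Whichever kernel parameter you try, it only takes effect after a reboot, so it is worth confirming that the running kernel actually received it. A minimal sketch, where `has_kernel_param` is a hypothetical helper that looks for an exact match in /proc/cmdline:

```shell
# Sketch: succeed if PARAM appears as a whole word on the kernel command line.
# Usage: has_kernel_param PARAM [CMDLINE_FILE]   (defaults to /proc/cmdline)
has_kernel_param() {
    param=$1
    cmdline_file=${2:-/proc/cmdline}
    # Split the command line into one parameter per line, then match exactly.
    tr ' ' '\n' < "$cmdline_file" | grep -qxF -- "$param"
}
```

For example: `has_kernel_param amdgpu.noretry=0 && echo active || echo missing`.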

It crashes randomly: when I come out of suspend, when I am just messing around, or when I am googling.

Then iommu=pt seems to be the one solution that matches your issue. I will continue to look around and see if I can find anything else. On the subject of BIOS, I would also see if there is an update. I’m not sure whether your Inspiron 3785 works like my XPS, where you can drop the .exe you download from Dell onto a thumb drive and update directly from the Support Assist / Boot selection menu.

Ok, I will return to Dell's website and dig a little deeper. It's been 20 minutes and no crashes yet.

:crossed_fingers:

Will I be able to do the same thing with other updates?

https://www.dell.com/support/kbdoc/en-us/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment

Only the BIOS, from my understanding. I’m not sure there is a need to update firmware on anything else.

So I do not need to worry about this?

https://www.dell.com/support/home/en-us/drivers/driversdetails?driverid=tr2dy&oscode=wt64a&productcode=inspiron-17-3785-laptop