Semi-random Crashes on All Manjaro DEs, Old and New Kernels (but Not on Windows So Far)

I need some help diagnosing what I think is a hardware issue on my Dell Precision laptop. It crashes both randomly and on specific triggers, e.g. whenever I load a YouTube video in Firefox. I haven't been able to reproduce the crashes on Windows yet, but hardware still seems like the likely cause: I have an old Manjaro XFCE install on kernel 5.10 that I used last year, and I've been able to reproduce the crash there too.

I've been running Manjaro GNOME for a few months, and the crashes started fairly recently, maybe a month ago. Since then I've tried updating the kernel and running a live USB with KDE, but neither fixed the issue. Not sure if it's relevant, but on the old XFCE install the computer lasted a few minutes longer while playing a YouTube video in Firefox.

Whenever the crash happens, the computer powers down and immediately shows the Dell (yep) splash logo, as if it's restarting or never got the signal that it turned off. The journalctl logs from just before the crash show nothing. I can post that output if needed, but I'm on the Windows install right now. After the system powers back up this way, the battery shows as unplugged and will not charge, even though it was plugged in and charging right up until the crash. I then have to shut the computer down, unplug and replug the charger, and power it on again before it resumes charging. The crash still happens when the laptop is on battery power only.

Any tips for diagnosing this would be really appreciated, and software-side ideas would be helpful too. I'm open to having it repaired, but I want as much information as possible going in. That way, whatever a third-party repair service ends up suggesting should actually solve the problem.

Does it only crash when watching videos in Firefox?
You should provide the logs from the boot in which the crash happened:
journalctl -b-1 -p5 --no-pager
Also provide formatted output from:
inxi -Fazy
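If the crash leaves nothing obvious in the plain log, it can help to look only at the last few seconds before the previous boot ended. Below is a minimal Python sketch. It assumes `journalctl` is installed and uses its `-o json` output, which prints one JSON object per line with `__REALTIME_TIMESTAMP` in microseconds since the epoch. The script name `journal_tail.py` and all function names are hypothetical.

```python
import json
import subprocess
import sys
from datetime import datetime


def parse_entries(lines):
    """Parse journalctl JSON output, skipping blank or malformed lines."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            obj = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(obj, dict):  # each journal entry is a JSON object
            entries.append(obj)
    return entries


def tail_window(entries, seconds):
    """Keep entries no older than `seconds` before the newest entry."""
    stamped = [e for e in entries if "__REALTIME_TIMESTAMP" in e]
    if not stamped:
        return []
    newest = max(int(e["__REALTIME_TIMESTAMP"]) for e in stamped)
    cutoff = newest - int(seconds * 1_000_000)  # timestamps are in microseconds
    return [e for e in stamped if int(e["__REALTIME_TIMESTAMP"]) >= cutoff]


def format_entry(entry):
    """Render an entry roughly like journalctl's default short output."""
    stamp = datetime.fromtimestamp(int(entry["__REALTIME_TIMESTAMP"]) / 1e6)
    ident = entry.get("SYSLOG_IDENTIFIER", "?")
    message = entry.get("MESSAGE", "")
    if isinstance(message, list):  # non-UTF-8 messages arrive as byte arrays
        message = bytes(message).decode("utf-8", errors="replace")
    return f"{stamp:%b %d %H:%M:%S} {ident}: {message}"


def previous_boot_tail(seconds):
    """Query the previous boot, priority 5 (notice) and more severe."""
    result = subprocess.run(
        ["journalctl", "--boot=-1", "-p", "5", "-o", "json", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return tail_window(parse_entries(result.stdout.splitlines()), seconds)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 journal_tail.py 20
    for entry in previous_boot_tail(float(sys.argv[1])):
        print(format_entry(entry))
```

After the next crash, `python3 journal_tail.py 20` prints roughly the same 20-second window as the excerpt further down, without trimming it by hand.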

It crashes on Google Maps in both Chrome and Firefox, and sometimes when starting up games. It also crashes sometimes when I'm just running gVim and a terminal for some light Python coding, or even just dragging windows around or switching workspaces. Once a game has been running for a few minutes, though, it won't crash for as long as I keep playing.

Oh, I also think there's an overall trend: if I put the laptop in performance mode, it crashes much sooner than in power-saving mode.

I reproduced another crash via Firefox/YouTube. Here are the 20 seconds of log data before the crash, minus some lines about my keyboard and mouse disconnecting for some reason.

>>> journalctl -b-1 -p5 --no-pager
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (**) Option "AccelerationScheme" "none"
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (**) SteelSeries SteelSeries Rival 3 Wireless: (accel) selected scheme none/0
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (**) SteelSeries SteelSeries Rival 3 Wireless: (accel) acceleration factor: 2.000
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (**) SteelSeries SteelSeries Rival 3 Wireless: (accel) acceleration threshold: 4
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) event3  - SteelSeries SteelSeries Rival 3 Wireless: is tagged by udev as: Mouse
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) event3  - SteelSeries SteelSeries Rival 3 Wireless: device is a pointer
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): EDID vendor "SHP", prod id 5329
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Printing DDC gathered Modelines:
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1920x1200"x0.0  154.00  1920 1968 2000 2080  1200 1203 1209 1235 -hsync -vsync (74.0 kHz eP)
Jun 24 19:05:18 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1920x1200"x0.0  123.20  1920 1968 2000 2080  1200 1203 1209 1235 -hsync -vsync (59.2 kHz e)
Jun 24 19:05:19 trevormax-precision5550 gnome-shell[2084]: Can't update stage views actor <unnamed>[<MetaWindowGroup>:0x564601738320] is on because it needs an allocation.
Jun 24 19:05:19 trevormax-precision5550 gnome-shell[2084]: Can't update stage views actor <unnamed>[<MetaWindowActorX11>:0x564603034b50] is on because it needs an allocation.
Jun 24 19:05:19 trevormax-precision5550 gnome-shell[2084]: Can't update stage views actor <unnamed>[<MetaSurfaceActorX11>:0x564603038a30] is on because it needs an allocation.
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): EDID vendor "ACI", prod id 9368
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Using hsync ranges from config file
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Using vrefresh ranges from config file
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Printing DDC gathered Modelines:
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1920x1080"x0.0  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync (67.5 kHz eP)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync (37.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   31.50  640 664 704 832  480 489 492 520 -hsync -vsync (37.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   30.24  640 704 768 864  480 483 486 525 -hsync -vsync (35.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync (80.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   78.75  1024 1040 1136 1312  768 769 772 800 +hsync +vsync (60.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   75.00  1024 1048 1184 1328  768 771 777 806 -hsync -vsync (56.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync (49.7 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync (46.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   50.00  800 856 976 1040  600 637 643 666 +hsync +vsync (48.1 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +vsync (67.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x960"x0.0  108.00  1280 1376 1488 1800  960 961 964 1000 +hsync +vsync (60.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1440x900"x0.0   88.75  1440 1488 1520 1600  900 903 909 926 +hsync -vsync (55.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1600x1200"x0.0  162.00  1600 1664 1856 2160  1200 1201 1204 1250 +hsync +vsync (75.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1680x1050"x0.0  119.00  1680 1728 1760 1840  1050 1053 1059 1080 +hsync -vsync (64.7 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): EDID vendor "ACI", prod id 9368
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Using hsync ranges from config file
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Using vrefresh ranges from config file
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Printing DDC gathered Modelines:
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1920x1080"x0.0  148.50  1920 2008 2052 2200  1080 1084 1089 1125 +hsync +vsync (67.5 kHz eP)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   40.00  800 840 968 1056  600 601 605 628 +hsync +vsync (37.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   36.00  800 824 896 1024  600 601 603 625 +hsync +vsync (35.2 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   31.50  640 656 720 840  480 481 484 500 -hsync -vsync (37.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   31.50  640 664 704 832  480 489 492 520 -hsync -vsync (37.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   30.24  640 704 768 864  480 483 486 525 -hsync -vsync (35.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "640x480"x0.0   25.18  640 656 752 800  480 490 492 525 -hsync -vsync (31.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "720x400"x0.0   28.32  720 738 846 900  400 412 414 449 -hsync +vsync (31.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x1024"x0.0  135.00  1280 1296 1440 1688  1024 1025 1028 1066 +hsync +vsync (80.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   78.75  1024 1040 1136 1312  768 769 772 800 +hsync +vsync (60.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   75.00  1024 1048 1184 1328  768 771 777 806 -hsync -vsync (56.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1024x768"x0.0   65.00  1024 1048 1184 1344  768 771 777 806 -hsync -vsync (48.4 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "832x624"x0.0   57.28  832 864 928 1152  624 625 628 667 -hsync -vsync (49.7 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   49.50  800 816 896 1056  600 601 604 625 +hsync +vsync (46.9 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "800x600"x0.0   50.00  800 856 976 1040  600 637 643 666 +hsync +vsync (48.1 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1152x864"x0.0  108.00  1152 1216 1344 1600  864 865 868 900 +hsync +vsync (67.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x1024"x0.0  108.00  1280 1328 1440 1688  1024 1025 1028 1066 +hsync +vsync (64.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1280x960"x0.0  108.00  1280 1376 1488 1800  960 961 964 1000 +hsync +vsync (60.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1440x900"x0.0   88.75  1440 1488 1520 1600  900 903 909 926 +hsync -vsync (55.5 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1600x1200"x0.0  162.00  1600 1664 1856 2160  1200 1201 1204 1250 +hsync +vsync (75.0 kHz e)
Jun 24 19:05:20 trevormax-precision5550 /usr/lib/gdm-x-session[1837]: (II) modeset(0): Modeline "1680x1050"x0.0  119.00  1680 1728 1760 1840  1050 1053 1059 1080 +hsync -vsync (64.7 kHz e)
Jun 24 19:05:26 trevormax-precision5550 nautilus[7182]: Unable to initialize tag manager: SQL logic error
Jun 24 19:05:27 trevormax-precision5550 gnome-calculato[7186]: search-provider.vala:117: Failed to spawn Calculator: Child process killed by signal 9
Jun 24 19:05:28 trevormax-precision5550 nautilus[7182]: Connecting to org.freedesktop.Tracker3.Miner.Files
Jun 24 19:05:29 trevormax-precision5550 gnome-character[7188]: JS LOG: Characters Application started
Jun 24 19:05:39 trevormax-precision5550 gnome-character[7188]: JS LOG: Characters Application exiting
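Most of this excerpt is Xorg printing the same mode list twice for the external monitor. A small filter makes the remaining lines easier to scan. This is a sketch: the patterns are assumptions matched against the lines above, so adjust them for other logs. EDID lines are deliberately kept because they show which display was probed.

```python
import re
import sys
from collections import Counter

# Assumed noise patterns, taken from the excerpt above.
NOISE_PATTERNS = {
    "modeline": re.compile(r"modeset\(\d+\): Modeline "),
    "ddc-header": re.compile(r"Printing DDC gathered Modelines"),
    "sync-ranges": re.compile(r"Using (hsync|vrefresh) ranges from config file"),
}


def filter_noise(lines, patterns=NOISE_PATTERNS):
    """Return (kept_lines, Counter of dropped lines per pattern name)."""
    kept, dropped = [], Counter()
    for line in lines:
        for name, pattern in patterns.items():
            if pattern.search(line):
                dropped[name] += 1
                break
        else:  # no pattern matched, so the line is kept
            kept.append(line)
    return kept, dropped


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 xorg_filter.py pasted_log.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        kept, dropped = filter_noise(fh.read().splitlines())
    print("\n".join(kept))
    for name, count in dropped.most_common():
        print(f"# dropped {count} {name} line(s)", file=sys.stderr)
```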
>>> inxi -Fazy
System:
  Kernel: 5.18.0-1-rt11-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.18-rt-x86_64
    root=UUID=bc17db34-32bc-449e-a823-73d3b50632b3 rw quiet splash apparmor=1
    security=apparmor resume=UUID=00723683-4aee-4f02-8c01-95a13d81a9fc
    udev.log_priority=3
  Desktop: GNOME v: 42.2 tk: GTK v: 3.24.34 wm: gnome-shell dm: GDM v: 42.0
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: Precision 5550 v: N/A
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Dell model: 00M55X v: A01 serial: <superuser required> UEFI: Dell
    v: 1.11.0 date: 11/12/2021
Battery:
  ID-1: BAT0 charge: 42.3 Wh (100.0%) condition: 42.3/84.3 Wh (50.2%)
    volts: 12.0 min: 11.4 model: BYD DELL M59JH07 type: Li-poly serial: <filter>
    status: full
CPU:
  Info: model: Intel Core i7-10750H bits: 64 type: MT MCP arch: Comet Lake
    gen: core 10 built: 2020 process: Intel 14nm family: 6 model-id: 0xA5 (165)
    stepping: 2 microcode: 0xF0
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 1.5 MiB desc: 6x256 KiB
    L3: 12 MiB desc: 1x12 MiB
  Speed (MHz): avg: 879 high: 900 min/max: 800/5000 scaling:
    driver: intel_pstate governor: powersave cores: 1: 900 2: 896 3: 900 4: 900
    5: 897 6: 875 7: 843 8: 848 9: 847 10: 855 11: 890 12: 900 bogomips: 62399
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 status: Vulnerable: eIBRS with unprivileged eBPF
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel CometLake-H GT2 [UHD Graphics] vendor: Dell driver: i915
    v: kernel arch: Gen9.5 process: Intel 14nm built: 2016-20 ports:
    active: DP-2,eDP-1 empty: DP-1,DP-3 bus-ID: 00:02.0 chip-ID: 8086:9bc4
    class-ID: 0300
  Device-2: NVIDIA TU117GLM [Quadro T1000 Mobile] vendor: Dell
    driver: nvidia v: 515.48.07 alternate: nouveau,nvidia_drm non-free: 515.xx+
    status: current (as of 2022-06) arch: Turing process: TSMC 12nm
    built: 2018-22 pcie: gen: 1 speed: 2.5 GT/s lanes: 8 link-max: gen: 3
    speed: 8 GT/s lanes: 16 bus-ID: 01:00.0 chip-ID: 10de:1fb9 class-ID: 0302
  Device-3: Realtek Integrated_Webcam_HD type: USB driver: uvcvideo
    bus-ID: 1-11:6 chip-ID: 0bda:5510 class-ID: fe01 serial: <filter>
  Display: x11 server: X.org v: 1.21.1.3 with: Xwayland v: 22.1.2
    compositor: gnome-shell driver: X: loaded: modesetting,nvidia
    alternate: fbdev,nouveau,nv,vesa gpu: i915 display-ID: :1 screens: 1
  Screen-1: 0 s-res: 3840x1200 s-size: <missing: xdpyinfo>
  Monitor-1: DP-2 pos: primary,left model: Asus VS248 serial: <filter>
    built: 2017 res: 1920x1080 hz: 60 dpi: 92 gamma: 1.2
    size: 531x299mm (20.91x11.77") diag: 609mm (24") ratio: 16:9 modes:
    max: 1920x1080 min: 720x400
  Monitor-2: eDP-1 pos: right model: Sharp 0x14d1 built: 2020 res: 1920x1200
    hz: 60 dpi: 145 gamma: 1.2 size: 336x210mm (13.23x8.27") diag: 396mm (15.6")
    ratio: 16:10 modes: 1920x1200
  OpenGL: renderer: Mesa Intel UHD Graphics (CML GT2) v: 4.6 Mesa 22.1.1
    direct render: Yes
Audio:
  Device-1: Intel Comet Lake PCH cAVS vendor: Dell driver: snd_hda_intel
    v: kernel alternate: snd_soc_skl,snd_sof_pci_intel_cnl bus-ID: 00:1f.3
    chip-ID: 8086:06c8 class-ID: 0403
  Sound Server-1: ALSA v: k5.18.0-1-rt11-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.21 running: no
  Sound Server-3: PulseAudio v: 16.0 running: yes
  Sound Server-4: PipeWire v: 0.3.52 running: yes
Network:
  Device-1: Intel Comet Lake PCH CNVi WiFi driver: iwlwifi v: kernel
    bus-ID: 00:14.3 chip-ID: 8086:06f0 class-ID: 0280
  IF: wlp0s20f3 state: up mac: <filter>
Bluetooth:
  Device-1: Intel AX201 Bluetooth type: USB driver: btusb v: 0.8
    bus-ID: 1-14:8 chip-ID: 8087:0026 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 953.88 GiB used: 278.89 GiB (29.2%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:8 vendor: SanDisk model: ADATA SX6000PNP
    size: 476.94 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 type: SSD serial: <filter> rev: V9002s45 temp: 32.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Intel
    model: SSDPEMKF512G8 NVMe 512GB size: 476.94 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 type: SSD serial: <filter>
    rev: 7004 temp: 29.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 442.55 GiB size: 434.54 GiB (98.19%)
    used: 278.89 GiB (64.2%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:10
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 304 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:9
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: 49.0 C pch: 56.0 C mobo: 41.0 C
  Fan Speeds (RPM): cpu: 0 fan-2: 0
Info:
  Processes: 414 Uptime: 7m wakeups: 3 Memory: 30.98 GiB used: 2.2 GiB (7.1%)
  Init: systemd v: 251 default: graphical tool: systemctl Compilers:
  gcc: 12.1.0 clang: 13.0.1 Packages: pacman: 1438 lib: 419 flatpak: 0
  Shell: Zsh v: 5.9 running-in: gnome-terminal inxi: 3.3.19

Well, there's really nothing there… but it looks like it could be related to graphics.
Post the output from:
mhwd -l && mhwd -li

Have you run memtest?

@mbb
I did test my memory

@brahma

>>> mhwd -l && mhwd -li
> 0000:01:00.0 (0302:10de:1fb9) Display controller nVidia Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI
video-hybrid-intel-nvidia-470xx-prime            2021.12.18               false            PCI
          video-nvidia            2021.12.18               false            PCI
    video-nvidia-470xx            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI


> 0000:00:02.0 (0300:8086:9bc4) Display controller Intel Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI
video-hybrid-intel-nvidia-470xx-prime            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI
     video-modesetting            2020.01.13                true            PCI
            video-vesa            2017.03.12                true            PCI


> Installed PCI configs:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
     video-modesetting            2020.01.13                true            PCI
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI


Warning: No installed USB configs!

I didn't mention earlier that I reproduced the crash on the live USB with both the free and non-free video drivers.

I have no idea… the drivers look good. I wanted to suggest uninstalling NVIDIA and using only the free Linux drivers to see if the problem remains. But since the crash also happened with the free drivers, I don't know what to do next…

Thanks for the help. No worries at all, because this still tells me what I need to know: it's a hardware issue, and I can't diagnose it any further in software alone.

But it doesn't happen on Windows…
Provide the output from:
journalctl -b0 -p3 --no-pager

>>> journalctl -b0 -p3 --no-pager
Jun 25 23:59:04 trevormax-precision5550 kernel: Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
Jun 25 23:59:05 trevormax-precision5550 kernel: psmouse serio1: elantech: elantech_send_cmd query 0x02 failed.
Jun 25 23:59:05 trevormax-precision5550 kernel: psmouse serio1: elantech: failed to query capabilities.
Jun 25 23:59:05 trevormax-precision5550 kernel: 
Jun 25 23:59:07 trevormax-precision5550 kernel: Bluetooth: hci0: Malformed MSFT vendor event: 0x02
Jun 25 23:59:08 trevormax-precision5550 kernel: ucsi_acpi USBC000:00: unknown error 0
Jun 25 23:59:08 trevormax-precision5550 kernel: ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5)
Jun 25 23:59:10 trevormax-precision5550 kernel: ucsi_acpi USBC000:00: PPM init failed (-110)
Jun 26 00:00:34 trevormax-precision5550 systemd[1]: Timed out waiting for device /dev/disk/by-uuid/00723683-4aee-4f02-8c01-95a13d81a9fc.
Jun 26 00:00:37 trevormax-precision5550 gnome-session-binary[1611]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Jun 26 00:00:37 trevormax-precision5550 gnome-session-binary[1611]: GLib-GIO-CRITICAL: g_bus_get_sync: assertion 'error == NULL || *error == NULL' failed
Jun 26 00:00:53 trevormax-precision5550 gdm-password][1922]: gkr-pam: unable to locate daemon control file
Jun 26 00:00:55 trevormax-precision5550 systemd[1933]: Failed to start Application launched by gnome-session-binary.
Jun 26 00:01:00 trevormax-precision5550 systemd[1933]: Failed to start Application launched by gnome-session-binary.
Jun 26 00:01:06 trevormax-precision5550 gdm-launch-environment][1558]: GLib-GObject: g_object_unref: assertion 'G_IS_OBJECT (object)' failed

Judging by the timestamps, I think this is from my latest boot.

Maybe it does happen on Windows and I just haven't seen it, or it's harder to trigger there. I also still don't get why an old Manjaro/XFCE install that never had crashes would suddenly start crashing as soon as I went back to it after several months. It feels more like Windows has some privileged coupling with the firmware that lets it avoid the crash.

Post the output from:
find /etc/X11/ -name "*.conf"
Also, are Fast Boot and Secure Boot disabled in the BIOS?

>>> find /etc/X11/ -name "*.conf"
/etc/X11/xorg.conf.d/00-keyboard.conf
/etc/X11/xorg.conf.d/90-mhwd.conf
/etc/X11/mhwd.d/nvidia.conf

I'll check Fast Boot and report back in an edit if I still get crashes. Secure Boot is disabled; it had to be for me to boot from the live USB.
Edit: no dice after “turning” Fast Boot off. I put that in quotes because I'm not sure it was on in the first place, although it may have been.

Try installing the 5.4 kernel and see if it also happens with that one.

I finally tried switching to the 5.4 kernel. I tested it last night by watching a YouTube video in Firefox (which caused crashes in the past), and I was elated that there was no crash for the several minutes I was watching. Then, when I sat down this morning to show my girlfriend that it wasn't crashing and unpaused the video, it crashed shortly afterwards.

That said, I was running Factorio last night and getting hiccups and relatively poor FPS. After downgrading to 5.4, performance is back to crystal clean. So overall I'm still very happy with this fix, even if it's only a partial one.

I do see a bunch of temperature events in my journal now, but none from the time the crash actually happened, so I'm not sure they're relevant.

Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU11: Core temperature above threshold, cpu clock throttled (total events = 218)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU5: Core temperature above threshold, cpu clock throttled (total events = 218)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1114)
Jul 09 07:22:13 trevormax-precision5550 kernel: mce: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1114)
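To check whether these notices line up with a crash, you can count them per CPU and find the last one before the boot ended. Below is a sketch. The regex assumes the exact wording shown above, and other kernel versions may phrase the message differently. The script name is hypothetical.

```python
import re
import sys
from collections import defaultdict

# Assumed format: "<Mon DD HH:MM:SS> ... mce: CPU<n>: <Core|Package> temperature
# above threshold, cpu clock throttled (total events = <k>)"
THROTTLE_RE = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) .*?"
    r"mce: CPU(?P<cpu>\d+): (?P<kind>Core|Package) temperature above threshold"
    r".*?total events = (?P<total>\d+)"
)


def tally_throttling(lines):
    """Return ({kind: {cpu: highest total}}, timestamp of the last notice)."""
    totals = defaultdict(dict)
    last_stamp = None
    for line in lines:
        m = THROTTLE_RE.search(line)
        if not m:
            continue
        kind, cpu, total = m["kind"], int(m["cpu"]), int(m["total"])
        # The count in the message is a running total, so keep the largest seen.
        totals[kind][cpu] = max(total, totals[kind].get(cpu, 0))
        last_stamp = m["stamp"]
    return dict(totals), last_stamp


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: journalctl --boot=-1 -k --no-pager > kernel.txt
    #        python3 throttle_tally.py kernel.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as fh:
        totals, last = tally_throttling(fh)
    for kind, per_cpu in sorted(totals.items()):
        print(kind, dict(sorted(per_cpu.items())))
    print("last throttle notice:", last)
```

Comparing the last timestamp with the final line of the same boot's log shows how long before the crash the CPU was last throttled.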

mce (machine check) messages are not to be ignored… these say that your CPU is overheating. Were you overclocking your CPU?

Your CPU is too hot.

Try cleaning it a bit: remove all the dust, check that the fans are running, etc.

No, I'm not overclocking, and I'm certainly not doing anything overnight that should cause overheating. Looking at the temperature profile, the CPU is currently at ~45-50 °C, so it's not overheating right now. The fans aren't running at the moment because nothing is happening. If I run an infinite loop, though, the CPU heats up to 100 °C pretty quickly and the fans don't seem to spin up much. I'm going to read through the wiki (conveniently, it has a section specifically on Dell laptop fan control) and see if I can configure the fans to kick in more aggressively.
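To see whether the fans react while the CPU climbs toward 100 °C, you can log temperatures and fan speeds side by side while the infinite loop runs. Below is a minimal sketch using the kernel's hwmon sysfs interface, where `temp*_input` is in millidegrees Celsius and `fan*_input` is in RPM. Which chips appear (for example `coretemp` or `dell_smm`) depends on the loaded drivers. The script name is hypothetical.

```python
import sys
import time
from pathlib import Path

HWMON_ROOT = Path("/sys/class/hwmon")


def _read_text(path):
    """Return the stripped file contents, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def _read_int(path):
    """Return the file contents as an int, or None if missing or not numeric."""
    text = _read_text(path)
    try:
        return int(text) if text is not None else None
    except ValueError:
        return None


def snapshot(root=HWMON_ROOT):
    """Return {"hwmonN:chip/sensor": value}: temps in degC, fans in RPM."""
    readings = {}
    for chip in sorted(root.glob("hwmon*")):
        name = _read_text(chip / "name") or "unknown"
        prefix = f"{chip.name}:{name}"  # hwmonN keeps two identical chips apart
        for sensor in sorted(chip.glob("temp*_input")):
            value = _read_int(sensor)
            if value is not None:
                readings[f"{prefix}/{sensor.name}"] = value / 1000.0  # millidegrees
        for sensor in sorted(chip.glob("fan*_input")):
            value = _read_int(sensor)
            if value is not None:
                readings[f"{prefix}/{sensor.name}"] = value  # already RPM
    return readings


def log(count, interval=2.0, root=HWMON_ROOT):
    """Print `count` snapshots, `interval` seconds apart."""
    for _ in range(count):
        values = snapshot(root)
        line = "  ".join(f"{key}={val}" for key, val in sorted(values.items()))
        print(time.strftime("%H:%M:%S"), line, flush=True)
        time.sleep(interval)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 hwmon_log.py 30   (run the CPU load in another terminal)
    log(int(sys.argv[1]))
```

If the temperature column climbs toward 100 while every `fan*_input` stays at 0, that points at fan control rather than at the CPU itself.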

Could this whole crashing issue be CPU-related? If the CPU has been consistently hitting 100 °C for a good while, could that have caused damage over time?

Yes, the overheating is very likely the reason for the crashes… You also mentioned that it happened on the live USB… so yes, check the fans. You could also try a live USB of, for example, Linux Mint and test whether the crashes occur there too…

When the temperature rises quickly without a heavy workload, possible reasons are:

  • cpu is disconnected from cooling
  • fan is not running
  • fan can´t move the air (dust)
  • air is not blown out (stays in the pc)
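Once temperature and fan samples are logged, the second bullet ("fan is not running") can be checked mechanically. Below is a sketch over hypothetical (seconds, °C, RPM) samples. The 80 °C and 0.5 °C/s thresholds are arbitrary illustration values, not Dell or Intel limits.

```python
from typing import List, NamedTuple, Tuple


class Sample(NamedTuple):
    t: float     # seconds since logging started
    temp: float  # CPU temperature in degrees Celsius
    rpm: int     # fan speed in RPM


def heating_rates(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Return (t, degC per second) for each consecutive pair of samples."""
    rates = []
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt > 0:
            rates.append((cur.t, (cur.temp - prev.temp) / dt))
    return rates


def suspicious(samples: List[Sample], hot_c: float = 80.0,
               rate_c_per_s: float = 0.5) -> List[float]:
    """Times at which the fan reports 0 RPM although the CPU is hot or heating fast."""
    rate_at = dict(heating_rates(samples))
    flagged = []
    for s in samples:
        heating_fast = rate_at.get(s.t, 0.0) >= rate_c_per_s
        if s.rpm == 0 and (s.temp >= hot_c or heating_fast):
            flagged.append(s.t)
    return flagged
```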

andreas :footprints:

I'm working with i8kutils, and changing the config settings doesn't seem to change fan behavior, even when using BIOS_overriding_fan_control. It feels less like the fans aren't working effectively and more like they just aren't kicking on to full speed when they need to. On Windows the fans crank up to full instantly, even when I'm just on the desktop, so I know that sound very well. It never happens on Linux no matter what I do.

>>> i8kmon -v
i8kmon
config(0)          = {0 0} -1 55 -1 60
config(1)          = {1 1} 50 65 55 70
config(2)          = {2 2} 60 75 65 80
config(3)          = {2 2} 70 128 75 128
config(acpi)       = acpi
config(i8kfan)     = /usr/bin/i8kfan
config(sysconfig)  = /etc/i8kutils/i8kmon.conf
config(t_high)     = 80
config(timeout)    = 2
config(use_conf)   = 1
config(userconfig) = ~/.i8kmon
config(verbose)    = 1
status(ac)         = 0
status(acpi_timer) = 0
status(leftspeed)  = 4900 4900 4900 4900
status(lspeed)     = 0
status(lstate)     = -2
status(lstuck)     = 0
status(nfans)      = 2
status(rightspeed) = 0 3000 6000 10000
status(rspeed)     = 0
status(rstate)     = -2
status(rstuck)     = 0
status(state)      = 0
status(t_high)     = 0
status(t_low)      = 0
status(temp)       = 0
/usr/bin/i8kfan 0 0
1657374945 acpi: Battery 0: Full, 100%
temp, left fan state, right fan state, ac state: 59 0 0 0
temp, left fan state, right fan state, ac state: 59 0 0 0

The last two lines show that the fan states are [0, 0], which I set to be [4900, 0]. However, both fans are off when I look at psensor. That said, I did notice something when the temperature was high (~70 °C) and the fan was off (for some reason): starting i8kmon made the fans kick on, and the temperature dropped rapidly. So it is turning the fans on and off more or less as expected. But it doesn't seem to have left/right granularity or access to any speed other than ~3100 RPM. I'm not sure where to go from here other than trying a different live USB; I'll post an update on that later.
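A simplified model of i8kmon's threshold logic may help in reading the output above. It rests on three assumptions:

- Each `config(N)` entry is `{left right} lo_ac hi_ac lo_batt hi_batt`.
- The `{left right}` values are fan levels (0 off, 1 low, 2 high) as passed to `i8kfan`, rather than RPM targets.
- The daemon moves at most one state per poll.

This is an illustration, not i8kmon's actual source. Under this model, `59 0 0 0` is expected: with ac state 0, state 0 only steps up at 60 °C.

```python
from typing import Dict, NamedTuple, Tuple


class FanState(NamedTuple):
    fans: Tuple[int, int]  # (left, right) fan level: 0 off, 1 low, 2 high (assumed)
    lo_ac: int
    hi_ac: int
    lo_batt: int
    hi_batt: int


# Thresholds copied from the `i8kmon -v` output above.
CONFIG: Dict[int, FanState] = {
    0: FanState((0, 0), -1, 55, -1, 60),
    1: FanState((1, 1), 50, 65, 55, 70),
    2: FanState((2, 2), 60, 75, 65, 80),
    3: FanState((2, 2), 70, 128, 75, 128),
}


def next_state(state: int, temp: int, on_ac: bool,
               config: Dict[int, FanState] = CONFIG) -> int:
    """Step up at/above the high threshold, down at/below the low one."""
    entry = config[state]
    lo, hi = (entry.lo_ac, entry.hi_ac) if on_ac else (entry.lo_batt, entry.hi_batt)
    if temp >= hi and state + 1 in config:
        return state + 1
    if temp <= lo and state - 1 in config:
        return state - 1
    return state


def simulate(temps, on_ac, state=0, config=CONFIG):
    """Return the (state, fan levels) chosen after each temperature reading."""
    trace = []
    for temp in temps:
        state = next_state(state, temp, on_ac, config)
        trace.append((state, config[state].fans))
    return trace
```

Feeding it the temperatures you see in psensor shows which state i8kmon should be in, so you can compare that with what the fans actually do.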

Side note: there is a thermal control section in the BIOS. As far as I can tell, switching it to the profile that claims to just run the fans at max all the time had no effect on behavior in my Manjaro install. Disabling BIOS fan control was obviously supposed to disable this anyway. But since I couldn't change the behavior as expected from Linux, I wanted to see whether changing it via the BIOS did anything, and it didn't.

Actually, I can debug “hardware” problems by checking whether there is similar weird temperature behavior in Windows. I put “hardware” in quotes because dust isn't hardware, but if something is preventing the fans from doing their job, I should see temperature spikes on Windows too.