Random system freezes with Intel i5 + NVIDIA graphics

I have recently experienced multiple random freezes (keyboard and mouse stop responding and the screen freezes) on both XFCE and Budgie, running various kernels (5.10 LTS, 5.13). What could be the problem?

Although I do not have an AMD GPU, I wanted to share that I have had no more freezes since upgrading Mesa to the latest version. I forced the upgrade using the Downgrade option, as explained in an AMD GPU thread (system-frequently-crashing-after-gpu-drivers-update/62139).

Since installing Mesa 21.1.5 I have had no more freezes (I believe 21.1.4 may also work...). I just wanted to share this in case it helps people experiencing freezes on Intel i5 plus NVIDIA GPU combinations.

OK, after 2 days of running, I have had 2 freezes within the first 5 minutes after booting. I have the latest linux-firmware, the latest NVIDIA drivers and the 5.10 LTS kernel. I am at a loss as to what this could be; any help would be appreciated. Freezes also happened on 5.13 (now running 5.13.4-1).

System:
  Kernel: 5.13.4-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.13-x86_64 
  root=UUID=857f084c-0a80-4130-816a-b684c710b657 rw quiet apparmor=1 
  security=apparmor udev.log_priority=3 
  Desktop: Budgie 10.5.3 info: budgie-panel wm: budgie-wm dm: LightDM 1.30.0 
  Distro: Manjaro Linux base: Arch Linux 
Machine:
  Type: Desktop System: ZOTAC product: ZBOX-EN760 v: XX serial: <filter> 
  Mobo: ZOTAC model: ZBOX-EN760 v: XX serial: <filter> 
  UEFI: American Megatrends v: B248P002 date: 06/17/2014 
Battery:
  Device-1: hidpp_battery_0 model: Logitech Wireless Mouse MX Master 2S 
  serial: <filter> charge: 100% (should be ignored) rechargeable: yes 
  status: Discharging 
Memory:
  RAM: total: 15.59 GiB used: 2.33 GiB (15.0%) 
  RAM Report: permissions: Unable to run dmidecode. Root privileges required. 
CPU:
  Info: Dual Core model: Intel Core i5-4200U bits: 64 type: MT MCP 
  arch: Haswell family: 6 model-id: 45 (69) stepping: 1 microcode: 26 cache: 
  L2: 3 MiB bogomips: 18360 
  Speed: 1596 MHz min/max: 800/2600 MHz Core speeds (MHz): 1: 1596 2: 1593 
  3: 1673 4: 1596 
  Flags: abm acpi aes aperfmperf apic arat arch_perfmon avx avx2 bmi1 bmi2 bts 
  clflush cmov constant_tsc cpuid cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm 
  dts epb ept ept_ad erms est f16c flexpriority flush_l1d fma fpu fsgsbase 
  fxsr ht ibpb ibrs ida invpcid invpcid_single lahf_lm lm mca mce md_clear mmx 
  monitor movbe msr mtrr nonstop_tsc nopl nx pae pat pbe pcid pclmulqdq pdcm 
  pdpe1gb pebs pge pln pni popcnt pse pse36 pti pts rdrand rdtscp rep_good 
  sdbg sep smep ss ssbd sse sse2 sse4_1 sse4_2 ssse3 stibp syscall tm tm2 
  tpr_shadow tsc tsc_adjust tsc_deadline_timer vme vmx vnmi vpid xsave 
  xsaveopt xtopology xtpr 
  Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
  Type: l1tf 
  mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
  Type: meltdown mitigation: PTI 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, 
  IBRS_FW, STIBP: conditional, RSB filling 
  Type: srbds mitigation: Microcode 
  Type: tsx_async_abort status: Not affected 
Graphics:
  Device-1: NVIDIA GM107M [GeForce GTX 860M] vendor: ZOTAC driver: nvidia 
  v: 470.57.02 alternate: nouveau,nvidia_drm bus-ID: 04:00.0 
  chip-ID: 10de:1392 class-ID: 0300 
  Display: x11 server: X.Org 1.20.11 compositor: budgie-wm driver: 
  loaded: nvidia display-ID: :0 screens: 1 
  Screen-1: 0 s-res: 2560x1440 s-dpi: 96 s-size: 677x381mm (26.7x15.0") 
  s-diag: 777mm (30.6") 
  Monitor-1: HDMI-0 res: 2560x1440 hz: 60 dpi: 118 
  size: 553x311mm (21.8x12.2") diag: 634mm (25") 
  OpenGL: renderer: NVIDIA GeForce GTX 860M/PCIe/SSE2 
  v: 4.6.0 NVIDIA 470.57.02 direct render: Yes 
Audio:
  Device-1: Intel 8 Series HD Audio vendor: ZOTAC driver: snd_hda_intel 
  v: kernel bus-ID: 00:1b.0 chip-ID: 8086:9c20 class-ID: 0403 
  Device-2: NVIDIA GM107 High Definition Audio [GeForce 940MX] vendor: ZOTAC 
  driver: snd_hda_intel v: kernel bus-ID: 04:00.1 chip-ID: 10de:0fbc 
  class-ID: 0403 
  Device-3: Comtrue E1DA #9038D PCM32/384 DSD256 type: USB 
  driver: hid-generic,snd-usb-audio,usbhid bus-ID: 2-2:3 chip-ID: 2fc6:6013 
  class-ID: 0300 serial: <filter> 
  Sound Server-1: ALSA v: k5.13.4-1-MANJARO running: yes 
  Sound Server-2: JACK v: 0.125.0 running: no 
  Sound Server-3: PulseAudio v: 14.2 running: no 
  Sound Server-4: PipeWire v: 0.3.32 running: yes 
Network:
  Device-1: Intel Wireless 3160 driver: iwlwifi v: kernel port: f000 
  bus-ID: 02:00.0 chip-ID: 8086:08b3 class-ID: 0280 
  IF: wlp2s0 state: down mac: <filter> 
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: ZOTAC driver: r8168 v: 8.049.02-NAPI modules: r8169 port: e000 
  bus-ID: 03:00.0 chip-ID: 10ec:8168 class-ID: 0200 
  IF: enp3s0 state: down mac: <filter> 
  Device-3: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet 
  vendor: ZOTAC driver: r8168 v: 8.049.02-NAPI modules: r8169 port: c000 
  bus-ID: 05:00.0 chip-ID: 10ec:8168 class-ID: 0200 
  IF: enp5s0 state: up speed: 100 Mbps duplex: full mac: <filter> 
  IP v4: <filter> type: noprefixroute scope: global broadcast: <filter> 
  IP v6: <filter> type: dynamic noprefixroute scope: global 
  IP v6: <filter> type: noprefixroute scope: link 
  IF-ID-1: docker0 state: down mac: <filter> 
  IP v4: <filter> scope: global broadcast: <filter> 
  IF-ID-2: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A 
  IP v4: <filter> type: noprefixroute scope: global 
  IP v6: <filter> virtual: stable-privacy scope: link 
  WAN IP: <filter> 
Bluetooth:
  Device-1: Intel Bluetooth wireless interface type: USB driver: btusb v: 0.8 
  bus-ID: 2-6:7 chip-ID: 8087:07dc class-ID: e001 
  Report: bt-adapter ID: hci0 rfk-id: 1 state: up address: <filter> 
Logical:
  Message: No logical block device data found. 
RAID:
  Message: No RAID data found. 
Drives:
  Local Storage: total: 698.65 GiB used: 53.45 GiB (7.7%) 
  SMART Message: Unable to run smartctl. Root privileges required. 
  ID-1: /dev/sda maj-min: 8:0 vendor: Western Digital 
  model: WDS500G2B0A-00SM50 size: 465.76 GiB block-size: physical: 512 B 
  logical: 512 B speed: 6.0 Gb/s type: SSD serial: <filter> rev: 20WD 
  scheme: GPT 
  ID-2: /dev/sdb maj-min: 8:16 type: USB vendor: Samsung 
  model: Portable SSD T3 size: 232.89 GiB block-size: physical: 512 B 
  logical: 512 B type: SSD serial: <filter> scheme: MBR 
  SMART Message: Unknown USB bridge. Flash drive/Unsupported enclosure? 
  Message: No optical or floppy data found. 
Partition:
  ID-1: / raw-size: 68.61 GiB size: 67.29 GiB (98.07%) used: 53.42 GiB (79.4%) 
  fs: ext4 dev: /dev/sda6 maj-min: 8:6 label: N/A 
  uuid: 857f084c-0a80-4130-816a-b684c710b657 
  ID-2: /boot/efi raw-size: 100 MiB size: 96 MiB (96.00%) 
  used: 25.8 MiB (26.9%) fs: vfat dev: /dev/sda2 maj-min: 8:2 label: N/A 
  uuid: 12D4-94FA 
Swap:
  Alert: No swap data was found. 
Unmounted:
  ID-1: /dev/sda1 maj-min: 8:1 size: 300 MiB fs: ntfs label: Wiederherstellung 
  uuid: 3440D41A40D3E122 
  ID-2: /dev/sda3 maj-min: 8:3 size: 128 MiB fs: <superuser required> 
  label: N/A uuid: N/A 
  ID-3: /dev/sda4 maj-min: 8:4 size: 153.94 GiB fs: ntfs label: N/A 
  uuid: D0B8D959B8D93EA2 
  ID-4: /dev/sda5 maj-min: 8:5 size: 510 MiB fs: ntfs label: N/A 
  uuid: 287284A0728473FA 
  ID-5: /dev/sdb1 maj-min: 8:17 size: 232.88 GiB fs: exfat label: Samsung_T3 
  uuid: 9C6E-2E4D 
USB:
  Hub-1: 1-0:1 info: Full speed (or root) Hub ports: 2 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Hub-2: 1-1:2 info: Intel Integrated Rate Matching Hub ports: 8 rev: 2.0 
  speed: 480 Mb/s chip-ID: 8087:8000 class-ID: 0900 
  Hub-3: 2-0:1 info: Full speed (or root) Hub ports: 9 rev: 2.0 
  speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900 
  Device-1: 2-2:3 info: Comtrue E1DA #9038D PCM32/384 DSD256 type: Audio,HID 
  driver: hid-generic,snd-usb-audio,usbhid interfaces: 4 rev: 2.0 
  speed: 480 Mb/s chip-ID: 2fc6:6013 class-ID: 0300 serial: <filter> 
  Hub-4: 2-4:4 info: Alcor Micro USB Hub ports: 4 rev: 2.0 speed: 480 Mb/s 
  power: 100mA chip-ID: 058f:6254 class-ID: 0900 
  Device-1: 2-4.3:6 info: Logitech Unifying Receiver type: Keyboard,Mouse,HID 
  driver: logitech-djreceiver,usbhid interfaces: 3 rev: 2.0 speed: 12 Mb/s 
  power: 98mA chip-ID: 046d:c52b class-ID: 0300 
  Device-2: 2-4.4:8 info: Holtek type: Keyboard,HID driver: hid-generic,usbhid 
  interfaces: 2 rev: 2.0 speed: 1.5 Mb/s power: 100mA chip-ID: 04d9:a088 
  class-ID: 0300 
  Device-3: 2-5:5 info: Realtek RTS5129 Card Reader Controller 
  type: <vendor specific> driver: rtsx_usb,rtsx_usb_ms,rtsx_usb_sdmmc 
  interfaces: 1 rev: 2.0 speed: 480 Mb/s power: 500mA chip-ID: 0bda:0129 
  class-ID: ff00 serial: <filter> 
  Device-4: 2-6:7 info: Intel Bluetooth wireless interface type: Bluetooth 
  driver: btusb interfaces: 2 rev: 2.0 speed: 12 Mb/s power: 100mA 
  chip-ID: 8087:07dc class-ID: e001 
  Hub-5: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 3.0 speed: 5 Gb/s 
  chip-ID: 1d6b:0003 class-ID: 0900 
  Device-1: 3-1:2 info: Samsung Portable SSD T3 (MU-PT250B MU-PT500B) 
  type: Mass Storage driver: uas interfaces: 1 rev: 3.0 speed: 5 Gb/s 
  power: 896mA chip-ID: 04e8:61f3 class-ID: 0806 serial: <filter> 
Sensors:
  System Temperatures: cpu: 29.8 C mobo: 27.8 C gpu: nvidia temp: 47 C 
  Fan Speeds (RPM): N/A 
Info:
  Processes: 282 Uptime: 24m wakeups: 4 Init: systemd v: 248 tool: systemctl 
  Compilers: gcc: 11.1.0 clang: 12.0.1 Packages: 1612 pacman: 1564 lib: 353 
  flatpak: 36 snap: 12 Shell: Bash v: 5.1.8 running-in: gnome-terminal 
  inxi: 3.3.06

When the system freezes, nothing works. I cannot SSH into the machine. I have to hard reset.

Suggestions would be appreciated.

I checked the journal and dmesg for obvious errors and could not spot anything of note...
I have also run a memory test with multiple passes, and the RAM is fine.

Depending on how hard your hard reset was, please read this:

and after the next crash:

  • REISUB

  • provide the output of

    journalctl --system --boot=0 --priority=3 | tail --lines=50
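REISUB only works if the kernel's magic SysRq key is enabled for every function it uses; the current bitmask is readable in /proc/sys/kernel/sysrq. The read-only sketch below assumes the bit meanings from the kernel's sysrq documentation (1 = everything, 4 = keyboard control for R, 16 = sync, 32 = remount read-only, 64 = signalling processes for E and I, 128 = reboot). The helper name reisub_allowed and the drop-in file name 99-sysrq.conf are hypothetical.

```shell
#!/bin/sh
# Check whether the current kernel.sysrq bitmask allows every REISUB key.
# Bit values (per the kernel's sysrq documentation):
#   1 = all functions, 4 = keyboard control (R), 16 = sync (S),
#   32 = remount read-only (U), 64 = signal processes (E, I), 128 = reboot (B)

# Hypothetical helper: succeeds if mask $1 permits R, E, I, S, U and B.
reisub_allowed() {
    mask=$1
    [ "$mask" -eq 1 ] && return 0
    needed=$((4 | 16 | 32 | 64 | 128))
    [ $((mask & needed)) -eq "$needed" ]
}

current=$(cat /proc/sys/kernel/sysrq 2>/dev/null || echo 0)
if reisub_allowed "$current"; then
    echo "kernel.sysrq=$current: all REISUB keys are allowed"
else
    echo "kernel.sysrq=$current: some REISUB keys are blocked"
    # One way to allow everything persistently (needs root; file name is hypothetical):
    echo "  echo 'kernel.sysrq = 1' | sudo tee /etc/sysctl.d/99-sysrq.conf"
fi
```

A kernel command-line alternative is the sysrq_always_enabled=1 parameter. Either way, SysRq can only help if the kernel itself is still processing keyboard input.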
    

Thanks a lot. All your tips have been really helpful. I now know what REISUB is :) I just reconfigured GRUB and tested it, and it works. However, when my system freezes, the keyboard normally becomes totally unresponsive (Caps Lock, for example, does nothing and its light stays off), so I am curious whether REISUB will work. I have read Reddit threads where numerous people had random freezes with NVIDIA drivers in which the keyboard also became unresponsive, so I am pretty sure it is a driver issue...

I should add that some of the crashes (at least 3 of the recent ones) happened while using Thunderbird, which is built on the same code base as Firefox and has hardware acceleration on by default. Another crash happened when starting the Firefox browser. For now I have disabled hardware acceleration in my email client and browsers, but this is not a permanent solution.

Another day, another crash... REISUB did not work. The keyboard was totally frozen.

I ran

    journalctl --system --boot=0 --priority=3 | tail --lines=50

and the same command with --boot=-1. Both produced identical output:

Aug 03 07:32:37 cm-zboxen760 kernel:
Aug 03 07:32:41 cm-zboxen760 systemd[662]: PAM failed: Authentication service cannot retrieve authentication info
Aug 03 07:32:41 XXXX systemd[662]: user@971.service: Failed to set up PAM session: Operation not permitted
Aug 03 07:32:41 XXX systemd[662]: user@971.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Aug 03 07:32:41 XXX systemd[1]: Failed to start User Manager for UID 971.
Aug 03 07:32:56 XXX lightdm[1200]: gkr-pam: unable to locate daemon control file
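If --boot=0 and --boot=-1 really show the same thing, it is worth checking that the journal survives a reboot at all: messages from the frozen session can only be read afterwards if journald wrote them to disk. A minimal read-only sketch, assuming journald's default Storage=auto (persistent only when /var/log/journal exists); the helper name journal_is_persistent is hypothetical. Even with a persistent journal, messages from the last moments before a hard freeze may never reach the disk.

```shell
#!/bin/sh
# Read-only check: can the log of a boot that ended in a freeze be read afterwards?

# Hypothetical helper: succeeds if the journal directory ($1, default
# /var/log/journal) exists, i.e. journald with Storage=auto writes to disk.
journal_is_persistent() {
    [ -d "${1:-/var/log/journal}" ]
}

if journal_is_persistent; then
    echo "Persistent journal found in /var/log/journal"
else
    echo "No /var/log/journal: with Storage=auto, logs are lost at reboot"
fi

if command -v journalctl >/dev/null 2>&1; then
    # Every boot journald still knows about; --boot=-1 is the one before this.
    journalctl --list-boots --no-pager || echo "journalctl could not list boots"
    # Errors from the previous boot, i.e. the session that froze.
    journalctl --system --boot=-1 --priority=3 --no-pager | tail --lines=50
fi
```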

That is all. I get a crash almost every day, with the latest updates installed, on both the 5.10 and 5.13 kernels. So frustrating. This time the crash happened when opening Brave (a Chromium-based browser; hardware acceleration was turned on, but crashes happen when it is off as well).

The system was stable for about 5 days and then crashed again. This time, running LibreCAD caused the crash.

I come to this forum from time to time to check whether this problem has been solved, and... well, I see that it has not, at least if your problem is the same as mine.
The problem has been around for almost a year already, and there is still no solution.
The thing is that the Arch kernel, from 5.2 or so onwards, freezes randomly on Haswell CPUs (at least on those).
This is probably caused by the CPU frequency scaling or something like that (the thing that switches the CPU between "energy" and "performance"), which as far as I know is broken for Haswell CPUs.
The solution is to use the 4.19 kernel and hope that a usable newer kernel arrives before 4.19 becomes too old.
Another solution could be to keep the CPU permanently on performance. I don't like doing that, so instead I open something in Lutris; the Lutris scripts switch the CPU to performance automatically until you close Lutris. It's a temporary workaround, but it works for me (no more freezes). There are probably better ways to do this, but I don't know them.
Also, all of this assumes that your problem is the same as mine. I can't guarantee that, but... freezes + Firefox + no REISUB + Haswell + NVIDIA drivers... is exactly my situation too.
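The "energy"/"performance" setting described above corresponds to the cpufreq scaling governor, which the kernel exposes per core in sysfs. A read-only sketch to see what each core is currently using; the helper name list_governors is hypothetical, and the cpupower command in the comment assumes the cpupower package is installed. With the intel_pstate driver that Haswell CPUs typically use, the available governors are usually just "performance" and "powersave".

```shell
#!/bin/sh
# Print the cpufreq scaling governor of every CPU (read-only).

# Hypothetical helper; $1 is the sysfs CPU directory (overridable for testing).
list_governors() {
    root=${1:-/sys/devices/system/cpu}
    for gov in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$gov" ] || continue      # glob did not match, or no cpufreq support
        cpu=${gov#"$root"/}            # strip the directory prefix ...
        cpu=${cpu%%/*}                 # ... keeping only "cpuN"
        echo "$cpu: $(cat "$gov")"
    done
}

list_governors

# Switching all cores to "performance" needs root, e.g. with cpupower:
#   sudo cpupower frequency-set -g performance
```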

I'm sorry if my explanations were not very "technical", but I'm not a Linux pro...

Why don't you install TLP or Slimbook Battery? ;)

True, that's the solution I didn't know about XD.
I mean, I knew about those programs, but I never thought of using them on a desktop PC. For this purpose they would be useful, thanks.
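With TLP, the governor can be pinned through its configuration instead of relying on Lutris. A sketch of a drop-in file, assuming TLP 1.3 or later (which reads /etc/tlp.conf and /etc/tlp.d/*.conf); the file name 01-governor.conf is hypothetical, and the change is applied with sudo tlp start.

```shell
# /etc/tlp.d/01-governor.conf  (hypothetical file name)
# Keep every core on the "performance" governor while on AC power,
# the normal state for a desktop machine.
CPU_SCALING_GOVERNOR_ON_AC=performance
```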

Thanks for the suggestions. Really appreciate it...

I tried the CPU setting. It makes no difference; I still get random freezes. This time, though, the machine restarted on its own after a few minutes (not gently, but like a hard reset). Previously it just froze.

I'm using a Lenovo IdeaPad Gaming 3 (AMD CPU with integrated GPU plus an NVIDIA GPU) and notice that it always freezes, usually within the first 10 or 15 minutes after booting. I also noticed that the problem does not occur if I use only the integrated GPU (configured with optimus-manager).

This is the log collected from ~/.xsession-errors (I'm using XFCE, by the way):

[1534:1:1026/173153.981941:ERROR:command_buffer_proxy_impl.cc(328)] GPU state invalid after WaitForGetOffsetInRange.
[1122:1122:1026/173154.014048:ERROR:gpu_process_host.cc(979)] GPU process exited unexpectedly: exit_code=512
[16392:16392:1026/173154.387588:ERROR:sandbox_linux.cc(374)] InitializeSandbox() called with multiple threads in process gpu-process.
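The lines above appear to come from the GPU process of a Chromium-based application (the source file names such as gpu_process_host.cc are Chromium's). To pull only GPU-related errors out of a long session log after the next freeze, something like the following can be used; the helper name gpu_errors and the matching pattern are assumptions, not a complete list of relevant messages.

```shell
#!/bin/sh
# Hypothetical helper: print Chromium-style ERROR lines that mention the GPU
# from a session log ($1, default ~/.xsession-errors), with line numbers.
# -s silences the error message if the log file does not exist.
gpu_errors() {
    grep -s -n -i -E 'ERROR:.*(gpu|command_buffer)' "${1:-$HOME/.xsession-errors}"
}

gpu_errors || echo "no GPU-related errors found (or log file missing)"
```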