System freeze while gaming

Hello,

For the past few days my system has been freezing randomly when I play various games.
The problem existed before but was rare; it now happens too often to be comfortable.
After searching the system log I can’t see anything that jumps out at me.
I thought it might be overheating, but after checking, the CPU runs at 85 to 87°C, with occasional peaks at 90°C.
What’s more, the system still responds to SysRq.
I also thought of a mouse I’d just changed, but the problem also occurs without one.
What’s the best way to find the source of the problem?

Thank you in advance for your help.

Welcome to the forum! :wave:

While that might be normal for a gaming laptop, that’s definitely a bit high for a desktop.

Compiling source code with all threads maxed out is much more intensive than gaming and I never see temperatures higher than about 60°C on my desktop build machine.

Neither can we.

Please see:

My apologies,

Yes, it is indeed a laptop.

In this log, obtained via sudo journalctl --boot -3 > ~/Bureau/syslog-3.txt, the SysRq command appears at line 2125; the freeze occurred shortly before it.
journalctl --boot=-3 --priority=3 --catalog --no-pager produces the following errors:

mai 27 19:53:43 GNDE-CLEVO-P960RN systemd-udevd[357]: /etc/udev/rules.d/40-libsane.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.
mai 27 19:53:43 GNDE-CLEVO-P960RN systemd-udevd[357]: /etc/udev/rules.d/S99-2000S1.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.
mai 27 19:53:44 GNDE-CLEVO-P960RN kernel: psmouse serio2: synaptics: Unable to query device: -5
mai 27 19:53:44 GNDE-CLEVO-P960RN kernel: 
mai 27 19:53:59 GNDE-CLEVO-P960RN kernel: snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to single_cmd mode: last cmd=0x20170503
mai 27 19:54:24 GNDE-CLEVO-P960RN bluetoothd[681]: src/gatt-database.c:database_add_chrc() Failed to create characteristic entry in database
mai 27 19:54:24 GNDE-CLEVO-P960RN bluetoothd[681]: src/gatt-database.c:database_add_service() Failed to add characteristic
mai 27 19:54:24 GNDE-CLEVO-P960RN bluetoothd[681]: src/gatt-database.c:database_add_app() Failed to add service
mai 27 19:54:24 GNDE-CLEVO-P960RN bluetoothd[681]: src/gatt-database.c:client_ready_cb() Failed to create GATT service entry in local database
mai 27 19:57:22 GNDE-CLEVO-P960RN systemd-udevd[3783]: /etc/udev/rules.d/40-libsane.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.
mai 27 19:57:22 GNDE-CLEVO-P960RN systemd-udevd[3783]: /etc/udev/rules.d/S99-2000S1.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.
mai 27 19:57:24 GNDE-CLEVO-P960RN systemd-udevd[3881]: /etc/udev/rules.d/40-libsane.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.
mai 27 19:57:24 GNDE-CLEVO-P960RN systemd-udevd[3881]: /etc/udev/rules.d/S99-2000S1.rules:26 GOTO="libsane_rules_end" has no matching label, ignoring.

These logs are from three boots ago, so that the SysRq can serve as a reference; on the most recent freeze I simply forced a restart.
Please let me know if I can provide you with any useful logs.
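
Most of the lines above are repeats of the same udev warning about the libsane rules file, which udev itself reports as ignored. A minimal sketch for hiding that warning so the remaining entries stand out; the function name `filter_known_noise` is made up for illustration, and treating this warning as harmless is an assumption based on the "ignoring" in the message:

```shell
# Read journal text on stdin and drop the repeated udev warning about the
# libsane rules file; every other line is passed through unchanged.
filter_known_noise() {
    grep -v -F -e 'GOTO="libsane_rules_end" has no matching label'
}
```

It could be used as `journalctl --boot=-3 --priority=3 --no-pager | filter_known_noise`.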

What I tried:
I increased the vm.max_map_count to 2147483642, without solving the problem.
I’ve added an 8GB swap file.
I’ve installed thermald.

System info
Drivers

Always post your inxi output when you are looking for help:

inxi --admin --verbosity=5 --filter --no-host --width

What is your GPU temperature just before the freeze?

Here is my system info:

System:
  Kernel: 6.9.0-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/boot/vmlinuz-6.9-x86_64
    root=UUID=3da5b175-22b6-48c6-a09d-a0dfe77f81b3 rw quiet splash
    udev.log_priority=3
  Desktop: KDE Plasma v: 6.0.4 tk: Qt v: N/A info: frameworks v: 6.1.0
    wm: kwin_x11 vt: 2 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
  Type: Laptop System: Notebook product: P95_96_97Ex,Rx v: N/A
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Notebook model: P95_96_97Ex,Rx serial: <superuser required>
    uuid: <superuser required> UEFI: INSYDE v: 1.07.01 date: 02/18/2019
Battery:
  ID-1: BAT0 charge: 49.8 Wh (100.0%) condition: 49.8/56.2 Wh (88.6%)
    volts: 16.9 min: 15.2 model: Notebook BAT type: Li-ion serial: <filter>
    status: full
  ID-2: hidpp_battery_0 charge: 100% condition: N/A volts: 4.2 min: N/A
    model: Logitech G903 LIGHTSPEED Wireless Gaming Mouse w/ HERO type: N/A
    serial: <filter> status: full
Memory:
  System RAM: total: 32 GiB available: 31.06 GiB used: 3.52 GiB (11.3%)
  Message: For most reliable report, use superuser + dmidecode.
  Array-1: capacity: 32 GiB slots: 2 modules: 2 EC: None
    max-module-size: 16 GiB note: est.
  Device-1: ChannelA-DIMM0 type: DDR4 detail: synchronous size: 16 GiB
    speed: 2667 MT/s volts: curr: 1 min: 2 max: 2 width (bits): data: 64
    total: 64 manufacturer: Corsair part-no: CM4X16GE2666C18S4 serial: N/A
  Device-2: ChannelB-DIMM0 type: DDR4 detail: synchronous size: 16 GiB
    speed: 2667 MT/s volts: curr: 1 min: 2 max: 2 width (bits): data: 64
    total: 64 manufacturer: Corsair part-no: CM4X16GE2666C18S4 serial: N/A
CPU:
  Info: model: Intel Core i7-9750H bits: 64 type: MT MCP arch: Coffee Lake
    gen: core 9 level: v3 note: check built: 2018 process: Intel 14nm family: 6
    model-id: 0x9E (158) stepping: 0xA (10) microcode: 0xF6
  Topology: cpus: 1x cores: 6 tpc: 2 threads: 12 smt: enabled cache:
    L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 1.5 MiB desc: 6x256 KiB
    L3: 12 MiB desc: 1x12 MiB
  Speed (MHz): avg: 850 high: 900 min/max: 800/4500 scaling:
    driver: intel_pstate governor: powersave cores: 1: 800 2: 800 3: 900 4: 900
    5: 800 6: 900 7: 800 8: 900 9: 800 10: 900 11: 900 12: 800 bogomips: 62431
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: gather_data_sampling mitigation: Microcode
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT
    vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: mmio_stale_data mitigation: Clear CPU buffers; SMT vulnerable
  Type: reg_file_data_sampling status: Not affected
  Type: retbleed mitigation: IBRS
  Type: spec_rstack_overflow status: Not affected
  Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
    prctl
  Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
    sanitization
  Type: spectre_v2 mitigation: IBRS; IBPB: conditional; STIBP: conditional;
    RSB filling; PBRSB-eIBRS: Not affected; BHI: Not affected
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel CoffeeLake-H GT2 [UHD Graphics 630] vendor: CLEVO/KAPOK
    driver: i915 v: kernel arch: Gen-9.5 process: Intel 14nm built: 2016-20
    ports: active: eDP-1 empty: none bus-ID: 00:02.0 chip-ID: 8086:3e9b
    class-ID: 0300
  Device-2: NVIDIA TU104M [GeForce RTX 2080 Mobile] vendor: CLEVO/KAPOK
    driver: nvidia v: 550.78 alternate: nouveau,nvidia_drm non-free: 550.xx+
    status: current (as of 2024-04; EOL~2026-12-xx) arch: Turing code: TUxxx
    process: TSMC 12nm FF built: 2018-2022 pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 01:00.0 chip-ID: 10de:1e90 class-ID: 0300
  Device-3: Bison BisonCam NB Pro driver: uvcvideo type: USB rev: 2.0
    speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 1-8:3 chip-ID: 5986:9102
    class-ID: 0e02
  Display: x11 server: X.Org v: 21.1.13 with: Xwayland v: 23.2.6
    compositor: kwin_x11 driver: X: loaded: modesetting,nvidia
    alternate: fbdev,nouveau,nv,vesa dri: iris gpu: i915 display-ID: :0
    screens: 1
  Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
    s-diag: 582mm (22.93")
  Monitor-1: eDP-1 model: ChiMei InnoLux 0x1602 built: 2018 res: 1920x1080
    hz: 144 dpi: 137 gamma: 1.2 size: 355x199mm (13.98x7.83") diag: 407mm (16")
    ratio: 16:9 modes: 1920x1080
  API: EGL v: 1.5 hw: drv: intel iris drv: nvidia platforms: device: 0
    drv: nvidia device: 1 drv: iris device: 3 drv: swrast gbm: drv: kms_swrast
    surfaceless: drv: nvidia x11: drv: iris inactive: wayland,device-2
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: intel mesa v: 24.0.6-manjaro1.1
    glx-v: 1.4 direct-render: yes renderer: Mesa Intel UHD Graphics 630 (CFL GT2)
    device-ID: 8086:3e9b memory: 30.33 GiB unified: yes
  API: Vulkan v: 1.3.279 layers: 10 device: 0 type: discrete-gpu name: NVIDIA
    GeForce RTX 2080 with Max-Q Design driver: nvidia v: 550.78
    device-ID: 10de:1e90 surfaces: xcb,xlib
Audio:
  Device-1: Intel Cannon Lake PCH cAVS vendor: CLEVO/KAPOK
    driver: snd_hda_intel v: kernel alternate: snd_soc_skl, snd_soc_avs,
    snd_sof_pci_intel_cnl bus-ID: 00:1f.3 chip-ID: 8086:a348 class-ID: 0403
  Device-2: NVIDIA TU104 HD Audio vendor: CLEVO/KAPOK driver: snd_hda_intel
    v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 01:00.1
    chip-ID: 10de:10f8 class-ID: 0403
  API: ALSA v: k6.9.0-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: JACK v: 1.9.22 status: off tools: N/A
  Server-2: PipeWire v: 1.0.5 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Intel Wi-Fi 5 Wireless-AC 9x6x [Thunder Peak] driver: iwlwifi
    v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 bus-ID: 08:00.0
    chip-ID: 8086:2526 class-ID: 0280
  IF: wlp8s0 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    vendor: CLEVO/KAPOK driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s
    lanes: 1 port: 3000 bus-ID: 09:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp9s0 state: down mac: <filter>
  Info: services: NetworkManager, systemd-timesyncd, wpa_supplicant
Bluetooth:
  Device-1: Intel Wireless-AC 9260 Bluetooth Adapter driver: btusb v: 0.8
    type: USB rev: 2.0 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-14:5
    chip-ID: 8087:0025 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 0 state: up address: see --recommends
Drives:
  Local Storage: total: 1.59 TiB used: 272.11 GiB (16.7%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/nvme0n1 maj-min: 259:2 vendor: Western Digital
    model: WDS500G3X0C-00SJG0 size: 465.76 GiB block-size: physical: 512 B
    logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
    fw-rev: 111110WD temp: 54.9 C scheme: GPT
  ID-2: /dev/nvme1n1 maj-min: 259:0 vendor: Seagate model: WDS250G3X0C-00SJG0
    size: 232.89 GiB block-size: physical: 512 B logical: 512 B speed: 31.6 Gb/s
    lanes: 4 tech: SSD serial: <filter> fw-rev: 102000WD temp: 62.9 C
    scheme: GPT
  ID-3: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 860 QVO 1TB
    size: 931.51 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 1B6Q scheme: GPT
  Message: No optical or floppy data found.
Partition:
  ID-1: / raw-size: 465.46 GiB size: 457.09 GiB (98.20%) used: 38.71 GiB (8.5%)
    fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:4 label: N/A
    uuid: 3da5b175-22b6-48c6-a09d-a0dfe77f81b3
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 296 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:3 label: N/A
    uuid: 9FB6-E306
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: file size: 8 GiB used: 0 KiB (0.0%) priority: -2
    file: /swapfile
Sensors:
  System Temperatures: cpu: 61.0 C pch: 74.0 C mobo: N/A
  Fan Speeds (rpm): N/A
Info:
  Processes: 318 Power: uptime: 8m states: freeze,mem,disk suspend: deep
    avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
    suspend, test_resume image: 12.39 GiB services: org_kde_powerdevil,
    power-profiles-daemon, thermald, upowerd Init: systemd v: 255
    default: graphical tool: systemctl
  Packages: pm: pacman pkgs: 1426 libs: 437 tools: pamac pm: flatpak pkgs: 0
    Compilers: gcc: 13.2.1 Shell: Zsh v: 5.9 default: Bash v: 5.2.26
    running-in: yakuake inxi: 3.3.34

For the GPU, it climbs to 80/81°C and then seems to stabilize.

Here’s the CSV log file of the last freeze; towards the end you can see all the values stagnating too. I don’t know whether this is due to overheating or to the freeze itself, since the pre-freeze temperatures had already been exceeded earlier in the session and the system seems able to compensate by lowering the CPU and GPU clocks.
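
For anyone who wants to record a similar CSV without a dedicated tool, here is a rough sketch that reads the kernel's thermal zones from sysfs. It is not the tool that produced the file above; the function name `log_thermal_zones` is hypothetical, and which zones exist (and what they are called) depends on the machine:

```shell
# Print one CSV row per thermal zone: unix timestamp, zone type, temperature in °C.
# $1 is the sysfs thermal directory (defaults to /sys/class/thermal).
log_thermal_zones() {
    tz_base=${1:-/sys/class/thermal}
    tz_now=$(date +%s)
    for tz_zone in "$tz_base"/thermal_zone*; do
        # Skip zones without a readable temperature (also covers "no zones found").
        [ -r "$tz_zone/temp" ] || continue
        tz_type=$(cat "$tz_zone/type" 2>/dev/null || echo unknown)
        tz_milli=$(cat "$tz_zone/temp")
        # sysfs reports millidegrees Celsius; integer division is enough here.
        echo "$tz_now,$tz_type,$((tz_milli / 1000))"
    done
}
```

Running something like `while sleep 1; do log_thermal_zones >> ~/temps.csv; sync; done` in a terminal before starting the game should leave the last readings on disk after a hard freeze. These zones do not include the NVIDIA GPU; with the proprietary driver, `nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader` can be logged the same way.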

UPDATE: After a new test, the system froze very quickly twice in a row, CPU:86°C GPU: 75°C
For these tests I switched to high performance mode in the power management profile, previously set to balanced by default.
Changing to energy-saving mode gives the same result, with even lower temperatures.

What do you think?

Unfortunately the log and other info you’ve posted don’t show any obvious crash or other cause. Diagnosing this kind of freeze is going to be a matter of trial and error. Some things that I’d try:

  • Run memtest86 to check for RAM errors

  • I notice you’re using Proton GE, try the official Proton 9.0

  • Try a different kernel like 6.6 LTS

Wow, that’s hot. Intel lists the Tjunction at 100°C. Source:

https://ark.intel.com/content/www/us/en/ark/products/191045/intel-core-i7-9750h-processor-12m-cache-up-to-4-50-ghz.html

But 86° is still pretty hot. How old is your laptop? Maybe clean the case and remove the dust? Summer is coming, but it may also be related to recent updates on your system; anything is possible. If you have older Timeshift snapshots, you could create a new snapshot right now and roll back to an older one, just to see whether the crash also occurs with the older packages.

But Lavender’s suggestions are good too: it could be the RAM, the kernel, or even the Proton version.

81° is already the GPU limit; it downclocks from there. But since it crashes even at 75° on the GPU core, I’m not sure it’s the GPU. It could still be the GPU “hot spot” temperature, whose limit is maybe/probably 105°, but that would also make your GPU fans run at maximum RPM, so you would hear it.

All I can say is that you have a general temperature problem anyway, whether or not it is related to your unstable system. The 2080 Ti in my PC never goes above 75°, and from my MSI Afterburner days under Windows I know that 81° is the GPU limit, at which NVIDIA GPU Boost 3.0 starts downclocking.

How much lower? Since your PC crashed with the CPU at 86°, there’s a good chance it is the CPU. Downclocking, or replacing the thermal paste on the CPU, might help if removing the dust is not enough.

Was your laptop stable in the past under Manjaro? And if yes, for how long?

Has your room become much warmer in the last few weeks, compared with when the system was stable?

I can also recommend putting a big book, like an atlas, under your laptop for maximum ventilation.

The test does not show any errors after one pass

Same issue with Proton 9.0

No luck here either.

My laptop is about 5 years old. I’ve just dusted it off, but I don’t think that’s the problem. I’ve seen similar temperatures since the very beginning; I was worried at first, then I realized there wasn’t much I could do about it other than raising the laptop.
Unfortunately I don’t have a snapshot, I installed Manjaro on it about a week ago.

The freeze doesn’t trigger the fans to full speed.
The last test carried out after dust removal showed a CPU temperature of 88°C and a GPU temperature of 67°C before freezing. Earlier in the test, the CPU had risen to 91°C. Yes, it’s hot, but that’s always been more or less the case.

For the test carried out in energy-saving mode, the freeze occurred when the CPU was at 57°C and the GPU at 60°C. However, performance was much lower. This test really makes me think that the origin of the problem is not the system temperature.

I’ve already used this laptop with Windows and Arch. I’ve had a few bluescreens with Windows but I don’t remember any problems with Arch.

I’m carrying out the tests in my living room, which is currently at 20.5°C.

I use a NotePal U3 Plus to raise the rear of the laptop. I don’t use the fans though.

Perhaps it’s a Proton/WINE problem, as the only two games tested so far use this compatibility layer. I’m going to try a native Linux game (preferably one as demanding as these).
I also noticed that at first only the image freezes, while the sound keeps playing normally for a few seconds.

Update:
Metro Exodus (native), high temp, freeze
Ori and the Blind Forest (Proton), low temp, no freeze so far
So, maybe it’s temp related but very random with no way to actually prove it…

Update:
Ghost of Tsushima, CPU: 55°C GPU: 49°C, CPU max frequency locked at 2.65 GHz, freeze in the first 15 seconds (this had already happened before without the frequency lock).
Ghost of Tsushima, CPU: 80°C GPU: 70°C, CPU max frequency locked at 2.65 GHz, freeze after 5 minutes.
Sometimes it takes tens of minutes, or even hours, before freezing occurs.

Sensors:
  System Temperatures: cpu: 61.0 C pch: 74.0 C mobo: N/A
  Fan Speeds (rpm): N/A

The PCH seems too hot. It could be the component that is overheating.

I managed to obtain a log that seems to contain some interesting elements.
Indeed, there’s a big warning block and a few error lines towards the end of the log.
I don’t understand in detail what happened, but it apparently concerns my screen not being connected to an adapter.

This is another powerful Clevo ‘gaming laptop’, sold under various brands, e.g. Schenker, Tuxedo, npb, PC Specialist etc.
In general:

  • they are running very hot when maxed out and they are noisy at default settings
  • they can be controlled (fans/power modes) on Linux via a software suite (check the brand’s website or search pamac for a ‘tuxedo/clevo/yourbrand’ package, and check compatibility with your model before installing)
  • they are easy to open up and clean out and they come with decent manuals

As the issue seems to become progressively worse, I’d guess it needs a clean-out first. Afterwards temps usually drop by about 10-15°C. I have to do this roughly every 3 months on my Tuxedo.

Already done, but temperatures remain more or less the same.

I launched my laptop under Arch to test, and I also had a freeze.

Although the freezes occur even when temperatures are low, it seems that temperature really is the problem here.
Is there no tool or log file that would reveal this kind of problem?

What bothers me most is that I haven’t had this kind of problem with similar temperatures before. I don’t understand why it should be a problem now, especially after the spring cleaning I’ve just done.
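
A hard freeze often leaves nothing in the logs, but the kernel log of the previous boot can at least be searched for a few typical messages. The sketch below is only a starting point: the function name `scan_kernel_log` is made up, and the pattern list is an assumption covering CPU thermal throttling, NVIDIA Xid errors, i915 GPU hangs, machine-check events and soft lockups; an empty result does not rule out a hardware problem.

```shell
# Read kernel log text on stdin and keep only lines that commonly accompany
# thermal throttling, GPU faults, machine-check events or CPU lockups.
scan_kernel_log() {
    grep -i -E \
        -e 'temperature above threshold' \
        -e 'NVRM: Xid' \
        -e 'GPU HANG' \
        -e 'Hardware Error' \
        -e 'BUG: soft lockup'
}
```

After rebooting from a freeze, `journalctl -k --boot=-1 --no-pager | scan_kernel_log` would run it against the boot in which the freeze happened.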

The fact that it also happens on a different distro, and at low temperatures, does seem to indicate some kind of hardware fault. Might be worth testing on a distro like Ubuntu or Fedora that isn’t Arch-based.

Eureka!

While testing different video drivers, I ended up switching to xf86-video-intel instead of modesetting and nvidia-beta.
Since then, no more problems.
I can’t be 100% sure without further testing, but I think it’s the intel driver that solves the problem. I used to use this driver exclusively without any problems.
The Arch Wiki also states that xf86-video-intel supports hardware up to the 9th generation, and my processor is 9th generation.
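
For anyone who wants to check which X video drivers are actually loaded after such a switch, here is a small sketch that reads the Xorg log. The function name `loaded_video_drivers` is hypothetical, and the default log path is an assumption: with a rootless X session the log may instead be at ~/.local/share/xorg/Xorg.0.log.

```shell
# List which of the common X video drivers appear in "LoadModule" lines
# of the Xorg log. $1 is the log file (defaults to /var/log/Xorg.0.log).
loaded_video_drivers() {
    xlog=${1:-/var/log/Xorg.0.log}
    grep -E -o 'LoadModule: "(intel|modesetting|nvidia|nouveau)"' "$xlog" \
        | cut -d'"' -f2 \
        | sort -u
}
```

If `intel` is listed and `modesetting` is not, the xf86-video-intel driver is most likely the one driving the integrated GPU.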

Thank you all for your help.
