System shuts down abruptly without warning

pch_cometlake-virtual-0
temp1:        +86.0°C

Platform controller hub (PCH) temperatures can run a little high, but with your processor cores in the mid-fifties, that reading looks excessive. Since you’re already in contact with Tuxedo’s support, I’d definitely mention it. Other things to consider:

  • Are you running any ultra-low-latency or high-encoding-rate audio settings for pipewire/pulse? If so, set them back to the defaults.
  • Try powertop (maybe with powertop-auto-tune from the AUR) to get those readings down.
  • Again, check the BIOS settings for anything related to PCI bus speed, Intel chipset speed/power, CPU/RAM overclocking/undervolting, etc.

This package appears to be abandoned.

This is more promising. I would check for Intel Turbo Boost Technology, which is usually enabled by default. If your firmware has an option to disable it, it is worth checking the effect on temperatures and system performance.

@soundofthunder

Thank you. Ventoy was a great recommendation! That is such an easy way to deal with ISO files. The Samsung one started without issues in the grub2 mode. Unfortunately, the update tool did not recognise any supported SSDs. I will check with Tuxedo support again to hear if they have any ideas.

@6x12 @Wollie

I will mention this high temperature to support as well. Thanks! And I have not made any changes to the default audio settings for pipewire/pulse.

I went digging around in the BIOS, and found a setting for Intel Turbo Boost Technology that I have now disabled.


The system just did another of its unannounced reboots. When I turned it back on, after about 10 minutes, I once more ran sensors. The temperatures had increased a lot since last time I checked. pch_cometlake-virtual-0 read +95.0°C.

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +66.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +60.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +64.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +61.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +62.0°C  (high = +82.0°C, crit = +100.0°C)
Core 4:        +61.0°C  (high = +82.0°C, crit = +100.0°C)
Core 5:        +66.0°C  (high = +82.0°C, crit = +100.0°C)
Core 6:        +64.0°C  (high = +82.0°C, crit = +100.0°C)
Core 7:        +63.0°C  (high = +82.0°C, crit = +100.0°C)
Core 8:        +61.0°C  (high = +82.0°C, crit = +100.0°C)
Core 9:        +62.0°C  (high = +82.0°C, crit = +100.0°C)

pch_cometlake-virtual-0
Adapter: Virtual device
temp1:        +95.0°C  

nvme-pci-0700
Adapter: PCI adapter
Composite:    +67.8°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +67.8°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +71.8°C  (low  = -273.1°C, high = +65261.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.27 V  
curr1:         0.00 A  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +70.0°C  

nvme-pci-4300
Adapter: PCI adapter
Composite:    +63.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +63.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +65.8°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-0600
Adapter: PCI adapter
Composite:    +69.8°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +69.8°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +61.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +65.0°C  

Five minutes later I re-ran sensors and got the following reading, which does not look good:

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +97.0°C  

Now the system has been running for 40 minutes, and the temperatures are back at 50-60°C, except for pch_cometlake-virtual-0, which reads 85°C (despite my having disabled “Intel Turbo Boost Technology”). When I search for “pch_cometlake-virtual-0”, it looks like others are having problems with high temperatures too.

I will see what Tuxedo support says about these findings as well.


I have now created a cron job to log the temperatures. This way I can check what the temperatures were before the next shutdown.

*/5 * * * * date >> /home/qqq/log/sensors.txt && sensors >> /home/qqq/log/sensors.txt
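A full dump every five minutes makes for a long log to read through after a crash. As a rough sketch, each dump could be condensed to its single hottest reading before it is appended. The function name hottest_reading and its output format are made up for illustration, and the parsing assumes the sensors layout pasted earlier in this thread.

```shell
#!/bin/sh
# Sketch only: hottest_reading is a hypothetical helper, not part of lm_sensors.
# Reads `sensors` output on stdin and prints "<chip> <label> <temperature>"
# for the highest current reading. Threshold values in parentheses
# (low/high/crit) are ignored because only the first value after the colon
# is considered.
hottest_reading() {
    awk -F': *' '
        # A non-indented line without a colon is a chip name,
        # e.g. "pch_cometlake-virtual-0".
        NF == 1 && $0 ~ /^[^ ]/ { chip = $0; next }

        # Temperature lines look like "temp1:        +95.0°C".
        NF >= 2 && $2 ~ /^[+-][0-9]+\.[0-9]+/ && index($2, "°C") {
            value = $2
            sub(/[^0-9.+-].*$/, "", value)    # keep only the leading number
            if (!seen || value + 0 > max) {
                max = value + 0; where = chip " " $1; seen = 1
            }
        }

        END { if (seen) printf "%s %.1f\n", where, max }
    '
}
```

Saved in a script, it would be fed as `sensors | hottest_reading`, next to the `date` call the cron job already makes. Keeping the full dump as well is probably sensible, since the one-line summary hides which other sensors were close behind.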

Yes, the Intel homepage is dead but it’s still in the official repos. I mentioned that especially since I found this (slightly enthusiastic) reference to a fix: “Powertop is fantastic! From 82 to 58 degrees in just half a minute!”.

@mzuniga I’d dig deeper into those “Advanced chipset control” features and try disabling them, a few at a time or one by one, maybe in conjunction with installing power-management software. There are other power-management and CPU frequency scaling packages that can handle those features in case you have to disable them in the BIOS.

I used an older 3-drive Clevo gaming laptop for audio recording and had issues with heat and fan noise in the past. At one stage I removed the mSATA drive and the Wi-Fi card to get to grips with it. I got the best results by removing tlp and installing auto-cpufreq.

I appreciate your input, @6x12. Thank you.

Digging around, I realised that my system was locked in a high performance mode. That also explains why I have had a lot of issues with annoying fan noise and high temperatures. I tried out the Tuxedo Control Center. Several features are missing on Manjaro, but I can alter the profile / power mode through that tool.
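Whether the machine is still pinned to a performance-oriented setting can also be checked from the shell, independently of the Tuxedo Control Center. The sketch below counts the CPU frequency governors reported through sysfs. It assumes the kernel exposes scaling_governor files under /sys/devices/system/cpu, which is the usual cpufreq layout. The function name governor_summary is made up here, and how a Tuxedo Control Center profile maps to a governor is not something this thread confirms.

```shell
#!/bin/sh
# Sketch only: governor_summary is a hypothetical helper.
# Prints each distinct cpufreq governor followed by the number of CPUs using it.
# $1: CPU sysfs directory (optional, defaults to /sys/devices/system/cpu)
governor_summary() {
    cpu_root=${1:-/sys/devices/system/cpu}
    for file in "$cpu_root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when no governor files exist.
        [ -r "$file" ] && cat "$file"
    done | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Called without arguments, it reads the live system. A single line with "performance" next to the full CPU count would fit the "locked in a high performance mode" observation, though the governor is only one of several knobs a profile may change.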

This was the previous active profile:

Now, I am trying out a predefined mode named “Cool and breezy”:

The notebook has been running, doing different tasks, for five hours, and the temperature now looks like this:

$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +44.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +42.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +42.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 4:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 5:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 6:        +42.0°C  (high = +82.0°C, crit = +100.0°C)
Core 7:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 8:        +43.0°C  (high = +82.0°C, crit = +100.0°C)
Core 9:        +42.0°C  (high = +82.0°C, crit = +100.0°C)

pch_cometlake-virtual-0
Adapter: Virtual device
temp1:        +65.0°C  

nvme-pci-4300
Adapter: PCI adapter
Composite:    +49.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +49.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +49.9°C  (low  = -273.1°C, high = +65261.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.27 V  
curr1:         0.00 A  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +55.0°C  

nvme-pci-0700
Adapter: PCI adapter
Composite:    +51.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +51.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-0600
Adapter: PCI adapter
Composite:    +55.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +55.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +46.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +45.0°C  

I will test this profile for a few days, to see if the system still shuts down on its own. If it does not help I will keep checking out the BIOS options, and look at powertop.

Just wanted to leave an update. After I changed the power mode to “Cool and breezy”, the temperatures have stayed low. And, the system has been running for more than eight days straight without rebooting or any other issues (things are just a lot slower, and more silent, than what I am used to). So, this seems to point to overheating as the culprit in this issue.

Tuxedo support still asks me to upgrade the firmware of the main SSD, the Samsung 980 Pro, as the firmware I am running is said to be known for causing reboot issues. Unfortunately, the Samsung tool doesn’t recognise my drive, so I have not been able to upgrade. Maybe the Samsung tool gets confused when there is more than one drive present. I will try to disconnect the two extra SSDs and only leave the main one when I boot into the Samsung firmware tool. Hopefully that will help.

$ neofetch 
██████████████████  ████████   poq@tuxwarrior 
██████████████████  ████████   -------------- 
██████████████████  ████████   OS: Manjaro Linux x86_64 
██████████████████  ████████   Host: TUXEDO Book XUX7 Gen11 
████████            ████████   Kernel: 6.6.8-2-MANJARO 
████████  ████████  ████████   Uptime: 8 days, 9 hours, 57 mins 
████████  ████████  ████████   Packages: 1457 (pacman) 
████████  ████████  ████████   Shell: bash 5.2.21 
████████  ████████  ████████   Resolution: 3840x2160 
████████  ████████  ████████   WM: i3 
████████  ████████  ████████   Theme: Sweet-Dark-v40 [GTK2/3] 
████████  ████████  ████████   Icons: candy-icons [GTK2/3] 
████████  ████████  ████████   Terminal: urxvt 
████████  ████████  ████████   Terminal Font: NotoSansMono-Light 
                               CPU: Intel i9-10900K (20) @ 1.850GHz 
                               GPU: NVIDIA GeForce RTX 2080 SUPER Mobile / Max-Q 
                               Memory: 16949MiB / 128580MiB 
$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Package id 0:  +49.0°C  (high = +82.0°C, crit = +100.0°C)
Core 0:        +47.0°C  (high = +82.0°C, crit = +100.0°C)
Core 1:        +49.0°C  (high = +82.0°C, crit = +100.0°C)
Core 2:        +46.0°C  (high = +82.0°C, crit = +100.0°C)
Core 3:        +47.0°C  (high = +82.0°C, crit = +100.0°C)
Core 4:        +47.0°C  (high = +82.0°C, crit = +100.0°C)
Core 5:        +47.0°C  (high = +82.0°C, crit = +100.0°C)
Core 6:        +46.0°C  (high = +82.0°C, crit = +100.0°C)
Core 7:        +47.0°C  (high = +82.0°C, crit = +100.0°C)
Core 8:        +48.0°C  (high = +82.0°C, crit = +100.0°C)
Core 9:        +47.0°C  (high = +82.0°C, crit = +100.0°C)

pch_cometlake-virtual-0
Adapter: Virtual device
temp1:        +69.0°C  

nvme-pci-4300
Adapter: PCI adapter
Composite:    +51.9°C  (low  = -273.1°C, high = +81.8°C)
                       (crit = +84.8°C)
Sensor 1:     +51.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +51.9°C  (low  = -273.1°C, high = +65261.8°C)

BAT0-acpi-0
Adapter: ACPI interface
in0:          16.16 V  
curr1:         0.00 A  

iwlwifi_1-virtual-0
Adapter: Virtual device
temp1:        +59.0°C  

nvme-pci-0700
Adapter: PCI adapter
Composite:    +54.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +54.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)

nvme-pci-0600
Adapter: PCI adapter
Composite:    +58.9°C  (low  = -273.1°C, high = +84.8°C)
                       (crit = +84.8°C)
Sensor 1:     +58.9°C  (low  = -273.1°C, high = +65261.8°C)
Sensor 2:     +48.9°C  (low  = -273.1°C, high = +65261.8°C)

acpitz-acpi-0
Adapter: ACPI interface
temp1:        +49.0°C  

Indeed, it was always the most likely cause.

A slower system to get used to, maybe, but you will get used to it over time. Slow is better than crashing; at least this allows a level of confidence in the machine you haven’t had for a while. Perhaps you can rename the profile to “Slow and Breezy”. 🙂

Regarding the Samsung 980 Pro:

The firmware might go some way toward allowing you to regain some of that performance loss, if what Tuxedo Support tells you is accurate.

It is also possible that the tool only detects the range of drives its firmware is meant for (a guess). Make sure you have the exact firmware for your drive. There is an ISO on the Samsung site, listed under Samsung Storage Firmware – NVMe SSD-980 PRO Series Firmware – ISO 5B2QGXA7 | 27MB, that appears to be the one. Is this the same one you downloaded previously?

Related: I found this article at Tom’s Hardware, “Samsung Issues Fix for Dying 980 Pro SSDs”, which is disconcerting in itself.

However, it also states:

It should be noted that 980 Pro SSDs running the 4B2QGXA7 or 5B2QGXA7 firmware are not affected by this issue.

So, the 5B2QGXA7 previously linked seems to be the correct one, if it isn’t already applied.
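Whether that firmware is already applied can be read on Linux without the Samsung tool. The following sketch reads the model and firmware_rev attributes that the NVMe driver exposes under /sys/class/nvme and marks revisions matching the two named in the quote above. The function name nvme_firmware_report and the output wording are illustrative assumptions, and the note only means something for 980 Pro drives.

```shell
#!/bin/sh
# Sketch only: nvme_firmware_report is a hypothetical helper.
# Prints "<controller>: <model> firmware <revision> (<note>)" per NVMe controller.
# $1: sysfs class directory (optional, defaults to /sys/class/nvme)
nvme_firmware_report() {
    nvme_root=${1:-/sys/class/nvme}
    for ctrl in "$nvme_root"/nvme*; do
        [ -r "$ctrl/firmware_rev" ] || continue
        # sysfs may pad these strings with spaces; trim the padding.
        model=$(sed 's/[[:space:]]*$//' "$ctrl/model" 2>/dev/null)
        fw=$(sed 's/[[:space:]]*$//' "$ctrl/firmware_rev")
        # The two revisions come from the article quoted in this thread.
        case $fw in
            4B2QGXA7|5B2QGXA7) note="revision named as not affected" ;;
            *)                 note="revision not named in the quote" ;;
        esac
        printf '%s: %s firmware %s (%s)\n' "${ctrl##*/}" "$model" "$fw" "$note"
    done
}
```

With three SSDs installed, the model column helps tell the 980 Pro apart from the two extra drives, whose revisions will naturally fall outside the quoted pair.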

Samsung also recommends that it be installed using Samsung Magician Software; only Windows, macOS and Android versions are mentioned. Here is the Installation Guide (for Windows).

Of course, these are not much use without Windows or macOS.

It might be possible to use the Samsung tool via a Windows PE boot disk; such as the Hiren’s BootCD PE (Preinstallation Environment). However, this is likely outside the scope of the Manjaro forum.

As overheating is now recognised as the culprit, it’s probably time to mark this thread as solved. Check the tick under whichever post you feel initially led you to this conclusion.

I’m glad we could help. Cheers.

He he. A great idea! Yes, it is far better having a system that works and is stable, than one frequently shutting down.

That is the one I tried!

I definitely need to get that firmware updated. Thanks for the Hiren’s BootCD PE tip. I hope I will manage to upgrade, one way or another, before my SSD shuts down.

I will do so.

Thanks for the amazing help, from you and other members of this forum.

Cheers!
