Hey guys! I have an Asus TUF A15 laptop (FA506NC, the 2021 model).
Everything works, except that once in a while (I’m starting to think after a fixed number of uptime hours, maybe 6-7), it blackscreens and restarts by itself with no logs.
I’ve checked journalctl -b -1 and journalctl -b 0 and there is nothing related to an issue.
I have an AMD Ryzen 5 and an Nvidia RTX 3050 with the proprietary drivers.
I’ve also previously added the acpi_osi=! idle=nomwait acpi_backlight=native kernel parameters in GRUB in order to get my brightness control to work.
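To confirm that those parameters are really active on the running kernel (and not just written into the GRUB config), something like the following could be used. It is only a minimal sketch: the names `EXPECTED` and `missing_params` are made up for illustration, and it assumes the command line in /proc/cmdline can be split with shell-style quoting rules, which is an approximation.

```python
import shlex

# Parameters quoted in the post above; adjust the list if yours differ.
EXPECTED = ["acpi_osi=!", "idle=nomwait", "acpi_backlight=native"]

def missing_params(cmdline_text, expected=EXPECTED):
    """Return the expected parameters that do not appear on the kernel command line.

    cmdline_text is the content of /proc/cmdline. Splitting with shlex is an
    assumption: it copes with quoted values but is not the kernel's own parser.
    """
    active = set(shlex.split(cmdline_text))
    return [param for param in expected if param not in active]
```

On the affected machine it could be called as `missing_params(open("/proc/cmdline").read())`; an empty list means all three parameters reached the kernel.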
The random rebooting with no logs is a fairly new issue or perhaps not.
I have no definite explanation, other than that some users have traced this kind of problem to an undervolting issue with their CPU.
It is suggested that undervolting may cause the issue when the workload increases and the power to the processor is not increased to match it; the affected systems are usually those equipped with an AMD APU.
It is suggested that Windows has a way of dealing with this - one comment suggested that Windows ignores a limit set in the firmware - and the Linux kernel does not.
This results in reboots - seemingly random and with no explicit cause.
You will have to talk to the vendor - they will be able to advise what to do - as this is not specifically a kernel or software issue, but a matter of how the firmware is configured to react to changes in workload.
Please search the forum for similar issues - I am sure you will find what I am referring to - I don’t remember exactly which threads - there have only been perhaps a handful.
I have no idea … I have no hands-on experience with such systems, so any advice from me on the matter would be bad - I can only point you in a direction - what to do is up to you.
Carefully examine the topics - think, think again and, more importantly, understand what you are doing before you apply something you later regret.
The safest path is to use your vendor’s support channels.
I have a ThinkPad X13 AMD Gen 4 with a 7840U APU - but a Lenovo is not the same as an Asus - the firmware is different.
I’ve started running a stress test (stress-ng), and it’s been running at 100% load and about 70 °C for 10 minutes…
So could it still be an undervolt problem?
Btw, when it crashed (today and 2 days ago), I was just writing some code with only nvim and a browser open; the CPU was at maybe 3-4 percent, so nothing computationally expensive…
And I’ve checked the RAM, and it seems to be fine.
It could help, yeah (but hardware is complex). Exactly 20 years ago I had a problem with my single-core CPU, which was 6 years old at the time: it rebooted randomly during CPU-heavy tasks, even though the CPU was still well cooled… and that was after a painful year of bug hunting and endless system reinstalls!
I came to the conclusion that I should experiment with the vcore setting and increased it a little; the system was stable after that.
The same could apply to your RAM: with the silicon lottery, the default BIOS settings don’t always fit every piece of hardware… especially as the hardware gets older, it can (not must) require (only a little) more voltage.
That said, if you have the option to improve cooling (like renewing the thermal paste: I can recommend Grizzly), the voltage increase is not always needed.
The better the cooling, the lower the voltage the hardware requires to run stable.
You can see this with water-cooled systems, where some users can undervolt their CPU/GPU really hard.
Sadly, my BIOS doesn’t have that option. It happened again, after 8.5 hours of uptime. Still no logs, and dmesg shows nothing that would indicate CPU voltage problems (the kind the Arch Wiki says a Ryzen 5 reports)…
Under a stress test about half an hour before the crash, it performed totally fine…
And I only had a browser open…
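Since the journal stays empty and the crashes so far came after roughly 6-7 and 8.5 hours, one way to pin down the exact uptime at each crash would be a small heartbeat file that survives the reset. This is just a rough sketch under my own assumptions: the function names, the log file name and the 60-second interval are all made up, and it relies only on the first field of /proc/uptime being the uptime in seconds.

```python
import os
import time

def parse_uptime(text):
    """Return uptime in seconds from the content of /proc/uptime (first field)."""
    return float(text.split()[0])

def format_heartbeat(uptime_seconds, wall_clock):
    """Build one log line: local timestamp plus uptime in hours."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(wall_clock))
    return f"{stamp} uptime_h={uptime_seconds / 3600:.2f}\n"

def run(log_path="uptime-heartbeat.log", interval=60):
    """Append a heartbeat line every `interval` seconds until interrupted.

    Each line is flushed and fsync'ed so it should already be on disk
    when the machine resets without warning.
    """
    with open(log_path, "a") as log:
        while True:
            with open("/proc/uptime") as f:
                uptime = parse_uptime(f.read())
            log.write(format_heartbeat(uptime, time.time()))
            log.flush()
            os.fsync(log.fileno())
            time.sleep(interval)
```

Calling `run()` in a terminal and leaving it open is enough; after the next crash, the last line of the file gives the uptime to within about a minute, which would show whether the "fixed number of hours" idea holds.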