Machines with AMD CPUs Ryzen 5 1600 and Ryzen 7 1700 crash regularly

Just now I got a freeze while scrolling down on the Manjaro forum.
Here is the output of journalctl -b -1:
https://0x0.st/-hjq.txt

Have you been able to use these in the past without issues? Or are these new computers?

The freezing is a known issue for 1st gen Ryzen CPUs on Linux. It is an issue with the C-State if I recall correctly. Supposedly, for some people, their BIOS/UEFI updates eventually fixed it. Though… for others like me:

I have a Ryzen 7 1700X, and to stop the computer from freezing I had to go into my UEFI → Advanced → AMD CBS → Power Supply Idle Control and change it to Typical Current Idle.

Now, not every motherboard has these exact steps to find this. You’ll have to look for it. Regarding your laptop though… hopefully it has an unlocked enough UEFI to change it.


The machines were new to me and I got those issues from the start.

I wasn’t able to find any AMD CBS settings in the UEFI firmware.

Sound Server-4: PulseAudio v: 15.0 running: yes
Sound Server-5: PipeWire v: 0.3.40 running: yes

You are running both PipeWire and PulseAudio.
Use PipeWire or PulseAudio, not both!
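
To see which server is actually handling audio, the "Server Name" line of `pactl info` is a quick check. A rough sketch, assuming `pactl` is installed; the helper name `classify_sound_server` is made up for illustration and is not part of any package:

```shell
# Hypothetical helper: classify the "Server Name" line printed by `pactl info`.
# pipewire-pulse reports itself as "PulseAudio (on PipeWire x.y.z)".
classify_sound_server() {
    case "$1" in
        *"on PipeWire"*) echo "pipewire-pulse" ;;
        *)               echo "pulseaudio" ;;
    esac
}

# Only query the running server if pactl is available.
if command -v pactl >/dev/null 2>&1; then
    name=$(LC_ALL=C pactl info 2>/dev/null | grep '^Server Name:' || true)
    if [ -n "$name" ]; then
        classify_sound_server "$name"
    fi
fi
```

If this prints `pulseaudio`, PipeWire may still be running in the background for other purposes while PulseAudio handles the audio.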

Swap:      Alert: No swap data was found.

Create a Swapfile

https://wiki.manjaro.org/index.php/Swap#Using_a_Swapfile
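
For reference, the usual swapfile steps can be sketched as below. This is a dry run that only prints the commands so they can be reviewed first. The helper name `swapfile_commands`, the 8192 MiB size and the `/swapfile` path are assumptions for illustration, and the sketch assumes an ext4 root (btrfs needs extra steps, see the wiki page above).

```shell
# Hypothetical helper: print (not run) the commands for creating a swapfile.
# Arguments: size in MiB, path of the swapfile.
swapfile_commands() {
    size_mib=$1
    path=$2
    echo "dd if=/dev/zero of=$path bs=1M count=$size_mib status=progress"
    echo "chmod 600 $path"
    echo "mkswap $path"
    echo "swapon $path"
}

# Review the output first; nothing is changed on disk by this call.
swapfile_commands 8192 /swapfile
```

The printed commands need to be run as root. To keep the swapfile across reboots, a line such as `/swapfile none swap defaults 0 0` goes into /etc/fstab.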

I agree on pulseaudio and pipewire.
As for swap I have 32 GB RAM.

@cscs suggested that it might have to do with IOMMU.

sudo dmesg | grep -i -E 'fail|error|iommu|amd' | less
https://0x0.st/-hjw.json

alternative link
https://www.toptal.com/developers/hastebin/fevutasobu.yaml

That question cannot be answered without looking at your configuration. Even with plenty of available memory, swap is often used as a safety net, and some applications specifically require it, so have a look at the following non-exhaustive list:

If you use hibernation: yes, you need swap!
If you have services that are not always active, but are still running all the time: yes, you need swap!
If you have an application that allocates virtual memory directly for temporary storage instead of RAM: yes, you need swap!
If you have an application that has a memory leak: yes, you need swap!
If you have a server with 1TB of RAM that you're using as a desktop without applications allocating virtual memory or having memory leaks: No, you don't need swap!

The link is broken.

I’ll try with swap. But I ran the same setup on an Intel 9th Generation laptop and had no such issues.

Is this OK like this?

sorry, the link works for me
try the alternative link I posted.

The temperatures, I don’t know.

That’s the default on KDE. Pipewire is a dependency of KDE desktop.

Some dmesg checking might be helpful. Maybe something like this:

sudo dmesg | grep -i -E 'fail|error|iommu|amd'
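
Since the freeze happens in an earlier boot, the same filter can also be applied to saved log text. A small sketch, with `filter_kernel_log` being a made-up helper name; the first two sample lines are taken from this thread and the usb line is invented to show a non-match:

```shell
# Hypothetical helper: apply the same grep filter to any log text on stdin.
filter_kernel_log() {
    grep -i -E 'fail|error|iommu|amd'
}

# Demo on sample lines; only the first two should be printed.
filter_kernel_log <<'EOF'
[    0.000000] tsc: Fast TSC calibration failed
[    0.227600] pci 0000:03:00.2: Adding to iommu group 12
[    1.234567] usb 1-1: new high-speed USB device number 2 using xhci_hcd
EOF
```

For the kernel messages of the previous boot, something like `journalctl -k -b -1 | filter_kernel_log > kernel-prev-boot.txt` gives a plain text file that can be uploaded without going through a pager.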

Taking a guess, I would check iommu and the old rcu_nocbs workaround for ‘soft lockups’ seen in the past. idle=nomwait may be applicable as well.

( https://bugzilla.kernel.org/show_bug.cgi?id=196683 )

Note: The rcu_nocbs option depends on the number of threads, not cores. One way to show the total:
sudo dmidecode -t 4 | grep 'Thread Count'
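
If dmidecode is not handy, the range can also be derived from the logical CPU count. A minimal sketch, where `rcu_nocbs_range` is a made-up helper name and `nproc --all` (coreutils) is assumed to report the same total as the thread count:

```shell
# Hypothetical helper: build the rcu_nocbs CPU list for N hardware threads.
# CPUs are numbered from 0, so 16 threads gives 0-15.
rcu_nocbs_range() {
    echo "0-$(( $1 - 1 ))"
}

# nproc --all prints the number of installed processing units.
echo "rcu_nocbs=$(rcu_nocbs_range "$(nproc --all)")"
```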

These are all boot parameters, so apply them in /etc/default/grub, run sudo update-grub and reboot.
In this example I will assume you have 16 threads.

iommu=pt rcu_nocbs=0-15 idle=nomwait
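
As a rough sketch of the edit itself: the parameters go inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT. The helper `add_kernel_params` below is made up for illustration, assumes GNU sed and a double-quoted value, and is demonstrated on a throwaway sample file rather than the real /etc/default/grub:

```shell
# Hypothetical helper: append parameters inside the quotes of
# GRUB_CMDLINE_LINUX_DEFAULT in the given file (GNU sed assumed for -i).
add_kernel_params() {
    file=$1
    shift
    sed -i "s|^\(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*\)\"|\1 $*\"|" "$file"
}

# Demo on a sample file, not the real /etc/default/grub.
sample=$(mktemp)
printf 'GRUB_CMDLINE_LINUX_DEFAULT="quiet"\n' > "$sample"
add_kernel_params "$sample" iommu=pt rcu_nocbs=0-15 idle=nomwait
cat "$sample"
rm -f "$sample"
```

The helper is not idempotent, so running it twice adds the parameters twice. Editing the file by hand works just as well; either way, keep a backup copy of /etc/default/grub before changing it.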

Just throwing those out there.

( PS @eugen-b I don’t think 0x0 likes being sent a ‘less’ output … I will modify my suggestion. )
( PPS oh I guess you can change tabs to RAW … never seen it before :stuck_out_tongue: )


eventual ?

https://patchwork.kernel.org/project/platform-driver-x86/patch/CADtzkx7TdfbwtaVEXUdD6YXPey52E-nZVQNs+Z41DTx7gqMqtw@mail.gmail.com/

I did notice

[    0.000000] tsc: Fast TSC calibration failed
[   11.578305] tpm_crb: probe of MSFT0101:00 failed with error -16

( I actually have that module blacklisted so :woman_shrugging: )

[   12.269443] iommu ivhd0: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]

So the first and last ones may be symptomatic, but I assume this output is from before testing any of the options above.
The first one definitely reminds me of when I first got a new Ryzen machine. I can’t quite tell whether the third one also matches an old APIC issue.

After reboot the dmesg output is like this: hastebin

It is not a bad idea to install linux516, which is currently at rc3 on Manjaro.

@cscs here is the dmesg result on linux516 hastebin
Mostly identical.

Looks like you lost a few things from pt originally…

[    0.227600] pci 0000:03:00.2: Adding to iommu group 12
[    0.227616] pci 0000:04:05.0: Adding to iommu group 12
[    0.227620] pci 0000:04:06.0: Adding to iommu group 12
[    0.227623] pci 0000:04:07.0: Adding to iommu group 12
[    0.227627] pci 0000:04:08.0: Adding to iommu group 12

Then with 5.16 you got the elan error:

[   10.780182] i2c_hid_acpi: probe of i2c-ELAN1200:00 failed with error -22

Hey folks, I should have checked the Arch Wiki
https://wiki.archlinux.org/title/Ryzen
It mentions random reboots and soft lock freezes.

The CPU ID and the Processor number may vary. To solve this problem you need to supply higher voltage to your CPU so that it is stable when running at peak frequencies. The easiest way to achieve this is to use the AMD curve optimiser which is accessible via your motherboard’s BIOS. Access it and put a positive offset of 4 points, which will increase the voltage your CPU is getting at higher loads. It will limit overclocking potential due to higher heat dissipation requirements, but it will run stable. For more details check this forum post. When I did this for my 5950X, my processor stabilised and the frequency and voltage ranges were more similar to those observed under Windows.
