Machines with AMD CPUs Ryzen 5 1600 and Ryzen 7 1700 crash regularly

I have a tower PC with a Ryzen 5 1600 CPU and a laptop with a Ryzen 7 1700.
Both of them show the same behaviour.
They tend to

  • freeze when using the browser
  • reboot when using the browser, mostly when I open a Save As or Upload window and start navigating folders
    – when rebooting I get a boot message with keywords mce, microcode, cpu id

My browser is ungoogled-chromium. I haven’t had time to test whether this behaviour happens with other browsers.
I also have profile-sync-daemon running.

Somewhere I found the recommendation to set
/etc/sysctl.d/40-max-user-watches.conf
to
fs.inotify.max_user_watches=524288
But that didn’t change anything.
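
For anyone checking the same setting: a drop-in under /etc/sysctl.d/ only takes effect at boot or after a reload, so it is worth confirming the value the kernel is actually using. A minimal sketch, assuming a Linux /proc layout; the function name is made up for illustration:

```shell
# Print the inotify watch limit the running kernel is currently using.
current_max_user_watches() {
    cat /proc/sys/fs/inotify/max_user_watches
}

# To reload all sysctl drop-ins without rebooting (needs root):
#   sudo sysctl --system
current_max_user_watches
```

If the number printed is not 524288, the drop-in has not been loaded yet.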

Currently I cannot access the journal of -b -2. After a reboot following the next crash I will save the journal log and post it here.
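
A possible way to pull the relevant lines out of the journal once it is accessible. This is only a sketch: the helper name filter_mce and the keyword list are assumptions based on the boot message described above, not a complete list of machine-check strings.

```shell
# Keep only lines that mention machine-check or microcode keywords.
filter_mce() {
    grep -i -E 'mce|machine check|microcode'
}

# Boots the journal still holds; -b -1 or -b -2 only work if they are listed:
#   journalctl --list-boots
# Kernel messages of the boot that crashed, filtered:
#   journalctl -k -b -1 --no-pager | filter_mce
# The mce message is shown while rebooting, so check the current boot too:
#   journalctl -k -b 0 --no-pager | filter_mce
```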

~ >>> inxi -Fxxxz                                                                                               [1]
System:    Kernel: 5.15.5-2-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: KDE Plasma 5.23.3
           tk: Qt 5.15.2 wm: kwin_x11 vt: 1 dm: SDDM Distro: Manjaro Linux base: Arch Linux
Machine:   Type: Laptop System: ASUSTeK product: GL702ZC v: 1.0 serial: <superuser required>
           Mobo: ASUSTeK model: GL702ZC v: 1.0 serial: <superuser required> UEFI: American Megatrends
           v: GL702ZC.306 date: 07/05/2019
Battery:   ID-1: BAT0 charge: 61.6 Wh (98.7%) condition: 62.4/74.2 Wh (84.1%) volts: 15.4 min: 15.4
           model: ASUSTeK ASUS Battery type: Li-ion serial: N/A status: Not charging cycles: 104
CPU:       Info: 8-Core model: AMD Ryzen 7 1700 bits: 64 type: MT MCP arch: Zen rev: 1 cache: L1: 768 KiB
           L2: 4 MiB L3: 16 MiB
           flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 95848
           Speed: 2089 MHz min/max: 1550/3000 MHz boost: enabled Core speeds (MHz): 1: 1827 2: 2572 3: 3272
           4: 2616 5: 1361 6: 1291 7: 1311 8: 1279 9: 1359 10: 1286 11: 1358 12: 1321 13: 1341 14: 1326 15: 1275
           16: 1276
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590]
           vendor: ASUSTeK driver: amdgpu v: kernel bus-ID: 0c:00.0 chip-ID: 1002:67df class-ID: 0300
           Device-2: Realtek USB2.0 HD UVC WebCam type: USB driver: uvcvideo bus-ID: 1-8:4 chip-ID: 0bda:57fa
           class-ID: 0e02 serial: <filter>
           Display: x11 server: X.Org 1.21.1.1 compositor: kwin_x11 driver: loaded: amdgpu,ati
           unloaded: modesetting alternate: fbdev,vesa resolution: 1: 1920x1080~60Hz 2: 1920x1080 s-dpi: 96
           OpenGL: renderer: AMD Radeon RX 580 Series (POLARIS10 DRM 3.42.0 5.15.5-2-MANJARO LLVM 13.0.0)
           v: 4.6 Mesa 21.2.5 direct render: Yes
Audio:     Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: ASUSTeK
           driver: snd_hda_intel v: kernel bus-ID: 0c:00.1 chip-ID: 1002:aaf0 class-ID: 0403
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: ASUSTeK driver: snd_hda_intel
           v: kernel bus-ID: 12:00.3 chip-ID: 1022:1457 class-ID: 0403
           Sound Server-1: ALSA v: k5.15.5-2-MANJARO running: yes
           Sound Server-2: sndio v: N/A running: no
           Sound Server-3: JACK v: 1.9.19 running: no
           Sound Server-4: PulseAudio v: 15.0 running: yes
           Sound Server-5: PipeWire v: 0.3.40 running: yes
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASUSTeK driver: r8169
           v: kernel port: e000 bus-ID: 06:00.0 chip-ID: 10ec:8168 class-ID: 0200
           IF: enp6s0 state: up speed: 100 Mbps duplex: full mac: <filter>
           Device-2: Realtek RTL8822BE 802.11a/b/g/n/ac WiFi adapter vendor: AzureWave driver: rtw_8822be v: N/A
           port: d000 bus-ID: 07:00.0 chip-ID: 10ec:b822 class-ID: 0280
           IF: wlp7s0 state: up mac: <filter>
           IF-ID-1: virbr0 state: down mac: <filter>
Bluetooth: Device-1: IMC Networks Bluetooth Radio type: USB driver: btusb v: 0.8 bus-ID: 1-10:6
           chip-ID: 13d3:3526 class-ID: e001 serial: <filter>
           Report: rfkill ID: hci0 rfk-id: 1 state: down bt-service: enabled,running rfk-block: hardware: no
           software: yes address: see --recommends
Drives:    Local Storage: total: 953.87 GiB used: 337.6 GiB (35.4%)
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKNW010T8 size: 953.87 GiB speed: 31.6 Gb/s lanes: 4
           type: SSD serial: <filter> rev: 002C temp: 28.9 C scheme: GPT
Partition: ID-1: / size: 183.31 GiB used: 26.22 GiB (14.3%) fs: f2fs dev: /dev/nvme0n1p2
           ID-2: /boot/efi size: 98.4 MiB used: 141 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p5
           ID-3: /home size: 468.73 GiB used: 215.5 GiB (46.0%) fs: f2fs dev: /dev/dm-0 mapped: crypto_LUKS
Swap:      Alert: No swap data was found.
Sensors:   System Temperatures: cpu: 52.0 C mobo: N/A gpu: amdgpu temp: 51.0 C
           Fan Speeds (RPM): cpu: 2300
Info:      Processes: 384 Uptime: 2h 48m wakeups: 1 Memory: 31.35 GiB used: 3.07 GiB (9.8%) Init: systemd v: 249
           Compilers: gcc: 11.1.0 clang: 13.0.0 Packages: pacman: 1438 Shell: Zsh v: 5.8 running-in: konsole
           inxi: 3.3.09

Hi!!
That’s my combo: AMD Ryzen 3 1200 + Gigabyte B450. I was suffering spontaneous reboots, really annoying and dangerous during an update.

Enabling AMD IBS in the BIOS > Peripherals > AMD CBS fixed it.
Maybe it’s related to your issue, maybe not, but it’s an idea…

Regards!!

I didn’t find that in my BIOS.
But I found something different, a setting called SVM Mode with the info “Enable CPU virtualisation feature.” It was disabled and I’m using QEMU/KVM. So I will now watch whether enabling that feature helps.
Thanks for a possibly useful hint in the right direction!

I’ve realized that now this setting has been moved a little:
Peripherals>AMD CBS>CPU Common Options

:grinning: :grinning:

No, I can’t find any IBS (Instruction-Based Sampling) setting in the BIOS, and there is no mention of it in the notebook manual either.

Just now I got a freeze while scrolling down on the Manjaro forum.
Here is the output of journalctl -b -1
https://0x0.st/-hjq.txt

Have you been able to use these in the past without issues? Or are these new computers?

The freezing is a known issue for 1st gen Ryzen CPUs on Linux. It is an issue with the C-State if I recall correctly. Supposedly, for some people, their BIOS/UEFI updates eventually fixed it. Though… for others like me:

I have a Ryzen 7 1700x and to make it work without having the computer freeze, I had to go into my UEFI → Advanced → AMD CBS → Power Supply Idle Control → Change to Typical Current Idle

Now, not every motherboard has these exact steps to find this. You’ll have to look for it. Regarding your laptop though… hopefully it has an unlocked enough UEFI to change it.
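
If the firmware does not expose that menu, it can still help to see which idle states the kernel is using on the machine. A rough sketch, assuming the usual cpuidle sysfs layout under /sys/devices/system/cpu/cpu0/cpuidle; the function name is hypothetical and this only lists the states, it does not change them:

```shell
# List the idle states (C-states) the kernel exposes for one CPU.
# $1 = cpuidle directory, defaults to cpu0.
list_cstates() {
    local dir="${1:-/sys/devices/system/cpu/cpu0/cpuidle}"
    local state
    for state in "$dir"/state*; do
        # Skip anything without a readable name file (also covers an unmatched glob).
        [ -r "$state/name" ] || continue
        printf '%s: %s\n' "${state##*/}" "$(cat "$state/name")"
    done
}

list_cstates
```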

The machines were new to me and I got those issues from the start.

I wasn’t able to find any AMD CBS settings in the UEFI firmware.

Sound Server-4: PulseAudio v: 15.0 running: yes
Sound Server-5: PipeWire v: 0.3.40 running: yes

You use Pipewire AND Pulseaudio
Use Pipewire OR Pulseaudio !!!
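
One way to see which server is really answering PulseAudio clients is the "Server Name" line of pactl info. A small sketch, assuming English pactl output; the function name and the labels it prints are made up for illustration:

```shell
# Read `pactl info` output on stdin and say which server answers PulseAudio clients.
classify_sound_server() {
    local line
    line=$(grep -m1 '^Server Name:')
    case "$line" in
        *PipeWire*) echo "pipewire-pulse" ;;  # PipeWire's PulseAudio replacement
        "")         echo "unknown" ;;         # no Server Name line found
        *)          echo "pulseaudio" ;;      # the original PulseAudio daemon
    esac
}

# Usage:
#   LC_ALL=C pactl info | classify_sound_server
```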

Swap:      Alert: No swap data was found.

Create a Swapfile

https://wiki.manjaro.org/index.php/Swap#Using_a_Swapfile
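
A rough sketch of the usual dd-based steps, not a copy of the wiki page. The path and size in the usage comment are placeholders, and whether a swapfile works on a given filesystem (the root here is f2fs) depends on the kernel, so check the wiki first:

```shell
# Create and format a swapfile. Run from a root shell (e.g. after `sudo -i`).
# $1 = path of the file, $2 = size in MiB.
make_swapfile() {
    local path="$1" size_mib="$2"
    dd if=/dev/zero of="$path" bs=1M count="$size_mib" status=none &&
    chmod 600 "$path" &&          # swap must not be readable by other users
    mkswap "$path" >/dev/null     # write the swap signature
}

# Usage (placeholder path and size):
#   make_swapfile /swapfile 8192 && swapon /swapfile
#   swapon --show
# To keep it across reboots, add this line to /etc/fstab:
#   /swapfile none swap defaults 0 0
```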

I agree on pulseaudio and pipewire.
As for swap I have 32 GB RAM.

@cscs suggested that it might have to do with IOMMU.

sudo dmesg | grep -i -E 'fail|error|iommu|amd' | less
https://0x0.st/-hjw.json

alternative link
https://www.toptal.com/developers/hastebin/fevutasobu.yaml

That is a question that cannot be answered without having a look at your configuration. Even with plenty of available memory, swap is often used as a safety net, and sometimes specific applications require it, so have a look at the following non-exhaustive list:

If you use hibernation: yes, you need swap!
If you have services that are not always active, but are still running all the time: yes, you need swap!
If you have an application that allocates virtual memory directly for temporary storage instead of RAM: yes, you need swap!
If you have an application that has a memory leak: yes, you need swap!
If you have a server with 1TB of RAM that you're using as a desktop without applications allocating virtual memory or having memory leaks: No, you don't need swap!

The Link is broken

I’ll try with swap. But I ran the same setup on an Intel 9th Generation laptop and had no such issues.

Is this ok so ?

sorry, the link works for me
try the alternative link I posted.

The temperatures, I don’t know.

That’s the default on KDE. Pipewire is a dependency of KDE desktop.

Some dmesg checking might be helpful. Maybe something like this:

sudo dmesg | grep -i -E 'fail|error|iommu|amd'

But taking a guess, I might say check on iommu, and the old rcu_nocbs workaround for ‘soft lockups’ as seen in the past. idle=nomwait may be applicable as well.

( https://bugzilla.kernel.org/show_bug.cgi?id=196683 )

Note: The rcu_nocbs option is dependent on threads not cores. One way to show total:
sudo dmidecode -t 4 | grep 'Thread Count'
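
Since the range runs from 0 to the thread count minus one, it can be derived rather than typed by hand. A small sketch; the function name is made up, and nproc --all is used here as an alternative to dmidecode that does not need root:

```shell
# Build the rcu_nocbs parameter for a given number of threads.
# $1 = thread count; defaults to what nproc reports for this machine.
rcu_nocbs_range() {
    local threads="${1:-$(nproc --all)}"
    echo "rcu_nocbs=0-$(( threads - 1 ))"
}

rcu_nocbs_range 16   # prints rcu_nocbs=0-15
```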

These are all boot parameters, so apply them in /etc/default/grub, run sudo update-grub and reboot.
In this example I will assume you have 16 threads.

iommu=pt rcu_nocbs=0-15 idle=nomwait
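
A hedged sketch of how the edit could be scripted. It assumes /etc/default/grub has a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line and that the parameters contain no | or & characters; the function name is hypothetical, and editing the file by hand works just as well.

```shell
# Append kernel parameters to GRUB_CMDLINE_LINUX_DEFAULT in the given file.
# $1 = file to edit in place, $2 = parameters to append.
add_kernel_params() {
    local file="$1" params="$2"
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${params}\"|" "$file"
}

# Usage: edit a copy, compare, then install it and regenerate the config.
#   cp /etc/default/grub /tmp/grub.new
#   add_kernel_params /tmp/grub.new "iommu=pt rcu_nocbs=0-15 idle=nomwait"
#   diff /etc/default/grub /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
# After the reboot, confirm the parameters are active:
#   cat /proc/cmdline
```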

Just throwing those out there.

( PS @eugen-b I don’t think 0x0 likes being sent a ‘less’ output … I will modify the suggestion. )
( PPS oh I guess you can change tabs to RAW … never seen it before :stuck_out_tongue: )

Possibly this?

https://patchwork.kernel.org/project/platform-driver-x86/patch/CADtzkx7TdfbwtaVEXUdD6YXPey52E-nZVQNs+Z41DTx7gqMqtw@mail.gmail.com/

I did notice

[    0.000000] tsc: Fast TSC calibration failed
[   11.578305] tpm_crb: probe of MSFT0101:00 failed with error -16

( I actually have that module blacklisted so :woman_shrugging: )

[   12.269443] iommu ivhd0: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]

So those first and last ones may be symptomatic, but I assume it is before testing any of the above options.
The first one definitely reminds me of when I first got a new Ryzen machine. I can’t quite tell if the 3rd one also matches an old APIC issue.