Working out the cause of system freezes

My system freezes, sometimes a few times a day. There is no obvious pattern that I can see - sometimes the system freezes while I’m using it, sometimes while I’m AFK, and sometimes while it is suspended. When it freezes, the keyboard and mouse usually lose power and the machine is no longer pingable on the network.
I have been checking journalctl -b -1 after these freezes recently, but there is nothing obvious to me there.
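
As a rough sketch of how the `journalctl -b -1` check could be narrowed down, the function below lists the recorded boots, then shows only error-level messages and the last kernel messages from the previous boot. It assumes the journal is persistent (otherwise there is no previous boot to query); the function name is made up for illustration.

```shell
# Sketch only: review the end of the previous boot's journal.
# Assumes systemd-journald keeps a persistent journal (e.g. /var/log/journal exists).
previous_boot_summary() {
    # List the recorded boots so the index passed to -b can be double-checked.
    journalctl --list-boots --no-pager
    # Messages of priority "err" or worse from the previous boot.
    journalctl -b -1 -p err --no-pager
    # Last 100 kernel messages of the previous boot, i.e. those closest to the freeze.
    journalctl -k -b -1 -n 100 --no-pager
}
```

Calling `previous_boot_summary` right after a forced reboot keeps the output focused on the boot that froze. If the journal simply stops without any error, that in itself hints at a hard hang rather than a software crash.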

I have had some hassles with the PCIe wifi adapter in the past and, following some advice, I:

  • blacklisted r8168 kernel module
  • loaded r8169 kernel module

Despite this change, I still sometimes get erratic network connectivity, which sudo systemctl restart NetworkManager sometimes resolves.
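
To confirm which kernel driver is actually bound after the blacklist change, something along these lines may help. It is a minimal sketch that reads sysfs; the helper name `iface_driver` is hypothetical, and the interface names are the ones from the inxi output in this post.

```shell
# Sketch only: print the kernel driver bound to a network interface via sysfs.
iface_driver() {
    iface="$1"
    link="/sys/class/net/$iface/device/driver"
    if [ -e "$link" ]; then
        # The driver entry is a symlink; its target's last component is the driver name.
        basename "$(readlink -f "$link")"
    else
        echo "none"   # interface missing, or virtual with no backing device
    fi
}

# Interface names taken from the inxi output in this thread.
for iface in enp5s0 wlp6s0; do
    printf '%s: %s\n' "$iface" "$(iface_driver "$iface")"
done
```

On this machine the expectation would be r8169 for enp5s0 and rtl8192ce for wlp6s0, matching what inxi reports.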

I have mostly kept up-to-date on kernel version and mobo BIOS.

I generally run the following programs:

  • i3wm
  • firefox (heavily)
  • alacritty (many instances)

I do sometimes hear a very faint clicking sound (~0.5Hz) when the system has frozen, like there is a relay or a solenoid activating, or a fan spinning up and then down again.

Please let me know if you have any hints that can help me try to find the culprit. What can I monitor? Which logs should I be checking?
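
One possible starting point for the monitoring question, sketched under assumptions: append a small timestamped snapshot (load and available memory) to a file at a regular interval and flush it to disk each time, so the last entry written before a freeze survives the reboot. The function name and log path are hypothetical.

```shell
# Sketch only: append a timestamped health snapshot to a log file and flush it
# to disk, so the last entry before a freeze is still there after the reboot.
log_snapshot() {
    logfile="$1"
    {
        printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')"
        # Load averages (first three fields of /proc/loadavg).
        [ -r /proc/loadavg ] && printf 'load=%s ' "$(cut -d' ' -f1-3 /proc/loadavg)"
        # Memory still available to applications, in kB.
        [ -r /proc/meminfo ] && printf 'mem_available_kB=%s ' \
            "$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)"
        printf '\n'
    } >> "$logfile"
    sync   # force the write out; a frozen machine never flushes its cache
}

# Usage (hypothetical log path): take a snapshot every 10 seconds.
#   while true; do log_snapshot "$HOME/freeze-watch.log"; sleep 10; done
```

The timestamp of the final line gives the approximate time of the freeze, and the values next to it show whether memory was nearly exhausted at that point.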

inxi:

System:    Kernel: 5.14.18-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0 Desktop: i3 4.20.1 info: i3bar dm: LightDM 1.30.0
           Distro: Manjaro Linux base: Arch Linux
Machine:   Type: Desktop System: Gigabyte product: B450M DS3H v: N/A serial: N/A
           Mobo: Gigabyte model: B450M DS3H-CF v: x.x serial: N/A UEFI: American Megatrends v: F50 date: 11/27/2019
CPU:       Info: 8-Core model: AMD Ryzen 7 2700 bits: 64 type: MT MCP arch: Zen+ rev: 2 cache: L1: 768 KiB L2: 4 MiB
           L3: 16 MiB
           flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 102227
           Speed: 2454 MHz min/max: 1550/3200 MHz boost: enabled volts: 1.0 V ext-clock: 100 MHz Core speeds (MHz): 1: 2962
           2: 1535 3: 1424 4: 1482 5: 1619 6: 1381 7: 1379 8: 1321 9: 1486 10: 1469 11: 1539 12: 1398 13: 1269 14: 1338
           15: 2769 16: 1612
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] vendor: Gigabyte
           driver: amdgpu v: kernel bus-ID: 07:00.0 chip-ID: 1002:67df class-ID: 0300
           Device-2: Logitech Webcam C170 type: USB driver: snd-usb-audio,uvcvideo bus-ID: 3-4:5 chip-ID: 046d:082b
           class-ID: 0102
           Display: server: X.Org 1.21.1.1 compositor: picom v: git-dac85 driver: loaded: amdgpu,ati unloaded: modesetting
           alternate: fbdev,vesa resolution: 3840x2160~60Hz s-dpi: 96
           OpenGL: renderer: AMD Radeon RX 570 Series (POLARIS10 DRM 3.42.0 5.14.18-1-MANJARO LLVM 13.0.0) v: 4.6 Mesa 21.2.5
           direct render: Yes
Audio:     Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Gigabyte driver: snd_hda_intel
           v: kernel bus-ID: 07:00.1 chip-ID: 1002:aaf0 class-ID: 0403
           Device-2: Advanced Micro Devices [AMD] Family 17h HD Audio vendor: Gigabyte driver: snd_hda_intel v: kernel
           bus-ID: 09:00.3 chip-ID: 1022:1457 class-ID: 0403
           Device-3: Asahi Kasei Microsystems AK5370 I/F A/D Converter type: USB driver: snd-usb-audio bus-ID: 3-2:3
           chip-ID: 0556:0001 class-ID: 0102
           Device-4: Logitech Webcam C170 type: USB driver: snd-usb-audio,uvcvideo bus-ID: 3-4:5 chip-ID: 046d:082b
           class-ID: 0102
           Sound Server-1: ALSA v: k5.14.18-1-MANJARO running: yes
           Sound Server-2: JACK v: 1.9.19 running: no
           Sound Server-3: PulseAudio v: 15.0 running: yes
           Sound Server-4: PipeWire v: 0.3.40 running: no
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Gigabyte driver: r8169 v: kernel
           port: e000 bus-ID: 05:00.0 chip-ID: 10ec:8168 class-ID: 0200
           IF: enp5s0 state: down mac: <filter>
           Device-2: Realtek RTL8192CE PCIe Wireless Network Adapter vendor: ASUSTeK driver: rtl8192ce v: kernel port: d000
           bus-ID: 06:00.0 chip-ID: 10ec:8178 class-ID: 0280
           IF: wlp6s0 state: up mac: <filter>
Bluetooth: Device-1: Cambridge Silicon Radio Bluetooth Dongle (HCI mode) type: USB driver: btusb v: 0.8 bus-ID: 3-3:4
           chip-ID: 0a12:0001 class-ID: e001
           Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:    Local Storage: total: 342.81 GiB used: 156.07 GiB (45.5%)
           ID-1: /dev/nvme0n1 vendor: Gigabyte model: GP-GSM2NE8128GNTD size: 119.24 GiB speed: 15.8 Gb/s lanes: 2 type: SSD
           serial: <filter> rev: E8TM14.2 temp: 40.9 C scheme: GPT
           ID-2: /dev/sda vendor: Crucial model: CT240BX500SSD1 size: 223.57 GiB speed: 6.0 Gb/s type: SSD serial: <filter>
           rev: R013 scheme: GPT
Partition: ID-1: / size: 116.58 GiB used: 98.14 GiB (84.2%) fs: ext4 dev: /dev/nvme0n1p2
           ID-2: /boot/efi size: 299.4 MiB used: 7.9 MiB (2.6%) fs: vfat dev: /dev/nvme0n1p1
Swap:      Alert: No swap data was found.
Sensors:   System Temperatures: cpu: 50.9 C mobo: 16.8 C gpu: amdgpu temp: 51.0 C
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 708
Info:      Processes: 305 Uptime: 10m wakeups: 0 Memory: 31.3 GiB used: 2.13 GiB (6.8%) Init: systemd v: 249 Compilers:
           gcc: 11.1.0 clang: 13.0.0 Packages: pacman: 1467 Shell: Zsh (sudo) v: 5.8 default: Bash v: 5.1.8
           running-in: alacritty inxi: 3.3.09

Welcome to the forum!

Everything you’ve stated so far suggests that it would be a hardware problem, and then the usual culprit ─ I’m not saying this is the case, but it’s worth checking ─ is a faulty memory module.

Have you run a memory test yet? If not, I recommend doing so and leaving it running for about 24 hours.

Hi @Aragorn

Thank you for your reply. I haven’t run a memory test, but I had suspected that the issue might be memory related. I have quite a lot (32GB), but no swap. I will often have firefox hungrily consuming most of what is there and I think I have noticed my system freezing when using firefox more often than not, although this could be a total red herring - I use firefox a lot.

I will run some memory tests overnight. I plan to do as suggested in the Arch Linux wiki (Stressing_memory).
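
For a userspace test, one option is memtester (an assumption here; the post does not say which tool is meant). The sketch below only prints a suggested command sized to roughly three quarters of the currently available RAM; the function name is hypothetical.

```shell
# Sketch only: suggest a memtester invocation sized to ~75% of available RAM.
# Assumes memtester is installed; nothing is tested by this snippet itself.
memtester_cmd() {
    avail_kb="$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)"
    avail_kb="${avail_kb:-0}"
    size_mb=$(( avail_kb * 3 / 4 / 1024 ))
    # One pass; raise the iteration count for an overnight run.
    echo "sudo memtester ${size_mb}M 1"
}

memtester_cmd
```

A userspace tester cannot reach memory held by the kernel or by other running processes, so it is less thorough than booting memtest86+.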

Thank you for the tip.

I thought running a memory test was going to be straightforward, but that turns out not to be the case.

I downloaded the bootable ISO and the bootable binary for memtest86+ v5.31b from memtest.org (#downiso) and tried writing those to a USB, but I couldn’t get either to boot on my desktop or laptop.

I tried running the GRUB commandline, suspecting that memtest might be built in, but it’s not clear to me what to run.

It’s also surprisingly unclear from the searching I’ve done how to go about this… Will keep looking and post back here if I find anything.

memtest86+ appears to be installed on my system and configured in GRUB,

$ ls -la /boot/memtest86+/memtest.bin
-rw-r--r-- 1 root root 153868 Jun 12 11:20 /boot/memtest86+/memtest.bin

$ ls -la /etc/grub.d/60_memtest86+
-rwxr-xr-x 1 root root 1219 Jun 12 11:20 /etc/grub.d/60_memtest86+

$ sudo grep -C2 memtest /boot/grub/grub.cfg
### END /etc/grub.d/41_custom ###

### BEGIN /etc/grub.d/60_memtest86+ ###
if [ "${grub_platform}" == "pc" ]; then
    menuentry "Memory Tester (memtest86+)" --class memtest86 --class gnu --class tool {
        search --fs-uuid --no-floppy --set=root  38ef1b6f-5c42-484f-8dc7-577a35dcff68
        linux16 /boot/memtest86+/memtest.bin
    }
fi
### END /etc/grub.d/60_memtest86+ ###

but the Memory Tester (memtest86+) menu doesn’t show up.

I’m aware that memtest86+ is a 16-bit program (hence needing to be run using linux16, I assume) and that it cannot be run in UEFI mode. My system is configured for legacy boot mode, as far as I can tell (BIOS settings CSM → “legacy only”).

I notice the conditional if [ "${grub_platform}" == "pc" ] in /boot/grub/grub.cfg, which must be what is preventing the menu item from showing up. According to the docs (GNU GRUB manual, “grub_platform”):

15.1.16 grub_platform

In normal mode (see normal), GRUB sets the ‘grub_platform’ variable to the platform for which GRUB was built (e.g. ‘pc’ or ‘efi’). 

So, I guess grub_platform == "efi" in my case, which probably means I am booting in UEFI mode.
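
The same guess can be checked from the running system without going through GRUB: the kernel only exposes `/sys/firmware/efi` when it was booted through UEFI. A minimal sketch, with a made-up function name:

```shell
# Sketch only: report whether the running kernel was booted via UEFI or legacy BIOS.
boot_mode() {
    # /sys/firmware/efi exists only when the kernel was started through UEFI.
    if [ -d /sys/firmware/efi ]; then
        echo "UEFI"
    else
        echo "BIOS"
    fi
}

boot_mode
```

If this prints UEFI, the GRUB in use was built for the efi platform, so the `"pc"` condition in the memtest86+ entry is false and the entry is skipped.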

Yes, that much is clear, because otherwise the GRUB menu would show the memtest86 entry. Also, your inxi output shows that you have an EFI system partition.

You’ll need to make sure that the UEFI boots in legacy mode only for the USB drive. There may be an additional setting for that.

Ryzen? It’s a common issue. See here :

A firmware downgrade may resolve your issue.

@Aragorn There is no obvious setting in my BIOS that would allow me to boot from USB in legacy mode, but I’ll do some more digging. Thanks for the help.

@lupo2010 Seems I’m not the only one. Thanks for the link. After some reading I have disabled Firefox hardware acceleration and I have optimistically upgraded my kernel to 5.15.2-2-MANJARO. I’m not expecting any magic here, but perhaps a little more stability. I noticed that the other posters found amdgpu error messages in their system logs, whereas I don’t see any.
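
To double-check for amdgpu messages, a filter along these lines could be piped after the kernel log of the previous boot. It is only a sketch: the keyword list is a guess at what is worth reading, not an official set of amdgpu messages, and the function name is hypothetical.

```shell
# Sketch only: keep kernel log lines that mention the GPU driver and look like faults.
# The keywords are a guess at what is worth reading, not an official message list.
filter_gpu_errors() {
    grep -iE 'amdgpu|drm' | grep -iE 'error|timeout|fail|reset|hang'
}

# Usage: scan the kernel messages of the previous boot.
#   journalctl -k -b -1 --no-pager | filter_gpu_errors
```

An empty result does not rule out the GPU, since a hard freeze can stop the machine before anything reaches the journal.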

I upgraded my kernel from 5.14.18-1 to 5.15.2-2 (corrected some versions there) and disabled Firefox hardware acceleration. I have not had a freeze since then, and I have not had the wifi issues either (possibly linked, possibly not).

Wow, that’s some time travel technology! Over here in 2021, the highest available major version for stable kernels is still 5.15. :stuck_out_tongue:

Anyway, I spoke too soon. I just had a freeze then…

Updating BIOS might resolve system hardware issues
B450M DS3H (rev. 1.x) Support | Motherboard - GIGABYTE Global

The future is now!

Upgraded the BIOS and now I’m not able to boot Manjaro. I have Ubuntu on another partition which I am able to boot. I do see my Manjaro installation listed in Ubuntu’s grub menu, but when I select it, the system will not load (the system just stalls).