Over the last two years, I’ve been struggling with constant full system freezes and random application crashes on Manjaro Linux. These crashes often show “segfault at” or “trap invalid opcode” messages or similar in dmesg, and they happen across multiple different hardware configurations. Nothing I try seems to resolve the problem, and I’m starting to wonder if there’s something inherently unstable about Manjaro, or if I’m missing something crucial in my setup.
Going into details here:
I used to run my primary gaming system on Windows 10: a self-built machine with a Ryzen 3900X, a GTX 980 Ti GPU, 4x Patriot 32GB memory, and some Samsung NVMe SSD. This system was rock solid for years, even after I installed Proxmox on it and ran Windows 10 in a VM for a few months, until I upgraded to a newer-generation system and retired the old Ryzen 3900X rig. That might not sound relevant to my current problems, but bear with me.
I initially tried installing Manjaro (I don’t remember the specific Manjaro version, kernel, KDE version, etc.) on an MSI GS60 6QE Ghost Pro laptop that had previously run openSUSE Tumbleweed with the i3 window manager without a hitch. I wanted to switch to Manjaro because certain applications didn’t support tiling window managers well, forcing me to use a bunch of workarounds. Everything seemed fine at first, until the laptop had its first crash a few days later. Because of its age and near 24/7 operation for seven years, I assumed the hardware might just be failing. I brushed off the issues as bad/aged hardware and decided to buy a new system.
The new system was an ASUS ExpertCenter PN53 barebones, equipped with a WD_BLACK SN850X 1000GB NVMe SSD, a Samsung SSD 990 PRO 1TB NVMe SSD, and 2x Kingston 32GB 2Rx6 PC5-4800B memory DIMMs (listed by ASUS as supported). I installed Manjaro (encrypted ext4) on it, but almost immediately ran into system lockups with random AMD GPU–related kernel oopses, kernel panics, slowdowns, etc. For example:
x86/PAT: device poll:2932903 conflicting memory types ee803000-ee804000 uncached-minus<->broken
x86/PAT: memtype_reserve failed [mem 0xee803000-0xee803fff], track uncached-minus, req write-back
ioremap memtype_reserve failed -16
ACPI Error: Could not map memory at 0x00000000EE803004, size 4 (20230331/exregion-166)
ACPI Error: AE_NO_MEMORY, Returned by Handler for [SystemMemory] (20230331/evregion-300)
ACPI Error: Aborting method \_SB.PCI0.GP19.XHC3.RPRM due to previous error (AE_NO_MEMORY) (20230331/psparse-529)
ACPI Error: Aborting method \_SB.PCI0.GP19.XHC3._REG due to previous error (AE_NO_MEMORY) (20230331/psparse-529)
[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=72352, emitted seq=72354
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0
amdgpu 0000:e7:00.0: amdgpu: GPU recovery disabled.
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1439702, emitted seq=1439704
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 986 thread Xorg:cs0 pid 987
amdgpu 0000:e7:00.0: amdgpu: GPU recovery disabled.
[drm] DP Alt mode state on HPD: 1
------------[ cut here ]------------
WARNING: CPU: 6 PID: 825 at drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:2740 decide_link_settings+0x1c1/0x1d0 [amdgpu]
[...]
RIP: 0010:decide_link_settings+0x1c1/0x1d0 [amdgpu]
[...]
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
[...]
RIP: 0010:perform_link_training_with_retries+0xb1/0x220 [amdgpu]
After some tinkering in the BIOS (increasing dedicated VRAM, etc.), I managed to get it somewhat stable—at least enough to run for a few days before it froze again. This was frustrating because Windows 10 had worked flawlessly on the same system for two weeks straight when I stress-tested it. I chalked it up to a “quirky hardware/Manjaro compatibility issue” and tried to live with the crashes.
My third system, a “Clevo X370SNV-G” laptop OEM-branded as “Xidax XM-10 X370SNx,” came with another batch of odd issues. Right out of the box, I disabled the NVIDIA GPU to save power, which I later realized disabled the external monitor outputs. After installing Manjaro (encrypted btrfs) and learning that the external monitor was being forced through the NVIDIA GPU, I noticed the external display ran at a noticeably laggy framerate, while the internal screen was perfectly smooth. Someone on Discord suggested using a Thunderbolt/USB-C to HDMI adapter instead of the built-in HDMI/DisplayPort ports, and that solved the display performance issue. However, the system still randomly crashed apps and froze entirely, often generating kernel messages like:
[80053.974199] traps: chromium[85590] general protection fault ip:78ce54cd9662 sp:7ffe75522458 error:0 in libc.so.6[16c662,78ce54b91000+171000]
[87094.840647] slack[3397]: segfault at 8000010 ip 00005b752bebded8 sp 00007ffeefaf5130 error 4 in slack[9472ed8,5b7524af3000+8996000] likely on CPU 8 (core 16, socket 0)
[87094.840653] Code: ff e9 58 fb ff ff 49 63 45 58 41 be 01 00 00 00 48 85 c0 0f 84 52 fc ff ff 48 8d 15 72 65 bb 01 48 8b 12 48 c1 e0 03 48 21 d0 <48> 63 40 08 48 85 c0 0f 84 34 fc ff ff 48 8b 75 c0 48 83 c6 08 0f
[119184.181079] chromium[137116]: segfault at 7ffdaa82f4cc ip 00005b84ea0ada00 sp 00007ffdaa81fcb0 error 4 in chromium[1a71a00,5b84e987b000+c41e000] likely on CPU 0 (core 0, socket 0)
[119184.181085] Code: 5b 41 5c 41 5e 41 5f 5d c3 31 c0 eb ad 31 c0 eb d7 cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 53 50 48 89 f3 49 89 fe <0f> b6 86 8c f5 00 00 a8 01 75 2a 49 89 1e 48 8b 83 00 02 00 00 49
[121643.733066] slack[95624]: segfault at 1951c48 ip 00005b752bebded8 sp 00007ffeefaf54a0 error 4 in slack[9472ed8,5b7524af3000+8996000] likely on CPU 8 (core 16, socket 0)
[121643.733073] Code: ff e9 58 fb ff ff 49 63 45 58 41 be 01 00 00 00 48 85 c0 0f 84 52 fc ff ff 48 8d 15 72 65 bb 01 48 8b 12 48 c1 e0 03 48 21 d0 <48> 63 40 08 48 85 c0 0f 84 34 fc ff ff 48 8b 75 c0 48 83 c6 08 0f
[128975.791398] traps: chromium[137145] trap stack segment ip:5b84ed858aec sp:1886dbe315200 error:0 in chromium[521caec,5b84e987b000+c41e000]
[134718.925336] QSGRenderThread[2612]: segfault at 5a ip 00007560b62f2df3 sp 000075608cdfe9f0 error 4 in libQt6Gui.so.6.8.1[4f2df3,7560b5edd000+68b000] likely on CPU 5 (core 8, socket 0)
[134718.925343] Code: 54 53 48 83 ec 30 4c 8b 67 08 64 48 8b 1c 25 28 00 00 00 48 89 5d e8 48 89 fb 49 8b bc 24 80 00 00 00 48 85 ff 74 5f 48 8b 07 <ff> 50 58 84 c0 74 55 e8 a1 e8 ff ff 48 39 d8 74 7c 49 8b 9c 24 80
[144865.644570] traps: slack[140271] trap invalid opcode ip:5b7528a9e7fe sp:7ffeefaf3920 error:0 in slack[60537fe,5b7524af3000+8996000]
[150332.480009] slack[176460]: segfault at 1aaaaaa10 ip 00005b752bebdf64 sp 00007ffeefaf2f10 error 4 in slack[9472f64,5b7524af3000+8996000] likely on CPU 8 (core 16, socket 0)
[150332.480016] Code: 8b 08 48 8b 45 c0 48 8b 40 08 48 3b 01 0f 84 bf fa ff ff 48 8d 0d 04 b3 a9 01 48 8b 11 48 8b 48 10 48 3b 0a 74 6f 49 8b 5d 48 <48> 3b 4b 10 74 65 41 be 01 00 00 00 41 f6 45 14 60 4c 8b 7d c0 0f
[159381.235504] chromium[189361]: segfault at 31d11d000000 ip 00005b84ea362d5b sp 00007ffdaa8207a0 error 4 in chromium[1d26d5b,5b84e987b000+c41e000] likely on CPU 25 (core 41, socket 0)
[159381.235512] Code: b4 cc cc cc cc cc cc cc cc cc cc cc cc cc cc 55 48 89 e5 41 57 41 56 41 55 41 54 53 48 83 ec 18 49 89 cf 49 81 e7 00 00 fc ff <49> 8b 07 a9 40 02 00 00 0f 85 d7 00 00 00 49 89 cd 49 89 f6 49 89
[159721.615306] ThreadPoolForeg[184178]: segfault at 2d766c59 ip 00005b75265a1375 sp 0000745c851ebba0 error 6 in slack[3b56375,5b7524af3000+8996000] likely on CPU 21 (core 37, socket 0)
[159721.615338] Code: 0f 85 d4 00 00 00 48 89 fb 49 63 44 24 08 49 29 45 18 48 39 47 10 77 61 48 89 df 48 89 c6 e8 d2 11 00 00 41 89 c6 49 8b 85 28 <01> 00 00 49 63 ce 48 8b 34 c8 49 8b 0c 24 48 8d 41 01 48 8b 56 08
Eventually, it would lock up so hard that no key combination (including Ctrl+Alt+F-key console switching and the SysRq REISUB sequence) worked, forcing me to hold the power button. Logs never showed anything meaningful around the crash time. I tried older kernels (like 6.1), which may have helped a tiny bit, though that could just be wishful thinking.
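To make patterns in the segfault lines above easier to spot, here is a minimal Python sketch that tallies faults by the object they occurred in and the offset inside it, and decodes the x86 page-fault error code. It assumes the dmesg format shown above; the names `decode_error` and `tally_faults` are made up for illustration, and the “traps:” lines (general protection fault, invalid opcode) are deliberately not handled.

```python
import re
from collections import Counter

# Matches kernel lines of the form shown above, e.g.
#   slack[3397]: segfault at 8000010 ip ... sp ... error 4 in slack[9472ed8,...]
# "traps:" lines use a different layout and are ignored by this sketch.
SEGFAULT_RE = re.compile(
    r"(?P<comm>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<off>[0-9a-f]+),"
)


def decode_error(code: int) -> str:
    """Decode the low bits of an x86 page-fault error code."""
    cause = "protection violation" if code & 0x1 else "page not present"
    access = "write" if code & 0x2 else "read"
    if code & 0x10:
        access = "instruction fetch"
    mode = "user" if code & 0x4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def tally_faults(lines):
    """Count segfaults per (object, offset within object, decoded error)."""
    counts = Counter()
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            # The kernel prints the error code in hexadecimal.
            err = decode_error(int(m["err"], 16))
            counts[(m["obj"], m["off"], err)] += 1
    return counts


# Three lines copied from the dmesg excerpt above.
SAMPLE = [
    "[87094.840647] slack[3397]: segfault at 8000010 ip 00005b752bebded8 "
    "sp 00007ffeefaf5130 error 4 in slack[9472ed8,5b7524af3000+8996000] "
    "likely on CPU 8 (core 16, socket 0)",
    "[121643.733066] slack[95624]: segfault at 1951c48 ip 00005b752bebded8 "
    "sp 00007ffeefaf54a0 error 4 in slack[9472ed8,5b7524af3000+8996000] "
    "likely on CPU 8 (core 16, socket 0)",
    "[159721.615306] ThreadPoolForeg[184178]: segfault at 2d766c59 "
    "ip 00005b75265a1375 sp 0000745c851ebba0 error 6 in "
    "slack[3b56375,5b7524af3000+8996000] likely on CPU 21 (core 37, socket 0)",
]

if __name__ == "__main__":
    for (obj, off, err), n in tally_faults(SAMPLE).most_common():
        print(f"{n}x {obj}+0x{off}: {err}")
```

On these three sample lines, the two faults logged under the slack process name land on the same offset (9472ed8) as user-mode reads of a not-present page, while the ThreadPoolForeg fault is a write at a different offset. Feeding the full `dmesg` output through `tally_faults` would show whether that kind of repetition holds across all the crashes.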
Now, onto my latest system. Remember the old Ryzen 3900X gaming rig? I recently put it in a new case, added an Intel Arc B580 GPU and a new SSD, but kept the original motherboard, CPU, and RAM. After a fresh Manjaro install (encrypted btrfs), graphics performance was horrible: scrolling a browser pegged both kwin_x11 and the browser at high GPU usage, and glxgears dropped to ~24 FPS in fullscreen. Disabling compositing helped slightly at the cost of major screen tearing (still only ~40 FPS fullscreen). I figured Intel Arc might just be immature on Linux, so I ordered an RTX 3060 and called it a day. The next morning, though, the system, which still has the Arc GPU installed, had completely frozen. I booted into a Manjaro live ISO to check logs on the SSD: nothing was logged in the 60 seconds before it locked up. No hints in the journal or Xorg logs.
I’ve also tried running stress-ng and memtester on all these systems from a grml live environment (2–4 days each), with zero crashes. Multiple passes of memtest86 and memtest86+ show no errors. At this point, I have no idea if I’m setting up Manjaro incorrectly or if the distro is just unstable and not playing nice with any of my hardware. I’m hoping someone here can point me in the right direction: is there a known Manjaro stability issue with certain configurations, or is there something else I should be looking into? Any advice would be greatly appreciated!