For the past several weeks, using Blender on my Manjaro system has invariably led to the window manager/compositor apparently crashing and freezing.
The problem has two stages of symptoms.
Initially, everything seems to work fine, except that the entire display, apart from the mouse cursor, locks up every couple of minutes, for maybe half a minute at a time, before recovering.
Inevitably, however, the display (again, everything except the mouse cursor) freezes for noticeably longer than those initial recoverable pauses, with more fatal results. Interestingly, during these freezes the mouse cursor often takes on a different state: it usually switches to the four-arrow "drag" icon for a significant portion of the freeze.
These fatal freezes sort of recover, but not really. After several minutes, the desktop and all windows and applications still remain non-interactive (even the keyboard volume controls stop working), but the on-screen buffer seems to flash between the two most recent successfully rendered frames. That is, the mouse still moves, but I can't interact with anything using it, and when I try, the window on top (usually Blender, but sometimes also a web browser showing a dropdown or video) flashes between two recently displayed frames that were successfully rendered before the crash.
Switching to a different TTY with Ctrl+Alt+F3 almost always still works in this state, and once there, the console works as expected.
top does not show any noteworthy RAM or CPU consumption. Processes can be killed from there (including Blender), but killing them has no effect on the visible state of the original desktop TTY. When I switch back, the windows belonging to the applications I killed are still visible, and they still flash between the same two frames when I try to interact with the desktop.
I can also usually suspend the computer using a hardware control while it is in this more fatally frozen state. When I wake it back up (and presumably force GDM/X/Mutter/GPU drivers to be reloaded), the desktop's interactivity is usually restored on a superficial level. I can unlock the screen, any applications I killed beforehand will no longer be displayed, and I can drag and resize windows, as well as launch and use simple GNOME/GTK applications.
However, even after this partial recovery, there are usually still severe graphical glitches that require a harder reset to fix. The unlock screen sometimes fails to render half of the text characters in its dialog, scrolling in gnome-terminal sometimes leaves pixel-high rows of dashed artifacts spaced across the screen, more complex applications like my web browser keep flashing between two frames with no useful interactivity, and even in the best case, rendering performance (framerate) in Blender is probably at least halved from what it usually is.
Restarting GNOME by entering "r" in the Alt+F2 dialog doesn't help, nor does using different versions of Blender (including official binaries with factory settings, in versions that I had previously used for months without issue), nor does enabling GPU debug flags when launching Blender.
Logging out and logging back in does fix it, however.
I originally suspected that I was running into VRAM limitations, since this started happening when I was using high-resolution textures and seemed to improve immediately after I removed them. However, it has also happened when interacting with just the default scene, with a single cube and no textures or complex shading, and within a couple of seconds of opening it, no less.
I have not noticed any anomalous system temperatures or loads associated with this. (My CPU temperature usually stays below or just over 60°C, while clock speeds and load are consistent with whatever I was doing.)
The fatal crash also seems to be inevitable even after I've closed Blender. As long as I've launched and used it at some point during the session, it's only a matter of time until everything freezes and breaks, even though everything seems to work perfectly until then.
I have not noticed any unusual behaviour from any other OpenGL applications, nor any erroneous behaviour that implies that an individual application or the underlying hardware is at fault. If it were a hardware fault, I'd expect similar crashes (or at least some kind of glitch) to occur with other OpenGL applications, and I wouldn't expect the problem to silently hide for a seemingly random length of time before manifesting in the same way every time. If it were an application bug, it shouldn't happen with a version of Blender that I've previously used without problems, and it shouldn't affect the entire desktop and all complex application windows in exactly the same way.
Application windows fail to update and render, but the desktop-wide uniformity of the failure suggests to me that it's caused by the desktop environment failing to propagate input events and screen changes, rather than by any application bug. Additionally, the only specific anomalous effects beyond applications simply freezing and the frame buffer flickering are the GNOME cursor changes during the fatal freezes and the missing characters on the GDM lock screen after waking from suspend.
The issue seems high-level enough that it doesn't affect TTY switching and can be repaired just by logging out and back in, yet low-level enough that it affects the entire desktop and all complex applications uniformly. It also seems to sit in a component complex enough that the entire system can work perfectly for a long time between the initial trigger and the final freeze. This points me towards a problem with Manjaro's packaging of the desktop environment, compositor, window manager, or GPU drivers, or something along those lines.
Any idea what's happening and how to fix it?
My system details are below.
System: Host: Computer Kernel: 4.19.126-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.1.0 Desktop: Gnome 3.36.3 Distro: Manjaro Linux
Machine: Type: Laptop System: Acer product: Aspire A515-51 v: V1.20 serial: <root required> Mobo: KBL model: Charmander_KL v: V1.20 serial: <root required> UEFI: Insyde v: 1.20 date: 06/04/2018
Battery: ID-1: BAT1 charge: 48.0 Wh condition: 48.0/48.9 Wh (98%) model: LG 004B384234314341 status: Full
CPU: Topology: Quad Core model: Intel Core i7-8550U bits: 64 type: MT MCP arch: Kaby Lake rev: A L2 cache: 8192 KiB flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 31872 Speed: 894 MHz min/max: 400/4000 MHz Core speeds (MHz): 1: 2006 2: 2123 3: 2189 4: 1887 5: 1950 6: 1674 7: 1387 8: 1799
Graphics: Device-1: Intel UHD Graphics 620 vendor: Acer Incorporated ALI driver: i915 v: kernel bus ID: 00:02.0 Display: x11 server: X.org 1.20.8 driver: i915 resolution: <xdpyinfo missing> OpenGL: renderer: Mesa Intel UHD Graphics 620 (KBL GT2) v: 4.6 Mesa 20.0.7 direct render: Yes
Audio: Device-1: Intel Sunrise Point-LP HD Audio vendor: Acer Incorporated ALI driver: snd_hda_intel v: kernel bus ID: 00:1f.3 Sound Server: ALSA v: k4.19.126-1-MANJARO
Network: Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Acer Incorporated ALI driver: r8168 v: 8.048.03-NAPI port: 3000 bus ID: 01:00.1 IF: enp1s0f1 state: up speed: 100 Mbps duplex Device-2: Qualcomm Atheros QCA9377 802.11ac Wireless Network Adapter vendor: Lite-On driver: ath10k_pci v: kernel port: 3000 bus ID: 02:00.0 IF: wlp2s0 state: up
Drives: Local Storage: total: 1.14 TiB used: 598.18 GiB (51.4%) ID-1: /dev/nvme0n1 vendor: Western Digital model: WDS250G3X0C-00SJG0 size: 232.89 GiB ID-2: /dev/sda vendor: Western Digital model: WDS100T2B0A size: 931.51 GiB
Partition: ID-1: / size: 770.61 GiB used: 598.04 GiB (77.6%) fs: ext4 dev: /dev/dm-0 ID-2: /tmp size: 29.33 GiB used: 89.7 MiB (0.3%) fs: ext4 dev: /dev/dm-2 ID-3: swap-1 size: 203.08 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-1
Sensors: System Temperatures: cpu: 46.0 C mobo: N/A Fan Speeds (RPM): N/A
Info: Processes: 287 Uptime: 58m Memory: 11.60 GiB used: 2.61 GiB (22.5%) Init: systemd Compilers: gcc: 10.1.0 clang: 10.0.0 Shell: bash v: 5.0.17 inxi: 3.0.37