Frequent crashing / freezing; tried many kernels and video drivers; any help appreciated

Hello,
I’ve been having trouble with my Manjaro installation randomly freezing and crashing, and was hoping someone might be able to help me determine what is going on.

First, here’s the output of

$ inxi -Fxzc0
System:    Kernel: 5.7.19-2-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
           Desktop: KDE Plasma 5.19.5 Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: ASRock model: B450M Pro4 serial: <filter> UEFI: American Megatrends 
           v: P1.50 date: 10/17/2018 
CPU:       Topology: Dual Core model: AMD Athlon 200GE with Radeon Vega Graphics bits: 64 
           type: MT MCP arch: Zen L2 cache: 1024 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 25561 
           Speed: 1549 MHz min/max: 1600/3200 MHz Core speeds (MHz): 1: 1967 2: 1967 3: 1376 
           4: 1377 
Graphics:  Device-1: NVIDIA GK104 [GeForce GTX 670] vendor: eVga.com. driver: nvidia v: 440.100 
           bus ID: 10:00.0 
           Display: x11 server: X.Org 1.20.9 driver: nvidia resolution: 1920x1080~60Hz 
           OpenGL: renderer: GeForce GTX 670/PCIe/SSE2 v: 4.6.0 NVIDIA 440.100 direct render: Yes 
Audio:     Device-1: NVIDIA GK104 HDMI Audio vendor: eVga.com. driver: snd_hda_intel v: kernel 
           bus ID: 10:00.1 
           Device-2: AMD Family 17h HD Audio vendor: ASRock driver: snd_hda_intel v: kernel 
           bus ID: 38:00.6 
           Device-3: Creative SB X-Fi Surround 5.1 type: USB driver: snd-usb-audio bus ID: 1-6:3 
           Sound Server: ALSA v: k5.7.19-2-MANJARO 
Network:   Device-1: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: ASRock 
           driver: r8168 v: 8.048.03-NAPI port: e000 bus ID: 1f:00.0 
           IF: enp31s0 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 465.76 GiB used: 239.87 GiB (51.5%) 
           ID-1: /dev/nvme0n1 vendor: Mushkin model: MKNSSDPL500GB-D8 size: 465.76 GiB 
Partition: ID-1: / size: 440.21 GiB used: 239.87 GiB (54.5%) fs: ext4 dev: /dev/nvme0n1p2 
Swap:      ID-1: swap-1 type: partition size: 17.22 GiB used: 0 KiB (0.0%) dev: /dev/nvme0n1p3 
Sensors:   System Temperatures: cpu: 30.2 C mobo: 31.0 C gpu: nvidia temp: 37 C 
           Fan Speeds (RPM): fan-1: 0 fan-2: 2760 fan-3: 0 fan-4: 0 fan-5: 716 gpu: nvidia 
           fan: 30% 
           Voltages: 12v: N/A 5v: N/A 3.3v: 3.36 vbat: 3.28 
Info:      Processes: 211 Uptime: 1m Memory: 15.63 GiB used: 1.10 GiB (7.1%) Init: systemd 
           Compilers: gcc: 10.2.0 clang: 10.0.1 Packages: 1457 Client: KDE Plasma v: 5.19.5 
           inxi: 3.1.05 

Brief description of problem:
I’m experiencing frequent crashing of web browsers (Firefox and Chromium). When a browser crashes, it destabilizes the whole system - i.e., other programs subsequently crash or refuse to open unless I reboot the system first. Once a crash occurs, it seems Plasma is non-functional and unable to recover until after the reboot. I’ve tried restarting Plasma with

$ kquitapp5 plasmashell && kstart5 plasmashell

but this is ineffective and usually leads to a freeze when I try to reboot the system later.

Error Log Output:
This is my first time really diving into any error logs. I’m not sure what’s relevant.
It seems multiple types of errors precipitate the crashes. Here are a few examples:

  • kernel NULL pointer dereference

  • PCIe Bus Timeout error: PCI bridge: [AMD] Raven/Raven2 PCIe GPP Bridge [6:0]

  • BUG: Bad rss-counter state

  • BUG: Bad page cache in process Xorg

  • And finally, sometimes processes just generate coredumps with no obvious preceding errors.

I can post specific examples of error logs if that will be helpful.
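For anyone who wants to pull similar messages out of their own logs, here is a minimal sketch. It assumes the kernel messages are readable through `journalctl -k` (or `dmesg`). The helper name `scan_kernel_errors` is made up for illustration, and the patterns are only the message fragments quoted above; the PCIe timeout is left out because its exact log wording is not shown here.

```shell
# Filter kernel log text on stdin for the error signatures listed above.
# scan_kernel_errors is a hypothetical helper name, not an existing tool.
scan_kernel_errors() {
    # "|| true" keeps the exit status clean when nothing matches.
    grep -E 'NULL pointer dereference|Bad rss-counter state|Bad page cache' || true
}

# Example usage (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | scan_kernel_errors
```

Reading from stdin keeps the filter usable on a saved log file as well as on live journal output.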

What I have already attempted:

  • I’ve tried kernels 5.4, 5.7, and 5.8; they all present the issue.
  • I’ve tried Nvidia proprietary driver series 430xx and 440xx; the issue persists.
  • I’ve run Prime95 successfully for over four hours to rule out any obvious hardware instabilities.

I sincerely thank you in advance for your time and any help or insight you can give.

I have the same problem. Sometimes it crashes with a green screen, and my mouse and keyboard stop responding. I think it may be related to the drivers or the Manjaro version. I hope someone can help us :smiley:

Hello,

That kernel is EOL.

Check the schedulers part and try that

With KDE Plasma and an Nvidia GPU, I sometimes suggest this approach:

I’ve only been using the old 5.7 series kernel since it has been more stable for me. Both 5.4 LTS and the current 5.8 series are more unstable for me. Is it dangerous to use the old EOL kernel until I can figure this issue out?

I followed the instructions and created the udev rules to change the scheduler from “none” to “mq-deadline” as specified. I verified it was working with this command:

cat /sys/block/nvme0n1/queue/scheduler
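In that file the kernel lists the available schedulers and marks the active one with square brackets. A small sketch for extracting just the active name follows; `active_scheduler` is a hypothetical helper name, and the sample string in the comment only illustrates the format.

```shell
# Print the active I/O scheduler from text like "none [mq-deadline] kyber".
# active_scheduler is a hypothetical helper name.
active_scheduler() {
    # Keep only what sits between the square brackets.
    sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Example usage on the drive from this thread:
#   active_scheduler < /sys/block/nvme0n1/queue/scheduler
```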

And finally, I set these options:

Option   "TripleBuffer"  "On"
Option   "ConnectToAcpid"   "Off"
Option   "metamodes" "nvidia-auto-select +0+0 {ForceFullCompositionPipeline = On}"

as specified in the post and created the kwin script to use them. My original issues were more with spontaneous crashing than with screen tearing or lagging. Setting these options has actually made screen tearing worse for me. But I’ll try running with these options for a while to see if they prevent the crashes.

Thank you for all your help and suggestions; I really appreciate it.

Sometimes the variant where all 3 lines are added, not just the first one, works better, and in nvidia.conf it is better to use only Force instead of ForceFull … Also, I prefer fixed metamodes over nvidia-auto-select.

I went back and added all three lines as specified and switched to Force instead of ForceFull. I also tried putting the fixed metamodes in the Screen section as opposed to the Device section. This did help some. Minor screen tearing was still present, but crashes were less frequent and less severe. However, in the end, the crashing and Plasma freezing issues remained.

I removed the Nvidia 440xx drivers and tried to install the open-source video-linux / Nouveau drivers with mhwd by booting into the runlevel 3 tty per the instructions here: Configure Graphics Cards - Manjaro

But after removing the Nvidia drivers and installing the video-linux drivers, the machine would only boot to a black screen. So I removed the open-source drivers and installed the old Nvidia 390xx drivers instead to see whether crashes are less frequent with this older series.

Any idea what might cause a black screen when booting the video-linux drivers? Is there something else that must be done besides the standard add/remove mhwd commands listed in the wiki to switch from Nvidia to Nouveau?

Thank you again for all your help!

If you are using video-linux, you have to check that /etc/modprobe.d/mhwd-gpu.conf does not contain blacklist nouveau, and that /etc/modules-load.d/mhwd-gpu.conf is not trying to load nvidia and nvidia-drm.
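A rough sketch of that check, assuming both files use the usual one-entry-per-line format with `#` for comments. The helper name `check_nouveau_conf` is made up for illustration; it only reports what it finds and changes nothing.

```shell
# Warn about leftover NVIDIA settings that can block Nouveau.
# check_nouveau_conf is a hypothetical helper; pass the two file paths.
check_nouveau_conf() {
    modprobe_conf=$1
    modules_conf=$2
    # An uncommented "blacklist nouveau" line keeps the nouveau module from loading.
    if grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau' "$modprobe_conf" 2>/dev/null; then
        echo "nouveau is blacklisted in $modprobe_conf"
    fi
    # Uncommented nvidia entries (nvidia, nvidia-drm, ...) would still be loaded at boot.
    if grep -Eq '^[[:space:]]*nvidia' "$modules_conf" 2>/dev/null; then
        echo "nvidia modules are listed in $modules_conf"
    fi
}

# Example usage:
#   check_nouveau_conf /etc/modprobe.d/mhwd-gpu.conf /etc/modules-load.d/mhwd-gpu.conf
```

No output means neither problem was detected; any reported line would need to be commented out or removed by hand.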

Anyway, I used to have a GeForce GTX 650 that is now in another PC, and the best results have always been with the Nvidia drivers.

Ah! Yep, that was exactly it. Thanks! I’ll comment those out if I try Nouveau again.

My only experiences with Nouveau are on ancient cards like a GeForce 9600 GT and a GTX 280. The 9600 has awful graphical problems, but I think that’s because the card itself may be dying.

What got me interested in Nouveau was having to boot Manjaro from a USB drive and run a live session for a while when I was otherwise getting crashes every two minutes or so. I did not get a single crash while using the live session with Nouveau, though.

I just really don’t get what’s happened all of a sudden. Everything used to run just fine and stable as a rock but now these issues occur in multiple kernels / driver series.

I’ll keep playing with things. I’m hoping to be able to upgrade my GTX 670 soon anyway, so as long as I can get the system mostly stable for now, I’ll be happy.

Thank you so much for all your time and willingness to help out. I’ve been learning a lot!

After more tinkering, I have found a solution that seems to work at the moment. I switched to the much older Nvidia 390xx series drivers, and have been using them in conjunction with the 5.4 LTS kernel. So far, the crashing has completely disappeared.

Anyway, just wanted to update the thread in case anyone else encounters a similar situation - perhaps this might work for others experiencing issues on a GTX 600 series card as well.
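For reference, the switch can be sketched roughly as below. This is only an outline under assumptions: the profile names `video-nvidia-440xx` and `video-nvidia-390xx` and the kernel package `linux54` are what I would expect here, but check `mhwd -l` and `mhwd-kernel -l` for what is actually offered on your system. The function name `switch_to_390xx` is made up, and with `DRY_RUN=1` it only prints the commands instead of running them.

```shell
# Sketch of the driver/kernel switch described above.
# Profile and kernel names are assumptions; verify with `mhwd -l` and `mhwd-kernel -l`.
switch_to_390xx() {
    # When DRY_RUN is set, prefix every command with echo so nothing is executed.
    run=${DRY_RUN:+echo}
    $run sudo mhwd -r pci video-nvidia-440xx   # remove the current driver profile
    $run sudo mhwd -i pci video-nvidia-390xx   # install the older driver series
    $run sudo mhwd-kernel -i linux54           # install the 5.4 LTS kernel if it is missing
}

# Example usage (print the commands first, then run for real from a tty):
#   DRY_RUN=1; switch_to_390xx
```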

Thanks again @bogdancovaciu for all the assistance!