Issues with video card losing signal with display

OK, I have been having this issue for a while now with my Nvidia GTX 970 losing the signal to the display. I don’t know if this is a driver issue with the closed-source drivers or if the video card itself is the cause.

This always happens with games, although not all of them. At first it only happened with Windows games running through Proton; now it also happens with some native Linux ones.

I’m thinking of either switching to the open-source Nouveau drivers, switching to the iGPU, or simply buying this year’s AMD Radeon dGPU.

Has anyone else been having issues with the Nvidia proprietary drivers? If so, how did you deal with them?

Actually there are several more modes of failure.

The display could be starting to fail in some way. Generally that would be a failure of the display’s power supply. To test this, turn down the brightness of the screen, which reduces the load on its power supply.

Is the graphics card overheating at all? If the temperature gets too high, some cards that are already close to their stability limit in terms of voltage can fail and cause problems.
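One way to check for overheating is to log temperature and clocks while a game runs, so the last readings before the signal drops are on disk. A minimal sketch, assuming the nvidia-smi tool that ships with the proprietary driver is installed; log_gpu and the default file name gpu-log.csv are made up for illustration:

```shell
# log_gpu [FILE]: append-free logging of GPU temperature and clocks once per
# second until interrupted with Ctrl+C. FILE defaults to gpu-log.csv (arbitrary).
log_gpu() {
    if ! command -v nvidia-smi >/dev/null 2>&1; then
        echo "nvidia-smi not found; is the proprietary driver installed?" >&2
        return 1
    fi
    # -l 1 repeats the query every second; tee shows it and saves it.
    nvidia-smi --query-gpu=timestamp,temperature.gpu,clocks.gr,clocks.mem \
               --format=csv -l 1 | tee "${1:-gpu-log.csv}"
}
```

Start log_gpu in a terminal, launch the game, and look at the last lines of the file after the next signal loss.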

To narrow down the most likely causes, we would first need to know exactly how the failure behaves in your case. For example, does the picture just freeze, or does the display turn off and on again?

No, the card loses the signal and the display goes into power-saving mode. I have to press the display’s power button twice and turn the computer off and on again along with the display.

Everything is perfectly fine with videos and normal desktop use. This only happens with games and not all of them.

So you have to turn off the display with the complete system to make it functional again? That is a very odd mode of failure. I have actually never seen something like this.
Could you install GreenWithEnvy (GWE) and underclock your card’s core and memory as much as possible to test whether the card is failing?

So how do I use GreenWithEnvy, and what is Coolbits?

Coolbits is a value that tells the Nvidia driver how much control you have over your card, for example whether you can control the fan speed or over/underclock the card.
https://wiki.archlinux.org/index.php/NVIDIA/Tips_and_tricks#Enabling_overclocking
For what we want to do here, you will want to enable Coolbits value 8 by running nvidia-xconfig --cool-bits=8 as root and then restarting X.
Once that is done, you can create a profile in the lower left of GWE and decrease both the GPU and memory offsets as much as possible.
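Before opening GWE it can help to confirm that the option really ended up in the X configuration. A minimal sketch, assuming nvidia-xconfig wrote to its default file /etc/X11/xorg.conf (on Manjaro the active file may be a different one managed by MHWD); has_coolbits is a hypothetical helper:

```shell
# has_coolbits FILE: succeed if FILE contains an uncommented Coolbits option,
# e.g. a line like:    Option         "Coolbits" "8"
has_coolbits() {
    grep -Eiq '^[[:space:]]*Option[[:space:]]+"Coolbits"' "$1"
}
```

For example, `has_coolbits /etc/X11/xorg.conf && echo "Coolbits is set"` prints the message only when the option is present.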

Thanks I’ll try that out.

I don’t see where/how I can lower the clocks of both GPU and Memory.

Lower left corner of the program.
-----------------------------------------Overclock profile------------------
button to select and make new profiles|| ||edit current profile||

Thanks. I set the GPU clock to 600 and the Memory to 1200. I had to reboot the System, and when I start GWE, the screen becomes garbled.

Did I set the clocks too low?

You actually set the offsets way too high and crashed your system, congratulations. As I said earlier, and as the program itself says, these are offsets, not actual clock values. The offsets also go negative, and that is the direction you should push them, as far as you are able, to test whether the graphics card is the problem. After that, do whatever would normally trigger the behaviour you described before.
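To make the difference between an offset and a clock value concrete: the offset is added to the card’s stock clock. A small sketch, assuming the GTX 970 reference base clock of 1050 MHz purely for illustration (factory-overclocked cards and boost clocks differ); effective_clock is a hypothetical helper:

```shell
# effective_clock BASE OFFSET: print the resulting clock in MHz.
effective_clock() {
    echo $(( $1 + $2 ))
}

effective_clock 1050 600     # prints 1650: a heavy overclock
effective_clock 1050 -200    # prints 850: an underclock
```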

Oops. How do I fix this then? Delete the config file?

That would be easiest, yes.
The config database would be in ~/.config/gwe/
Don’t worry too much about this though, as it is just software overclocking. You would have much bigger issues if you had modified the firmware in that manner. :wink:
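If you would rather not delete anything outright, moving the directory aside has the same effect and keeps a backup. A minimal sketch, assuming GWE recreates its configuration on the next start; reset_gwe_config is a made-up helper name:

```shell
# reset_gwe_config [DIR]: rename the GWE config directory to a timestamped
# backup. DIR defaults to ~/.config/gwe.
reset_gwe_config() {
    dir="${1:-$HOME/.config/gwe}"
    if [ -d "$dir" ]; then
        mv "$dir" "$dir.bak.$(date +%Y%m%d%H%M%S)"
    else
        echo "no config directory at $dir" >&2
        return 1
    fi
}
```

Run reset_gwe_config with GWE closed, then start GWE again and create a fresh profile.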

OK I set GPU offset to -700 MHz and the Memory offset to -500 MHz.

I’ll play a game and see if this works.

Nope. Didn’t work. Did I set the clocks too low? Or maybe I should check whether the video card still works.

Or switch to the open source drivers and see if it is a driver issue?

You generally cannot set the clocks too low. At least now we should be able to say your graphics card is fine.
The main thing that concerns me is that you said the problem developed over time, since that generally does not point to a software problem.
Maybe the logs will give some clue as to what exactly is happening. Could you post the output of journalctl -x -p3 -b1 ?

Here you go:

[CODE]
-- Logs begin at Sat 2020-04-11 12:31:05 CDT, end at Sun 2020-08-23 07:05:34 CDT. --
Apr 11 12:31:05 william-pc kernel: [Firmware Bug]: TSC_DEADLINE disabled due to Errata; please update microcode to version: 0x22 (or later)
Apr 11 12:31:05 william-pc systemd[1]: Failed to start Load Kernel Modules.
Apr 11 12:31:05 william-pc systemd[1]: Failed to start CLI Netfilter Manager.
Apr 11 12:31:05 william-pc systemd-modules-load[257]: Failed to lookup module alias 'crypto_user': Function not implemented
Apr 11 12:31:05 william-pc systemd-modules-load[257]: Failed to lookup module alias 'nvidia': Function not implemented
Apr 11 12:31:05 william-pc systemd-modules-load[257]: Failed to lookup module alias 'nvidia-drm': Function not implemented
Apr 11 12:31:05 william-pc systemd-modules-load[257]: Failed to lookup module alias 'uinput': Function not implemented
Apr 11 12:31:09 william-pc systemd[1]: Failed to start Light Display Manager.
-- Subject: A start job for unit lightdm.service has failed
-- Defined-By: systemd
-- Support: https://forum.manjaro.org/c/technical-issues-and-assistance

-- A start job for unit lightdm.service has finished with a failure.

-- The job identifier is 843 and the job result is failed.
[/CODE]
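The entries above are all from Apr 11, the very start of the journal, although the journal runs until Aug 23. That matches how journalctl treats a positive boot offset: -b1 selects the oldest recorded boot, whereas -b -1 selects the boot before the current one. A minimal sketch for picking the boot in which the signal was actually lost; the function names are made up for illustration:

```shell
# list_boots: show every boot the journal knows about, with offset and time range.
list_boots() {
    journalctl --list-boots
}

# boot_errors [OFFSET]: messages of priority 3 (err) and worse from one boot.
# OFFSET defaults to -1, the boot before the current one; 0 is the current boot.
boot_errors() {
    journalctl -x -p 3 --boot="${1:--1}"
}
```

After a signal loss and reboot, boot_errors with no argument should show the errors from the session that failed.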

I wonder if this is a distro issue, but I would hate to change to another one after using Manjaro for five years.

OK, I removed the Nvidia drivers manually, and now I need to remember how to use chroot and MHWD from the CLI.

OK, I fixed my issue with X not starting by replacing video-linux with nouveau in the driver section of the xorg.conf file.