System hangs, I suspect an Nvidia error

From time to time my system becomes unresponsive, especially when I use Google Meet:
[sudo] password for vfbsilva:
-- Logs begin at Mon 2020-06-08 07:39:48 -03, end at Tue 2020-10-06 17:54:01 -03. --
out 06 17:51:16 rohan kernel: nvidia-gpu 0000:01:00.3: i2c timeout error e0000000
out 06 17:51:16 rohan kernel: ucsi_ccg 0-0008: i2c_transfer failed -110
out 06 17:51:16 rohan kernel: ucsi_ccg 0-0008: ucsi_ccg_init failed - -110

inxi -Fza      
System:
  Kernel: 5.7.19-2-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.0 
  parameters: BOOT_IMAGE=/boot/vmlinuz-5.7-x86_64 
  root=UUID=b841b317-5f27-4086-9390-595dff39a5c8 rw quiet apparmor=1 
  security=apparmor udev.log_priority=3 
  Desktop: KDE Plasma 5.19.5 tk: Qt 5.15.1 wm: kwin_x11 dm: SDDM 
  Distro: Manjaro Linux 
Machine:
  Type: Desktop Mobo: Micro-Star model: Z390-A PRO (MS-7B98) v: 1.0 serial: <filter> 
  UEFI: American Megatrends v: 1.80 date: 12/25/2019 
CPU:
  Topology: 6-Core model: Intel Core i5-9600K bits: 64 type: MCP arch: Kaby Lake 
  family: 6 model-id: 9E (158) stepping: C (12) microcode: D6 L2 cache: 9216 KiB 
  flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 44412 
  Speed: 800 MHz min/max: 800/4700 MHz Core speeds (MHz): 1: 800 2: 800 3: 800 
  4: 800 5: 800 6: 800 
  Vulnerabilities: Type: itlb_multihit status: KVM: VMX disabled 
  Type: l1tf status: Not affected 
  Type: mds mitigation: Clear CPU buffers; SMT disabled 
  Type: meltdown status: Not affected 
  Type: spec_store_bypass 
  mitigation: Speculative Store Bypass disabled via prctl and seccomp 
  Type: spectre_v1 
  mitigation: usercopy/swapgs barriers and __user pointer sanitization 
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, 
  STIBP: disabled, RSB filling 
  Type: srbds mitigation: Microcode 
  Type: tsx_async_abort mitigation: Clear CPU buffers; SMT disabled 
Graphics:
  Device-1: NVIDIA TU106 [GeForce RTX 2070] driver: nvidia v: 450.66 
  alternate: nouveau,nvidia_drm bus ID: 01:00.0 chip ID: 10de:1f02 
  Display: x11 server: X.Org 1.20.9 compositor: kwin_x11 driver: nvidia 
  display ID: :0 screens: 1 
  Screen-1: 0 s-res: 1920x1080 s-dpi: 81 s-size: 602x343mm (23.7x13.5") 
  s-diag: 693mm (27.3") 
  Monitor-1: DP-0 res: 1920x1080 hz: 60 dpi: 82 size: 598x336mm (23.5x13.2") 
  diag: 686mm (27") 
  OpenGL: renderer: GeForce RTX 2070/PCIe/SSE2 v: 4.6.0 NVIDIA 450.66 
  direct render: Yes 
Audio:
  Device-1: Intel Cannon Lake PCH cAVS vendor: Micro-Star MSI driver: snd_hda_intel 
  v: kernel alternate: snd_soc_skl,snd_sof_pci bus ID: 00:1f.3 chip ID: 8086:a348 
  Device-2: NVIDIA TU106 High Definition Audio driver: snd_hda_intel v: kernel 
  bus ID: 01:00.1 chip ID: 10de:10f9 
  Device-3: Logitech Logitech Webcam C925e type: USB driver: snd-usb-audio,uvcvideo 
  bus ID: 1-2:7 chip ID: 046d:085b serial: <filter> 
  Sound Server: ALSA v: k5.7.19-2-MANJARO 
Network:
  Device-1: Intel Ethernet I219-V vendor: Micro-Star MSI driver: e1000e v: 3.2.6-k 
  port: efa0 bus ID: 00:1f.6 chip ID: 8086:15bc 
  IF: eno1 state: up speed: 100 Mbps duplex: full mac: <filter> 
  IF-ID-1: br-232fd21ea23b state: down mac: <filter> 
  IF-ID-2: br-5d01cbb2628b state: down mac: <filter> 
  IF-ID-3: docker0 state: down mac: <filter> 
Drives:
  Local Storage: total: 912.89 GiB used: 105.32 GiB (11.5%) 
  SMART Message: Unable to run smartctl. Root privileges required. 
  ID-1: /dev/sda vendor: Samsung model: SSD 840 EVO 500GB size: 465.76 GiB 
  block size: physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter> 
  rev: DB6Q scheme: GPT 
  ID-2: /dev/sdb vendor: Kingston model: SA400S37480G size: 447.13 GiB block size: 
  physical: 512 B logical: 512 B speed: 6.0 Gb/s serial: <filter> rev: 0102 
  scheme: GPT 
Partition:
  ID-1: / raw size: 154.16 GiB size: 150.74 GiB (97.78%) used: 105.28 GiB (69.8%) 
  fs: ext4 dev: /dev/sdb1 
Swap:
  Kernel: swappiness: 60 (default) cache pressure: 100 (default) 
  ID-1: swap-1 type: file size: 4.00 GiB used: 0 KiB (0.0%) priority: -2 
  file: /swapfile 
Sensors:
  System Temperatures: cpu: 34.0 C mobo: N/A gpu: nvidia temp: 31 C 
  Fan Speeds (RPM): N/A gpu: nvidia fan: 45% 
Info:
  Processes: 239 Uptime: 14m Memory: 15.58 GiB used: 4.12 GiB (26.4%) Init: systemd 
  v: 246 Compilers: gcc: 10.2.0 alt: 8/9 Packages: 1533 pacman: 1531 lib: 410 
  flatpak: 0 snap: 2 Shell: Bash v: 5.0.18 running in: konsole inxi: 3.1.05

I wonder if this is somehow related: 206653 – i2c_nvidia_gpu takes too long time and makes system suspend & resume failed with NVIDIA GTX 1660 card

According to this
https://bugzilla.kernel.org/show_bug.cgi?id=206653
it is purely a cosmetic issue. The messages should not affect performance, but they still look ugly in the log.
My suggestion would be to try a different kernel.

Kernel 5.7 is EOL.
Also, have you tried the nvidia 455xx driver?

It is not listed for me. Do I need to enable the testing branch?


Ah, sorry … I'm on two unstable installs and I missed that all this still needs more testing and is not yet in the testing branch. You can try switching the kernel, though.

I had similar issues on my system.

What I ended up doing was blacklisting the module.
Blacklisting the nouveau driver wouldn't be a bad idea either, since you are using Nvidia's driver.

The ucsi_ccg module is for the USB Type-C port on the graphics card, so if you don't use that port, try blacklisting the module to check whether that solves the issue.

[user@box ~]$ cat /etc/modprobe.d/blacklist.conf  
blacklist ucsi_ccg
blacklist nouveau

Reboot and let us know.
Just my 2 cents, though :wink:
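
In case it helps, here is a rough sketch of how the blacklist file above could be generated and how to check after the reboot that the modules really stayed unloaded. The helper names `blacklist_lines` and `loaded_modules` and the file name `blacklist-gpu-usbc.conf` are made up for this example; the only assumptions are that modprobe reads `*.conf` files under /etc/modprobe.d/ and that /proc/modules lists loaded modules by name in the first field.

```shell
#!/bin/sh
# Hypothetical helper: print one modprobe.d "blacklist" line per module name.
blacklist_lines() {
    for mod in "$@"; do
        printf 'blacklist %s\n' "$mod"
    done
}

# Hypothetical helper: print those of the given modules that appear in a
# /proc/modules-style listing (module name is the first field on each line).
# Usage: loaded_modules FILE MODULE...
loaded_modules() {
    modules_file="$1"
    shift
    for mod in "$@"; do
        if awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }' "$modules_file"; then
            printf '%s\n' "$mod"
        fi
    done
}
```

Something like `blacklist_lines ucsi_ccg nouveau | sudo tee /etc/modprobe.d/blacklist-gpu-usbc.conf` would write the file, and after rebooting, `loaded_modules /proc/modules ucsi_ccg nouveau` should print nothing if the blacklist took effect.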


Quick shot in the dark: if you are facing an i2c issue in combination with Nvidia, you might try:
blacklist i2c_nvidia_gpu

This will disable the USB port on board the GPU card.
https://bugzilla.redhat.com/show_bug.cgi?id=1720876
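
To see whether blacklisting made a difference, one option is to count the messages from the first post in the kernel log of the current boot. This is only a sketch: `count_i2c_errors` is a made-up helper name, and the pattern is simply modelled on the three log lines quoted at the top of the thread.

```shell
#!/bin/sh
# Hypothetical helper: read log lines on stdin and print how many of them look
# like the nvidia-gpu i2c timeout or the ucsi_ccg failures quoted in this thread.
count_i2c_errors() {
    # grep -c prints the count; "|| true" keeps the exit status 0 when the count is 0
    grep -c -E 'nvidia-gpu .*i2c timeout error|ucsi_ccg .*failed' || true
}
```

After a reboot, `journalctl -k -b | count_i2c_errors` should then print 0 if the messages are gone (`-k` limits the output to kernel messages, `-b` to the current boot).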
