I am using the latest 6.5 kernel and the latest Nvidia drivers. Every day, around midnight (usually a few hours after I stop using the PC), my system freezes completely and becomes unresponsive. Not even Ctrl+Alt+F3 works. I have seen some suggestions to put the GPU in persistence mode, but that doesn’t work. When I look in journalctl, here is the error:
Nov 29 10:51:41 derp-linux kscreenlocker_greet[14710]: Qt: Session management error: networkIdsList argument is NULL
Nov 29 10:51:41 derp-linux kscreenlocker_greet[14710]: kscreenlocker_greet: Lockscreen QML outdated, falling back to default
Nov 29 10:51:42 derp-linux kscreenlocker_greet[14710]: kf.kirigami: Failed to find a Kirigami platform plugin
Nov 29 10:58:46 derp-linux kernel: NVRM: GPU at PCI:0000:02:00: GPU-c7f83bd9-0ef0-ca3d-c7da-fec78d33c876
Nov 29 10:58:46 derp-linux kernel: NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
Nov 29 10:58:46 derp-linux kernel: NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.
Nov 29 10:58:46 derp-linux kernel: NVRM: A GPU crash dump has been created. If possible, please run
NVRM: nvidia-bug-report.sh as root to collect this data before
NVRM: the NVIDIA kernel module is unloaded.
Nov 29 10:58:47 derp-linux plasmashell[996]: ERROR VulkanRender.cpp:395 VkResult is "VK_##str"
Nov 29 10:58:50 derp-linux plasmashell[996]: ERROR VulkanRender.cpp:395 VkResult is "VK_##str"
(Note that the <unknown> fields are due to this being an old crash. The log does not normally say that.)
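For anyone who wants to check how often this happens, here is a minimal sketch that pulls the Xid lines out of saved journal text. It only assumes the line layout shown in the log above; the names `XID_RE` and `find_xid_events` are made up for this example and are not part of any NVIDIA or systemd tool.

```python
import re

# Matches kernel lines shaped like the one in the log above, e.g.
#   ... kernel: NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.
# The pattern is an assumption based on that single sample line.
XID_RE = re.compile(
    r"NVRM: Xid \((?P<pci>PCI:[0-9a-fA-F:.]+)\): (?P<xid>\d+), (?P<detail>.*)"
)


def find_xid_events(lines):
    """Return one dict per Xid line: log prefix, PCI address, Xid number, detail text."""
    events = []
    for line in lines:
        match = XID_RE.search(line)
        if match is None:
            continue
        events.append(
            {
                # Everything before "NVRM:" (timestamp, host, "kernel:").
                "prefix": line[: match.start()].strip(),
                "pci": match.group("pci"),
                "xid": int(match.group("xid")),
                "detail": match.group("detail").strip(),
            }
        )
    return events


# Sample input copied from the log excerpt above.
sample_lines = [
    "Nov 29 10:51:42 derp-linux kscreenlocker_greet[14710]: kf.kirigami: Failed to find a Kirigami platform plugin",
    "Nov 29 10:58:46 derp-linux kernel: NVRM: GPU at PCI:0000:02:00: GPU-c7f83bd9-0ef0-ca3d-c7da-fec78d33c876",
    "Nov 29 10:58:46 derp-linux kernel: NVRM: Xid (PCI:0000:02:00): 79, pid='<unknown>', name=<unknown>, GPU has fallen off the bus.",
    "Nov 29 10:58:46 derp-linux kernel: NVRM: GPU 0000:02:00.0: GPU has fallen off the bus.",
]

for event in find_xid_events(sample_lines):
    print(event["prefix"], event["pci"], "Xid", event["xid"], "-", event["detail"])
```

The same function should work on the kernel messages of a previous boot if you save them to a file first (for example with `journalctl -k -b -1 --no-pager`) and pass the file's lines in. That only works if the journal is persistent.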
Here are my kernel parameters:
quiet splash udev.log_priority=3 pci=noaer pcie_aspm=off
This has been going on for months, and I really would like a solution.
My GPU also falls off the bus randomly for an unknown reason. I’ve spent hours researching. The NVIDIA documentation isn’t really helpful, unfortunately: XID Errors :: GPU Deployment and Management Documentation
Odd thing is, I can play AAA games for hours on end with no issue, yet it happens randomly when the system isn’t doing much of anything.
Exactly. It seems to happen only when the system isn’t under a lot of load. I’ve seen some people online say that this happens when a GPU is dying, but seeing as how this is happening to you too, I doubt that’s the case.
For reference, I have a System76 Gazelle 17 (gaze17-3060-b). System76 sent me a replacement last year not long after I bought it and that made no difference, so at least that rules out a manufacturing defect with the original laptop. Otherwise I’m quite happy with it and like the company.
inxi -Fazy
System:
Kernel: 6.5.13-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc available: acpi_pm
parameters: root=UUID=cf70171e-27dd-43ec-a0b7-52a1fa96be2a rw
add_efi_memmap initrd=boot\intel-ucode.img
initrd=boot\initramfs-6.5-x86_64.img ec_sys.write_support=1 splash quiet
udev.log_priority=3
Desktop: GNOME v: 45.1 tk: GTK v: 3.24.38 wm: gnome-shell dm: GDM v: 45.0.1
Distro: Manjaro Linux base: Arch Linux
Machine:
Type: Laptop System: System76 product: Gazelle v: gaze17-3060-b
serial: <superuser required> Chassis: type: 9 serial: <superuser required>
Mobo: System76 model: Gazelle v: gaze17-3060-b serial: <superuser required>
UEFI: coreboot v: 2023-09-08_42bf7a6 date: 09/08/2023
Battery:
ID-1: BAT0 charge: 47.6 Wh (91.2%) condition: 52.2/54.8 Wh (95.3%)
volts: 17.0 min: 15.4 model: Notebook BAT type: Li-ion serial: <filter>
status: not charging cycles: 11
CPU:
Info: model: 12th Gen Intel Core i7-12700H bits: 64 type: MST AMCP
arch: Alder Lake gen: core 12 level: v3 note: check built: 2021+
process: Intel 7 (10nm ESF) family: 6 model-id: 0x9A (154) stepping: 3
microcode: 0x430
Topology: cpus: 1x cores: 14 mt: 6 tpc: 2 st: 8 threads: 20 smt: enabled
cache: L1: 1.2 MiB desc: d-8x32 KiB, 6x48 KiB; i-6x32 KiB, 8x64 KiB
L2: 11.5 MiB desc: 6x1.2 MiB, 2x2 MiB L3: 24 MiB desc: 1x24 MiB
Speed (MHz): avg: 1378 high: 3497 min/max: 400/4600:4700:3500 scaling:
driver: intel_pstate governor: powersave cores: 1: 1018 2: 848 3: 400 4: 2331
5: 400 6: 400 7: 2011 8: 400 9: 3443 10: 400 11: 3296 12: 400 13: 400
14: 400 15: 400 16: 1260 17: 1690 18: 2609 19: 3497 20: 1965
bogomips: 107560
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed status: Not affected
Type: spec_rstack_overflow status: Not affected
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Enhanced / Automatic IBRS, IBPB: conditional,
RSB filling, PBRSB-eIBRS: SW sequence
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: Intel Alder Lake-P GT2 [Iris Xe Graphics] vendor: CLEVO/KAPOK
driver: i915 v: kernel arch: Gen-12.2 process: Intel 10nm built: 2021-22+
ports: active: DP-1 off: eDP-1 empty: DP-2,DP-3,DP-4 bus-ID: 00:02.0
chip-ID: 8086:46a6 class-ID: 0300
Device-2: NVIDIA GA106M [GeForce RTX 3060 Mobile / Max-Q]
vendor: CLEVO/KAPOK driver: nvidia v: 545.29.06 alternate: nouveau,nvidia_drm
non-free: 545.xx+ status: current (as of 2023-11; EOL~2026-12-xx)
arch: Ampere code: GAxxx process: TSMC n7 (7nm) built: 2020-2023 pcie:
gen: 4 speed: 16 GT/s lanes: 8 link-max: lanes: 16 ports: active: none
off: DP-5,HDMI-A-1 empty: eDP-2 bus-ID: 01:00.0 chip-ID: 10de:2520
class-ID: 0300
Device-3: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-1.1:4
chip-ID: 046d:0825 class-ID: 0102 serial: <filter>
Device-4: Chicony USB2.0 Camera driver: uvcvideo type: USB rev: 2.0
speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-8:5 chip-ID: 04f2:b729
class-ID: fe01 serial: <filter>
Display: x11 server: X.Org v: 21.1.9 with: Xwayland v: 23.2.2
compositor: gnome-shell driver: X: loaded: modesetting,nvidia
alternate: fbdev,nouveau,nv,vesa dri: iris gpu: i915,nvidia,nvidia-nvswitch
display-ID: :1 screens: 1
Screen-1: 0 s-res: 5760x1080 s-dpi: 96 s-size: 1524x286mm (60.00x11.26")
s-diag: 1551mm (61.05")
Monitor-1: DP-1 pos: primary,center model: HP X24ih serial: <filter>
built: 2021 res: 1920x1080 dpi: 82 gamma: 1.2 size: 598x336mm (23.54x13.23")
diag: 605mm (23.8") ratio: 16:9 modes: max: 1920x1080 min: 720x400
Monitor-2: DP-5 mapped: DP-1-1 note: disabled pos: right model: MSI G27C4
serial: <filter> built: 2020 res: 1920x1080 dpi: 93 gamma: 1.2
size: 527x297mm (20.75x11.69") diag: 686mm (27") ratio: 16:9 modes:
max: 1920x1080 min: 640x480
Monitor-3: HDMI-A-1 mapped: HDMI-0 note: disabled pos: left model: HP X24ih
serial: <filter> built: 2021 res: 1920x1080 dpi: 93 gamma: 1.2
size: 527x297mm (20.75x11.69") diag: 605mm (23.8") ratio: 16:9 modes:
max: 1920x1080 min: 640x480
Monitor-4: eDP-1 mapped: eDP-1-1 note: disabled model: AU Optronics 0xaf90
built: 2020 res: 1920x1080 dpi: 142 gamma: 1.2 size: 344x193mm (13.54x7.6")
diag: 394mm (15.5") ratio: 16:9 modes: 1920x1080
API: EGL v: 1.5 hw: drv: intel iris drv: nvidia platforms: device: 0
drv: nvidia device: 1 drv: iris device: 3 drv: swrast gbm: drv: iris
surfaceless: drv: nvidia x11: drv: nvidia inactive: wayland,device-2
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 545.29.06
glx-v: 1.4 direct-render: yes renderer: NVIDIA GeForce RTX 3060 Laptop
GPU/PCIe/SSE2 memory: 5.86 GiB
API: Vulkan v: 1.3.269 layers: 10 device: 0 type: discrete-gpu name: NVIDIA
GeForce RTX 3060 Laptop GPU driver: nvidia v: 545.29.06
device-ID: 10de:2520 surfaces: xcb,xlib device: 1 type: integrated-gpu
name: Intel Graphics (ADL GT2) driver: mesa intel v: 23.1.9-manjaro1.1
device-ID: 8086:46a6 surfaces: xcb,xlib
Audio:
Device-1: Intel Alder Lake PCH-P High Definition Audio vendor: CLEVO/KAPOK
driver: snd_hda_intel v: kernel alternate: snd_sof_pci_intel_tgl
bus-ID: 00:1f.3 chip-ID: 8086:51c8 class-ID: 0403
Device-2: NVIDIA GA106 High Definition Audio vendor: CLEVO/KAPOK
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 8
link-max: lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:228e class-ID: 0403
Device-3: Logitech Webcam C270 driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 mode: 2.0 bus-ID: 3-1.1:4
chip-ID: 046d:0825 class-ID: 0102 serial: <filter>
Device-4: C-Media CM106 Like Sound Device
driver: hid-generic,snd-usb-audio,usbhid type: USB rev: 1.1 speed: 12 Mb/s
lanes: 1 mode: 1.1 bus-ID: 3-1.2:6 chip-ID: 0d8c:0102 class-ID: 0300
API: ALSA v: k6.5.13-1-MANJARO status: kernel-api
tools: alsactl,alsamixer,amixer
Server-1: PipeWire v: 1.0.0 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
4: pw-jack type: plugin tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel Alder Lake-P PCH CNVi WiFi driver: iwlwifi v: kernel
bus-ID: 00:14.3 chip-ID: 8086:51f0 class-ID: 0280
IF: wlp0s20f3 state: down mac: <filter>
Device-2: Intel Ethernet I219-V driver: e1000e v: kernel port: N/A
bus-ID: 00:1f.6 chip-ID: 8086:1a1f class-ID: 0200
IF: eno0 state: up speed: 1000 Mbps duplex: full mac: <filter>
IF-ID-1: Eddie state: unknown speed: N/A duplex: N/A mac: N/A
Bluetooth:
Device-1: Intel AX201 Bluetooth driver: btusb v: 0.8 type: USB rev: 2.0
speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 3-10:7 chip-ID: 8087:0026
class-ID: e001
Report: btmgmt ID: hci0 rfk-id: 1 state: up address: <filter> bt-v: 5.2
lmp-v: 11 status: discoverable: no pairing: no class-ID: 7c010c
Drives:
Local Storage: total: 6.37 TiB used: 3.31 TiB (52.0%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Samsung model: SSD 980 PRO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s
lanes: 4 tech: SSD serial: <filter> fw-rev: 5B2QGXA7 temp: 34.9 C
scheme: GPT
ID-2: /dev/nvme1n1 maj-min: 259:4 vendor: Samsung
model: SSD 970 EVO Plus 1TB size: 931.51 GiB block-size: physical: 512 B
logical: 512 B speed: 31.6 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: 2B2QEXM7 temp: 34.9 C scheme: GPT
ID-3: /dev/sda maj-min: 8:0 vendor: Seagate model: Game Drive PS4
size: 3.64 TiB block-size: physical: 4096 B logical: 512 B type: USB rev: 3.0
spd: 5 Gb/s lanes: 1 mode: 3.2 gen-1x1 tech: N/A serial: <filter>
fw-rev: 0304 scheme: GPT
Partition:
ID-1: / raw-size: 500 GiB size: 491.08 GiB (98.22%) used: 128.05 GiB (26.1%)
fs: ext4 dev: /dev/nvme0n1p1 maj-min: 259:1
ID-2: /boot/efi raw-size: 513 MiB size: 512 MiB (99.80%)
used: 46.9 MiB (9.2%) fs: vfat dev: /dev/nvme0n1p3 maj-min: 259:3
ID-3: /home raw-size: 1.33 TiB size: 1.31 TiB (98.35%)
used: 1.06 TiB (80.9%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
Swap:
Kernel: swappiness: 10 (default 60) cache-pressure: 100 (default) zswap: yes
compressor: zstd max-pool: 20%
ID-1: swap-1 type: file size: 16 GiB used: 85.8 MiB (0.5%) priority: -2
file: /swapfile
Sensors:
System Temperatures: cpu: 50.0 C mobo: N/A gpu: nvidia temp: 40 C
Fan Speeds (rpm): cpu: 0
Info:
Processes: 547 Uptime: 1d 3h 17m wakeups: 0 Memory: total: 32 GiB note: est.
available: 31.19 GiB used: 11.65 GiB (37.3%) Init: systemd v: 254
default: graphical tool: systemctl Compilers: gcc: 13.2.1 clang: 16.0.6
Packages: 2563 pm: pacman pkgs: 2518 libs: 545
tools: gnome-software,octopi,pamac,paru,yay pm: flatpak pkgs: 45 Shell: Zsh
v: 5.9 running-in: tilix inxi: 3.3.31
6x12
30 November 2023 20:29
5
A long shot, but since @Yochanan seems to use a bunch of monitors, this looks interesting:
The refresh rate of my MiniDisplay Port 1.2 to 2 HDMI monitors (was) on 59.93 and I would have this problem. If I set both to 60Hz one of the displays wouldn’t show. It was only once I set 1 monitor to 50Hz and one 60Hz that the problem was fixed.
Xid 79, GPU has fallen off the bus. - #15 by wlarsong - CUDA Programming and Performance - NVIDIA Developer Forums .
Not related for me. All three external monitors run at 144Hz and so does my laptop screen (lid is closed). I have Mini DisplayPort, HDMI & Thunderbolt 4 ports.
Jaypee
30 November 2023 21:15
7
I’ve had this happen on my nearly-6-year-old 1080 Ti for almost two years now. I suspect a hardware issue of some kind, perhaps in memory. When it starts to fall off the bus (it does this at least once per day), I take it out of the machine for 20 minutes, put it back in, and the card behaves for the next 4-10 months.
bedna
30 November 2023 21:41
8
Maybe it just wanted a hug. Falling off a bus must hurt tremendously!