Nvidia drivers boot into black screen (after upgrade on Desktop system with Intel iGPU)

Sorry, I did not phrase this very well! :face_with_diagonal_mouth: Doesn’t matter though, I am just going to try it out. Anyway, I will restore the snapshot and redo the installation, I will keep you updated! :grinning:

Redid all the steps with the restored snapshot. Unfortunately, still no luck! I am still getting the same error as mentioned above. GPU has fallen off the bus.

May 29 15:56:49 JPC1 kernel: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (\RMTW.SHWM) (20211217/utaddress-204)
May 29 15:56:49 JPC1 kernel: ACPI: OSL: Resource conflict; ACPI support missing from driver?
...
May 29 15:56:51 JPC1 kernel: NVRM: GPU at PCI:0000:01:00: GPU-b6d5e999-de6c-852b-bea5-4379861a0dbc
May 29 15:56:51 JPC1 kernel: NVRM: Xid (PCI:0000:01:00): 79, pid=867, GPU has fallen off the bus.
May 29 15:56:51 JPC1 kernel: NVRM: GPU 0000:01:00.0: GPU has fallen off the bus.
May 29 15:56:51 JPC1 kernel: BUG: kernel NULL pointer dereference, address: 0000000000000088
May 29 15:56:51 JPC1 kernel: #PF: supervisor write access in kernel mode
May 29 15:56:51 JPC1 kernel: #PF: error_code(0x0002) - not-present page
May 29 15:56:51 JPC1 kernel: PGD 0 P4D 0 
May 29 15:56:51 JPC1 kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
May 29 15:56:51 JPC1 kernel: CPU: 6 PID: 816 Comm: Xorg Tainted: P           OE     5.17.9-1-MANJARO #1 7fd1fa212587ceb9c10eebada1d251e7facbe5ca
May 29 15:56:51 JPC1 kernel: Hardware name: ASUS System Product Name/ROG STRIX B560-I GAMING WIFI, BIOS 0904 05/24/2021
May 29 15:56:51 JPC1 kernel: RIP: 0010:_nv033473rm+0xac/0x130 [nvidia]
May 29 15:56:51 JPC1 kernel: Code: 44 89 e0 5b 41 5c c3 0f 1f 80 00 00 00 00 48 c1 e1 06 48 03 8c fe e0 23 00 00 45 84 c0 8b 50 08 44 8b 48 0c 74 78 85 db 74 3c <48> 83 41 08 01 0f b6 10 83 e2 03 80 fa 03 75 bf 45 84 c0 75 ba 0f
May 29 15:56:51 JPC1 kernel: RSP: 0018:ffffa3f803b0f6f8 EFLAGS: 00010206
May 29 15:56:51 JPC1 kernel: RAX: ffff8bfe2e3bdce0 RBX: 0000000000000055 RCX: 0000000000000080
May 29 15:56:51 JPC1 kernel: RDX: 0000000000000116 RSI: ffff8bfe3f0c0008 RDI: 0000000000000014
May 29 15:56:51 JPC1 kernel: RBP: ffff8bfe2e3bdc90 R08: 0000000000000001 R09: 0000000000000000
May 29 15:56:51 JPC1 kernel: R10: 0000000000001110 R11: 0000000000000000 R12: 0000000000000055
May 29 15:56:51 JPC1 kernel: R13: ffff8bfe3f0c0008 R14: 0000000000000014 R15: ffff8bfe3f040008
May 29 15:56:51 JPC1 kernel: FS:  00007f65d813a100(0000) GS:ffff8c053f580000(0000) knlGS:0000000000000000
May 29 15:56:51 JPC1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 29 15:56:51 JPC1 kernel: CR2: 0000000000000088 CR3: 00000001021a4006 CR4: 0000000000770ee0
May 29 15:56:51 JPC1 kernel: PKRU: 55555554
May 29 15:56:51 JPC1 kernel: Call Trace:
May 29 15:56:51 JPC1 kernel:  <TASK>
May 29 15:56:51 JPC1 kernel:  ? _nv033470rm+0x162/0x2f0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv037675rm+0x70/0xb0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv037675rm+0x3f/0xb0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv011742rm+0x37/0x60 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv011059rm+0x7c/0x170 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv032311rm+0xc6/0x1f0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv010914rm+0x4e/0xc0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv009158rm+0x115/0x170 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv011987rm+0x2b5/0x4c0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv012032rm+0x25d/0x310 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv033853rm+0x109/0x220 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv014242rm+0x3a/0x100 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv015179rm+0x16e/0x3c0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv021904rm+0x91/0x1e0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv021905rm+0x21/0x40 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv000696rm+0x1aa/0x2f0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? _nv000643rm+0x49c/0x20b0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? rm_init_adapter+0xc5/0xe0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? nv_open_device+0x2dc/0x8c0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? nvidia_open+0x2f3/0x600 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? kobj_lookup+0xf1/0x170
May 29 15:56:51 JPC1 kernel:  ? nvidia_frontend_open+0x50/0xa0 [nvidia a0d810184fda5cd2c24153a3e391921059a1c2a4]
May 29 15:56:51 JPC1 kernel:  ? chrdev_open+0xc1/0x250
May 29 15:56:51 JPC1 kernel:  ? cdev_device_add+0x90/0x90
May 29 15:56:51 JPC1 kernel:  ? do_dentry_open+0x1cf/0x3a0
May 29 15:56:51 JPC1 kernel:  ? path_openat+0xd94/0x1280
May 29 15:56:51 JPC1 kernel:  ? filename_lookup+0xe4/0x200
May 29 15:56:51 JPC1 kernel:  ? do_filp_open+0xaf/0x160
May 29 15:56:51 JPC1 kernel:  ? do_sys_openat2+0xb9/0x170
May 29 15:56:51 JPC1 kernel:  ? __x64_sys_openat+0x6a/0xa0
May 29 15:56:51 JPC1 kernel:  ? do_syscall_64+0x58/0x90
May 29 15:56:51 JPC1 kernel:  ? syscall_exit_to_user_mode+0x23/0x50
May 29 15:56:51 JPC1 kernel:  ? do_syscall_64+0x67/0x90
May 29 15:56:51 JPC1 kernel:  ? exc_page_fault+0x71/0x170
May 29 15:56:51 JPC1 kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
May 29 15:56:51 JPC1 kernel:  </TASK>
May 29 15:56:51 JPC1 kernel: Modules linked in: cmac algif_hash algif_skcipher af_alg qrtr nct6775 bnep hwmon_vid intel_rapl_msr iTCO_wdt intel_pmc_bxt vfat ee1004 iTCO_vendor_support mei_hdcp fat snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel soundwire_generic_alloca>
May 29 15:56:51 JPC1 kernel:  snd_hda_intel btmtk snd_intel_dspcfg i915 mousedev bluetooth snd_intel_sdw_acpi snd_hda_codec snd_hda_core snd_hwdep snd_pcm ecdh_generic rfkill snd_timer joydev snd ttm crc16 soundcore intel_gtt wmi video acpi_pad acpi_tad mac_hid squashfs loop ipmi_dev>
May 29 15:56:51 JPC1 kernel: CR2: 0000000000000088
May 29 15:56:51 JPC1 kernel: ---[ end trace 0000000000000000 ]---
May 29 15:56:51 JPC1 kernel: RIP: 0010:_nv033473rm+0xac/0x130 [nvidia]
May 29 15:56:51 JPC1 kernel: Code: 44 89 e0 5b 41 5c c3 0f 1f 80 00 00 00 00 48 c1 e1 06 48 03 8c fe e0 23 00 00 45 84 c0 8b 50 08 44 8b 48 0c 74 78 85 db 74 3c <48> 83 41 08 01 0f b6 10 83 e2 03 80 fa 03 75 bf 45 84 c0 75 ba 0f
May 29 15:56:51 JPC1 kernel: RSP: 0018:ffffa3f803b0f6f8 EFLAGS: 00010206
May 29 15:56:51 JPC1 kernel: RAX: ffff8bfe2e3bdce0 RBX: 0000000000000055 RCX: 0000000000000080
May 29 15:56:51 JPC1 kernel: RDX: 0000000000000116 RSI: ffff8bfe3f0c0008 RDI: 0000000000000014
May 29 15:56:51 JPC1 kernel: RBP: ffff8bfe2e3bdc90 R08: 0000000000000001 R09: 0000000000000000
May 29 15:56:51 JPC1 kernel: R10: 0000000000001110 R11: 0000000000000000 R12: 0000000000000055
May 29 15:56:51 JPC1 kernel: R13: ffff8bfe3f0c0008 R14: 0000000000000014 R15: ffff8bfe3f040008
May 29 15:56:51 JPC1 kernel: FS:  00007f65d813a100(0000) GS:ffff8c053f580000(0000) knlGS:0000000000000000
May 29 15:56:51 JPC1 kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 29 15:56:51 JPC1 kernel: CR2: 0000000000000088 CR3: 00000001021a4006 CR4: 0000000000770ee0
May 29 15:56:51 JPC1 kernel: PKRU: 55555554
May 29 15:56:51 JPC1 kernel: pcieport 0000:00:01.0: AER: Corrected error received: 0000:00:01.0
May 29 15:56:51 JPC1 kernel: pcieport 0000:00:01.0: PCIe Bus Error: severity=Corrected, type=Physical Layer, (Receiver ID)
May 29 15:56:51 JPC1 kernel: pcieport 0000:00:01.0:   device [8086:4c01] error status/mask=00002001/00002000
May 29 15:56:51 JPC1 kernel: pcieport 0000:00:01.0:    [ 0] RxErr                  (First)

This time it did not even work once, as opposed to what happened this morning. I also tried the flags with kernels 5.17 and 5.15. They seemed to work (I got no error logs in journalctl), but they did not resolve the black screen.
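
To re-check the log after each change (another kernel, other flags), the kernel messages can be filtered for the two signatures seen above: the NVIDIA `Xid` report and the PCIe AER messages from the root port. This is a minimal sketch, assuming a systemd journal; the helper name `pcie_gpu_errors` is made up for illustration:

```shell
# Hypothetical helper: keep only the lines that hint at a PCIe/GPU link problem.
# Reads kernel log text on stdin and prints the matching lines.
pcie_gpu_errors() {
    grep -E 'NVRM: Xid|AER:|PCIe Bus Error'
}

# Usage on a live system (kernel messages of the current boot):
#   journalctl -k -b | pcie_gpu_errors
```

An empty result only means that these particular messages were not logged; it does not prove the link is healthy.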

It is also weird that the module verification fails even though I installed nvidia-dkms:

May 29 15:56:48 JPC1 kernel: nvidia: loading out-of-tree module taints kernel.
May 29 15:56:48 JPC1 kernel: nvidia: module license 'NVIDIA' taints kernel.
May 29 15:56:48 JPC1 kernel: Disabling lock debugging due to kernel taint
May 29 15:56:48 JPC1 kernel: iwlwifi 0000:00:14.3: Direct firmware load for iwlwifi-QuZ-a0-hr-b0-69.ucode failed with error -2
May 29 15:56:48 JPC1 kernel: intel-spi 0000:00:1f.5: mx25l12805d (16384 Kbytes)
May 29 15:56:48 JPC1 kernel: iwlwifi 0000:00:14.3: api flags index 2 larger than supported by driver
May 29 15:56:48 JPC1 kernel: iwlwifi 0000:00:14.3: TLV_FW_FSEQ_VERSION: FSEQ Version: 89.3.35.37
May 29 15:56:48 JPC1 kernel: iwlwifi 0000:00:14.3: loaded firmware version 68.01d30b0c.0 QuZ-a0-hr-b0-68.ucode op_mode iwlmvm
May 29 15:56:48 JPC1 kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel

That shouldn’t be a problem here:

$ sudo dmesg | grep nvidia 
[    1.404883] nvidia: loading out-of-tree module taints kernel.
[    1.404897] nvidia: module license 'NVIDIA' taints kernel.
[    1.428599] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    1.458384] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[    1.459397] nvidia 0000:01:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=none
[    1.580015] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  470.129.06  Thu May 12 22:42:45 UTC 2022
[    1.585119] nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
[    1.591997] nvidia-uvm: Loaded the UVM driver, major device number 510.
[    1.592926] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    1.592928] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 0

That looks like an upstream NVIDIA bug.

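And about this: the NVRM lines come from the NVIDIA kernel driver, and "GPU has fallen off the bus" means the driver lost contact with the card over PCIe, so it is related to the UEFI/motherboard side. Are you really sure it is connected to the correct PCIe slot? Do you have any special configuration there? Most likely it is some sort of PCIe ASPM problem.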

You should try kernels older than Linux 5.17 (a kernel being marked stable does not mean it is production ready :wink: )

Thanks for getting back to me. I did try Kernel 5.15 and 5.10, though, since I installed the headers for all 3 Kernels.

I don’t think I have any special configuration there. I only disabled the notorious ASUS Intel performance mode (I forgot its name), which lets Intel CPUs run like crazy even when no performance-hungry task is running, and disabled Secure Boot (obviously). I also played around with the Primary Display setting (Auto, Integrated Graphics, PEG = dGPU) and the Multi Monitor Display setting, but it made no difference, so I kept it on iGPU, since the Intel GPU should be running by default anyway. Is there anything else I could be missing in the UEFI settings related to this?

Is there a bug in the NVIDIA driver? Should I try an older version of the driver (if that is recommended)?

If there is nothing else that can be found on the software side, I will have to check the PCIe riser cable / mainboard. There is only one PCIe slot on my board, since it's an mITX. The GPU is fine, I checked that already on my other Windows PC. I would like to be sure it's nothing software related, and that I am not doing anything wrong when installing / configuring, before checking the hardware the hard way, since taking my baby apart is going to take a whole day :melting_face: (it is a custom case from GEEEK)

I could try a Windows live USB on the PC and test the setup with that, to check if the problem is hardware related? If there is such an option.

you have a PCI issue / Linux issue
the issue will likely not show up on a Windows PC
did you use those kernel parameters?
and what caught my eye is that you're using a riser cable, which hopefully is the issue, since I have read complaints on the internet about riser cables causing graphics problems,
for example when using a PCIe Gen3 riser cable with PCIe Gen4 hardware
so this is worth checking out first


Thanks for getting back to me @brahma ! If the hardware is faulty, could it be faulty only under Linux? Or would creating a Windows live USB still be worthwhile?

What kernel parameters do you mean?

I thought about the cable being an issue, since it is indeed a PCIe Gen3 cable, and both my card and the mainboard are PCIe Gen4 compatible, but I thought this would be fine and I would only get lower bandwidth / speed.
If this is really the case, then I will have to check that next weekend, when I can take the whole day to take the PC apart. :smiling_face_with_tear:

I wouldn't say your hardware is faulty (hopefully not), but let's say something doesn't fit well with Linux
I don't think you can create a live USB of Windows the way you can with Linux, so you would just be wasting your time

by kernel parameters I mean: pci=nomsi pci=nommconf pcie_aspm=off
hopefully it's really because of the cable, because what I read is that mixing PCIe Gen3 with PCIe Gen4 causes issues, so if you have the option to use a PCIe Gen4 cable then do this, and it will hopefully fix the issue


Greetings! I finally got the hybrid drivers working. It was indeed the PCIe riser cable that was causing problems. The cable works fine, but only at Gen3 speed. So I had to search around to find the setting “PCIEx16_1 link speed” in the Asus B560-I’s BIOS and switch it from “Auto” to “Gen3”. The system now boots and shuts down just fine. I can play Steam games (no Vulkan shader games though, still looking into that) with prime-run %command% as the launch option, and prime-run blender works just fine too! The temperature is displayed and the fans spin up correctly under load.
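
The link speed can also be read back from Linux to confirm that the BIOS setting took effect. This is a sketch, assuming the GPU sits at `01:00.0` as in the logs above; the helper name `link_speeds` is made up for illustration. In `lspci -vv` output, `LnkCap` is what the device supports and `LnkSta` is what was negotiated; 8GT/s corresponds to Gen3 and 16GT/s to Gen4.

```shell
# Hypothetical helper: extract the speed from the LnkCap and LnkSta lines
# of `lspci -vv` output read on stdin (LnkCap2/LnkSta2 lines are ignored).
link_speeds() {
    sed -n -E 's/^[[:space:]]*(LnkCap|LnkSta):.*Speed ([0-9.]+GT\/s).*/\1 \2/p'
}

# Usage on a live system (root is needed to read the capability registers):
#   sudo lspci -vv -s 01:00.0 | link_speeds
```

The `LnkSta` value may read lower while the GPU is idle, because the driver can drop the link speed to save power, so it is best checked under load.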

It was only thanks to you guys, @brahma and @megavolt, that I could resolve this issue so fast! Thank you so much! :blush:

Without the correct installation guide, there was a big chance that I would have installed the wrong drivers (thanks @megavolt), and @brahma was on the right track from the start with checking for issues on the PCIe slot itself.

Because the problem itself was the wrong setting in the BIOS, I am marking @brahma's answer as the solution. Still, thank you so much for providing this installation guide, @megavolt !
