Nvidia dmesg stack trace with kernel 4.16


#1

I switched to linux416 and today I saw in my log the following nvidia error with a full call trace:

Apr 03 19:44:07 rakete kernel: Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_cache' (offset 11440, size 3)!
Apr 03 19:44:07 rakete kernel: WARNING: CPU: 5 PID: 2426 at mm/usercopy.c:81 usercopy_warn+0x7e/0xa0
Apr 03 19:44:07 rakete kernel: Modules linked in: msr snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic stb6100 lnbp22 stb0899 dvb_usb_pctv452e dvb_usb ttpci_eeprom dvb_core rc_core mousedev inp>
Apr 03 19:44:07 rakete kernel:  nfsd ipmi_msghandler vboxpci(O) vboxnetflt(O) vboxnetadp(O) auth_rpcgss nfs_acl lockd grace sunrpc vboxdrv(O) sg crypto_user ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypt>
Apr 03 19:44:07 rakete kernel: CPU: 5 PID: 2426 Comm: Xorg Tainted: P           O     4.16.0-1-MANJARO #1
Apr 03 19:44:07 rakete kernel: Hardware name: MSI MS-7A63/Z270 GAMING PRO CARBON (MS-7A63), BIOS 1.80 01/26/2018
Apr 03 19:44:07 rakete kernel: RIP: 0010:usercopy_warn+0x7e/0xa0
Apr 03 19:44:07 rakete kernel: RSP: 0018:ffffa534cbdebb58 EFLAGS: 00010286
Apr 03 19:44:07 rakete kernel: RAX: 0000000000000000 RBX: ffff9e229d34acb0 RCX: 0000000000000001
Apr 03 19:44:07 rakete kernel: RDX: 0000000080000001 RSI: ffffffff9ce5390c RDI: 00000000ffffffff
Apr 03 19:44:07 rakete kernel: RBP: 0000000000000003 R08: 0000000000000098 R09: 000000000000037b
Apr 03 19:44:07 rakete kernel: R10: ffffffff9ce8c661 R11: 0000000000000001 R12: 0000000000000001
Apr 03 19:44:07 rakete kernel: R13: ffff9e229d34acb3 R14: 0000000000000000 R15: ffff9e229d34acf8
Apr 03 19:44:07 rakete kernel: FS:  00007f855d497940(0000) GS:ffff9e22eed40000(0000) knlGS:0000000000000000
Apr 03 19:44:07 rakete kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 03 19:44:07 rakete kernel: CR2: 00007f8555441010 CR3: 0000000820e28005 CR4: 00000000003606e0
Apr 03 19:44:07 rakete kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 03 19:44:07 rakete kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 03 19:44:07 rakete kernel: Call Trace:

I googled this and found this thread on the NVIDIA developer forums, with a reply from Phil:

https://devtalk.nvidia.com/default/topic/1031067/-linux416-nvidia-390-48-nvidia_stack_cache-rip-0010-usercopy_warn-0x7e-0xa0/?offset=2

So it looks like the issue is known, but there does not seem to be a fix yet. What should I do? Go back to kernel 4.15 until it is fixed?

Kind Regards
Matthias


#2

answer from philm


#3

@stephane:

Please also see the next two answers on devtalk.nvidia.com, which say that it is not a Spectre/Meltdown issue.


#4

Uhm, @stephane posted what looks like the same crash in that thread:

Bad or missing usercopy whitelist? Kernel memory exposure attempt detected from SLUB object 'nvidia_stack_cache' (offset 11440, size 3)!

@philm responded by suggesting disabling the Spectre/Meltdown mitigations. If that worked for them, I would try it; it might work for you too.


#5

@philm, you could have the same error with the Spitfire laptop.


#6

Good morning, afternoon, or evening, and greetings to all. I was reading the forum and found this thread. I did not fully understand whether there are problems with nvidia and kernel 4.16.0-1, but I recently switched from 4.14 to it to try it out. My nvidia 730 card works perfectly with both (driver version 390.42), dmesg does not show any errors, and the card stays at 45 °C as always. Should I go back to 4.14?
Regarding the Spectre/Meltdown mitigations, I ran the spectre-meltdown-checker.sh script from the web again, and it still reports my AMD CPU as "not vulnerable", although that result may no longer be valid.

Sincerely, and have a nice day, everyone.

PS: Sorry for the Google-translated English; I do not speak it.


#7

@manjaromanyarolibre
This is specific to the combination of Intel, NVIDIA and kernel 4.16.


#8

I tried several kernel options:

nopti
nopti noretpoline
nopti spectre_v2=off

None of them helped.
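Before concluding that none of these options help, it may be worth confirming that they actually reached the running kernel (for example, after editing the bootloader configuration and regenerating it). A minimal Python sketch, assuming a Linux system that exposes `/proc/cmdline`; the helper names `parse_cmdline` and `missing_options` are hypothetical, and quoted parameter values containing spaces are deliberately not handled:

```python
from __future__ import annotations

from pathlib import Path


def parse_cmdline(cmdline: str) -> dict[str, str | None]:
    """Split a kernel command line into {param: value} pairs.

    Bare flags such as "nopti" map to None; "spectre_v2=off" maps to "off".
    Simplifying assumption: no quoted values containing spaces.
    """
    params: dict[str, str | None] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_options(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the entries of `wanted` that are not active on the command line."""
    params = parse_cmdline(cmdline)
    missing = []
    for option in wanted:
        key, sep, value = option.partition("=")
        # A "key=value" option only counts if the value matches exactly.
        if key not in params or (sep and params[key] != value):
            missing.append(option)
    return missing


if __name__ == "__main__":
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        current = cmdline_path.read_text()
        # The three combinations tried in this thread.
        for combo in (["nopti"], ["nopti", "noretpoline"], ["nopti", "spectre_v2=off"]):
            gone = missing_options(current, combo)
            print(" ".join(combo), "->", "active" if not gone else f"missing: {gone}")

    # Kernels from 4.15 on report mitigation status here; absent on older kernels.
    vuln_dir = Path("/sys/devices/system/cpu/vulnerabilities")
    if vuln_dir.is_dir():
        for entry in sorted(vuln_dir.iterdir()):
            print(f"{entry.name}: {entry.read_text().strip()}")
```

If an option shows up as missing, the bootloader change did not take effect, and the test says nothing about whether that option would have helped.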


#9

I have this problem too.

@mbod, is OBS xcomposite capture working for you? I do not know if it is related.


#10

I assume you are using the linux416 kernel from our stable branch. This issue should already be fixed in our testing branch …


#11

I can confirm: the error no longer shows up with the testing branch.
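To check over several boots that the warning is really gone, one option is to scan a saved kernel log for it. A minimal Python sketch, assuming the message is worded exactly like the line quoted in the first post (other kernel versions may phrase it differently); the script name `check_usercopy.py` and the names `find_usercopy_warnings` and `UsercopyHit` are hypothetical:

```python
from __future__ import annotations

import re
import sys
from typing import NamedTuple

# Matches the warning quoted at the top of this thread, e.g.
# "... from SLUB object 'nvidia_stack_cache' (offset 11440, size 3)!"
USERCOPY_RE = re.compile(
    r"Kernel memory exposure attempt detected from SLUB object "
    r"'(?P<cache>[^']+)' \(offset (?P<offset>\d+), size (?P<size>\d+)\)"
)


class UsercopyHit(NamedTuple):
    cache: str   # SLUB cache name, e.g. nvidia_stack_cache
    offset: int  # offset inside the object
    size: int    # number of bytes the copy attempted


def find_usercopy_warnings(lines) -> list[UsercopyHit]:
    """Return one UsercopyHit per log line containing the usercopy warning."""
    hits = []
    for line in lines:
        m = USERCOPY_RE.search(line)
        if m:
            hits.append(UsercopyHit(m["cache"], int(m["offset"]), int(m["size"])))
    return hits


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("usage: python3 check_usercopy.py KERNEL_LOG_FILE")
    else:
        with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
            hits = find_usercopy_warnings(log)
        for hit in hits:
            print(f"{hit.cache}: offset {hit.offset}, size {hit.size}")
        print(f"{len(hits)} usercopy warning(s) found")
        sys.exit(1 if hits else 0)
```

For example, `journalctl -k -b > kernel.log` saves the kernel messages of the current boot, and `python3 check_usercopy.py kernel.log` then reports any matching lines, exiting with status 1 if the warning is present.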


#12

This topic was automatically closed 90 days after the last reply. New replies are no longer allowed.