Amdgpu glitch on hybrid laptop

As I have currently disabled the NVIDIA GPU completely, I can't advise on the power limit, but I vaguely remember that it was not capped the last time I checked. I read in a review that, due to cooling restrictions, it was not possible to reach 140 W; around 100 W was observed.

My current AMD-related driver packages (from the oibaf PPA) are:

root@LEGION5PRO:~# dpkg -l|grep oibaf
ii libdrm-amdgpu1:amd64 2.4.115+git2307210500.cc8c22~oibaf~j amd64 Userspace interface to amdgpu-specific kernel DRM services – runtime
ii libdrm-amdgpu1:i386 2.4.115+git2307210500.cc8c22~oibaf~j i386 Userspace interface to amdgpu-specific kernel DRM services – runtime
ii libdrm-common 2.4.115+git2307210500.cc8c22~oibaf~j all Userspace interface to kernel DRM services – common files
ii libdrm-intel1:amd64 2.4.115+git2307210500.cc8c22~oibaf~j amd64 Userspace interface to intel-specific kernel DRM services – runtime
ii libdrm-intel1:i386 2.4.115+git2307210500.cc8c22~oibaf~j i386 Userspace interface to intel-specific kernel DRM services – runtime
ii libdrm-nouveau2:amd64 2.4.115+git2307210500.cc8c22~oibaf~j amd64 Userspace interface to nouveau-specific kernel DRM services – runtime
ii libdrm-nouveau2:i386 2.4.115+git2307210500.cc8c22~oibaf~j i386 Userspace interface to nouveau-specific kernel DRM services – runtime
ii libdrm-radeon1:amd64 2.4.115+git2307210500.cc8c22~oibaf~j amd64 Userspace interface to radeon-specific kernel DRM services – runtime
ii libdrm-radeon1:i386 2.4.115+git2307210500.cc8c22~oibaf~j i386 Userspace interface to radeon-specific kernel DRM services – runtime
ii libdrm2:amd64 2.4.115+git2307210500.cc8c22~oibaf~j amd64 Userspace interface to kernel DRM services – runtime
ii libdrm2:i386 2.4.115+git2307210500.cc8c22~oibaf~j i386 Userspace interface to kernel DRM services – runtime
ii libegl-mesa0:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 free implementation of the EGL API – Mesa vendor library
ii libegl-mesa0:i386 23.3~git2307220600.5cca11~oibaf~j i386 free implementation of the EGL API – Mesa vendor library
ii libgbm1:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 generic buffer management API – runtime
ii libgbm1:i386 23.3~git2307220600.5cca11~oibaf~j i386 generic buffer management API – runtime
ii libgl1-mesa-dri:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 free implementation of the OpenGL API – DRI modules
ii libgl1-mesa-dri:i386 23.3~git2307220600.5cca11~oibaf~j i386 free implementation of the OpenGL API – DRI modules
ii libglapi-mesa:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 free implementation of the GL API – shared library
ii libglapi-mesa:i386 23.3~git2307220600.5cca11~oibaf~j i386 free implementation of the GL API – shared library
ii libglx-mesa0:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 free implementation of the OpenGL API – GLX vendor library
ii libglx-mesa0:i386 23.3~git2307220600.5cca11~oibaf~j i386 free implementation of the OpenGL API – GLX vendor library
ii libvdpau1:amd64 1.5-1~oibaf~j amd64 Video Decode and Presentation API for Unix (libraries)
ii libxatracker2:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 X acceleration library – runtime
ii mesa-va-drivers:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 Mesa VA-API video acceleration drivers
ii mesa-va-drivers:i386 23.3~git2307220600.5cca11~oibaf~j i386 Mesa VA-API video acceleration drivers
ii mesa-vdpau-drivers:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 Mesa VDPAU video acceleration drivers
ii mesa-vulkan-drivers:amd64 23.3~git2307220600.5cca11~oibaf~j amd64 Mesa Vulkan graphics drivers
ii mesa-vulkan-drivers:i386 23.3~git2307220600.5cca11~oibaf~j i386 Mesa Vulkan graphics drivers

However, I have an update pending; these are bleeding-edge builds.
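Since the pending update will change these versions, a quick way to compare before and after is to reduce the list to the distinct oibaf build versions. This is only a sketch: the function name oibaf_versions is made up, and it assumes the usual dpkg -l column layout (status, name, version, architecture, description).

```shell
#!/bin/sh
# Hypothetical helper: print each distinct oibaf build version once.
# Reads `dpkg -l`-style lines on stdin; column 1 is the status, column 3 the version.
oibaf_versions() {
  awk '$1 == "ii" && $3 ~ /oibaf/ { print $3 }' | LC_ALL=C sort -u
}

# Usage on the live system:
#   dpkg -l | oibaf_versions
```

For the list above this should boil down to one libdrm build, one Mesa build and the libvdpau package, which is easier to compare across updates than the full listing.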

I was told that s2idle.prefer_microsoft_guid=1 is no longer supported and is simply ignored. So I guess my successful suspend was just a coincidence: I had unplugged the HDMI cable at the same time :see_no_evil:

I'm slowly working on converting my VFIO setup from static to switchable, so my NVIDIA options are limited at the moment.

However, something interesting happened when I booted the same kernel without the VFIO parameters. My normal command line is:

root@LEGION5PRO:~# cat /proc/cmdline
BOOT_IMAGE=/@/boot/vmlinuz-6.4.3-060403-generic root=UUID=a2ec4268-40ad-400d-b714-1f5cd394b39e ro rootflags=subvol=@ amd_iommu=pgtbl_v1 iommu=pt vfio_iommu_type1.allow_unsafe_interrupts=1 kvm.ignore_msrs=1 vfio-pci.ids=1022:14da,1022:14db,1022:14db,10de:2860,10de:22bd default_hugepagesz=1G hugepagesz=1G hugepages=8 pcie_aspm=force acpi=copy_dsdt

For the test, I booted it with just:

linux /@/boot/vmlinuz-6.4.3-060403-generic root=UUID=a2ec4268-40ad-400d-b714-1f5cd394b39e ro rootflags=subvol=@ pcie_aspm=force acpi=copy_dsdt
initrd /@/boot/initrd.img-6.4.3-060403-generic
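When comparing boots like this, it helps to list exactly which parameters differ. A minimal POSIX sh sketch; the function name cmdline_only_in is hypothetical. It treats every whitespace-separated token as one parameter and does not understand quoting (awk -v also interprets backslash escapes, which kernel command lines rarely contain).

```shell
#!/bin/sh
# Hypothetical helper: print parameters present in the first command line
# but missing from the second, in the order they appear in the first.
cmdline_only_in() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # Remember every token of the second command line.
    n = split(b, bt, " ")
    for (i = 1; i <= n; i++) seen[bt[i]] = 1
    # Print tokens of the first command line that were not seen.
    m = split(a, at, " ")
    for (i = 1; i <= m; i++) if (!(at[i] in seen)) print at[i]
  }'
}

# Usage: save `cat /proc/cmdline` on each boot, then compare, e.g.
#   cmdline_only_in "$(cat cmdline-vfio.txt)" "$(cat cmdline-plain.txt)"
```

With the two command lines above, this should print the IOMMU, VFIO/KVM and hugepage options (plus BOOT_IMAGE, since the GRUB linux line is not in /proc/cmdline format).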

With that, I noticed flickering and garbage on the screen that looked just like yours!

Additionally, the boot was slow because a device on USB bus 5 (5-1.1) failed to enumerate:

dmesg.2.gz:[ 1.880509] kernel: usb 5-1.1: new high-speed USB device number 3 using xhci_hcd
dmesg.2.gz:[ 7.032525] kernel: usb 5-1.1: device descriptor read/64, error -110
dmesg.2.gz:[ 22.640549] kernel: usb 5-1.1: device descriptor read/64, error -110
dmesg.2.gz:[ 22.828530] kernel: usb 5-1.1: new high-speed USB device number 4 using xhci_hcd
dmesg.2.gz:[ 28.016567] kernel: usb 5-1.1: device descriptor read/64, error -110
dmesg.2.gz:[ 43.632569] kernel: usb 5-1.1: device descriptor read/64, error -110
dmesg.2.gz:[ 43.740955] kernel: usb 5-1-port1: attempt power cycle
dmesg.2.gz:[ 44.344527] kernel: usb 5-1.1: new high-speed USB device number 5 using xhci_hcd
dmesg.2.gz:[ 55.024522] kernel: usb 5-1.1: device not accepting address 5, error -62
dmesg.2.gz:[ 55.104533] kernel: usb 5-1.1: new high-speed USB device number 6 using xhci_hcd
dmesg.2.gz:[ 65.776524] kernel: usb 5-1.1: device not accepting address 6, error -62
dmesg.2.gz:[ 65.778439] kernel: usb 5-1-port1: unable to enumerate USB device
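To see at a glance whether (and how often) enumeration fails on a given boot, the kernel log can be condensed per device and error code. A sketch only: the function name usb_enum_errors is hypothetical, and the sed pattern assumes the message format shown above ("usb <bus-port>: ..., error -<code>"). For reference, -110 is ETIMEDOUT and -62 is ETIME on Linux, so both mean the device did not answer in time.

```shell
#!/bin/sh
# Hypothetical helper: count USB enumeration errors per device and error code.
# Reads kernel log text on stdin, prints "<device> <error> <count>" per line.
usb_enum_errors() {
  sed -n 's/.*usb \([0-9][0-9.-]*\): .*error \(-[0-9][0-9]*\).*/\1 \2/p' \
    | LC_ALL=C sort | uniq -c | awk '{ print $2, $3, $1 }'
}

# Usage, e.g. for the previous boot:
#   journalctl -k -b -1 | usb_enum_errors
```

For the excerpt above this should report 5-1.1 with four -110 and two -62 errors; an empty result on a later boot means the problem did not recur.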

I then booted my VFIO-enabled kernel again and once more saw garbage on the screen. After that I powered the system off completely, and since then I have not seen the USB errors in dmesg and the screen works fine without garbage. AFAIR, the last thing I did before the issues started was installing, configuring and uninstalling laptop-mode-tools, and until then I had only warm-rebooted, never powered off. So it might have been leftover device state that only a full power-off cleared.
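Since tools like laptop-mode-tools and powertop --autotune can enable USB autosuspend, it may be worth checking which USB devices currently allow it. A sketch, assuming the standard sysfs attribute power/control ("auto" allows autosuspend, "on" keeps the device awake); the function name usb_autosuspend_devices is hypothetical, and the optional directory argument exists only so it can be tested outside /sys.

```shell
#!/bin/sh
# Hypothetical helper: list USB devices whose power/control is "auto".
# Optional $1: directory to scan instead of the real sysfs (for testing).
usb_autosuspend_devices() {
  root=${1:-/sys/bus/usb/devices}
  for ctl in "$root"/*/power/control; do
    [ -r "$ctl" ] || continue          # skip unreadable entries and an unmatched glob
    if [ "$(cat "$ctl")" = auto ]; then
      dev=${ctl%/power/control}        # strip the attribute path
      echo "${dev##*/}"                # print just the device name, e.g. 5-1
    fi
  done
}

# Usage:
#   usb_autosuspend_devices
```

Writing on to a device's power/control as root keeps that device awake until the next boot; the usbcore.autosuspend=-1 kernel parameter disables autosuspend globally.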

  1. Do you see USB messages in your dmesg, too, when the issues occur?
  2. Did you enable overclocking in the BIOS for the CPU and/or GPU?
  3. Do you run powertop --autotune (or something similar) to enable per-device power saving?
  4. You seem to have “quiet splash” set in GRUB. Consider removing it so you can see the boot messages early on; that may help to find a pattern.
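Regarding question 4: on Ubuntu-like systems those words normally come from GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, and update-grub regenerates the boot config afterwards. Here is a sketch of a filter (the name strip_quiet_splash is hypothetical) that drops only those two words, assuming the value is in double quotes. It prints the result instead of editing the file, so you can review it first.

```shell
#!/bin/sh
# Hypothetical helper: remove "quiet" and "splash" from GRUB_CMDLINE_LINUX_DEFAULT.
# Reads a grub defaults file on stdin and writes the result to stdout.
strip_quiet_splash() {
  awk '
    /^GRUB_CMDLINE_LINUX_DEFAULT="/ {
      val = $0
      sub(/^GRUB_CMDLINE_LINUX_DEFAULT="/, "", val)   # drop key and opening quote
      sub(/"[ \t]*$/, "", val)                         # drop closing quote
      n = split(val, tok, " ")
      out = ""
      for (i = 1; i <= n; i++)
        if (tok[i] != "quiet" && tok[i] != "splash")
          out = out (out == "" ? "" : " ") tok[i]
      print "GRUB_CMDLINE_LINUX_DEFAULT=\"" out "\""
      next
    }
    { print }                                          # all other lines unchanged
  '
}

# Usage:
#   strip_quiet_splash < /etc/default/grub > /tmp/grub.new
#   diff /etc/default/grub /tmp/grub.new
#   sudo cp /tmp/grub.new /etc/default/grub && sudo update-grub
```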