System randomly hangs during boot

Hi all, I have a laptop with two graphics cards (Intel and NVIDIA). I removed the “quiet” flag, and the last log entry was “reached target graphical interface”.

inxi -F
System:
  Host: hp Kernel: 5.15.55-1-MANJARO arch: x86_64 bits: 64
    Desktop: KDE Plasma v: 5.24.6 Distro: Manjaro Linux
Machine:
  Type: Laptop System: HP product: HP Pavilion Gaming Laptop 17-cd1xxx
    v: Type1ProductConfigId serial: <superuser required>
  Mobo: HP model: 8745 v: 03.45 serial: <superuser required> UEFI: Insyde
    v: F.35 date: 04/07/2022
Battery:
  ID-1: BAT1 charge: 43.8 Wh (99.8%) condition: 43.9/52.5 Wh (83.5%)
CPU:
  Info: quad core model: Intel Core i5-10300H bits: 64 type: MT MCP cache:
    L2: 1024 KiB
  Speed (MHz): avg: 3476 min/max: 800/4500 cores: 1: 3406 2: 3438 3: 3467
    4: 3500 5: 3500 6: 3500 7: 3500 8: 3500
Graphics:
  Device-1: Intel CometLake-H GT2 [UHD Graphics] driver: i915 v: kernel
  Device-2: NVIDIA TU117M driver: nvidia v: 515.57
  Device-3: Lite-On HP Wide Vision HD Camera type: USB driver: uvcvideo
  Display: x11 server: XOrg v: 21.1.4 with: Xwayland v: 22.1.3 driver: X:
    loaded: modesetting,nvidia gpu: i915 resolution: 1920x1080~144Hz
  OpenGL: renderer: Mesa Intel UHD Graphics (CML GT2) v: 4.6 Mesa 22.1.3
Audio:
  Device-1: Intel Comet Lake PCH cAVS driver: snd_hda_intel
  Device-2: NVIDIA driver: snd_hda_intel
  Sound Server-1: ALSA v: k5.15.55-1-MANJARO running: yes
  Sound Server-2: PulseAudio v: 16.1 running: yes
Network:
  Device-1: Intel Comet Lake PCH CNVi WiFi driver: iwlwifi
  IF: wlo1 state: up mac: 14:18:c3:2e:40:b9
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169
  IF: eno1 state: down mac: b0:22:7a:f6:45:79
Bluetooth:
  Device-1: Intel AX201 Bluetooth type: USB driver: btusb
  Report: rfkill ID: hci0 state: up address: see --recommends
RAID:
  Hardware-1: Intel 82801 Mobile SATA Controller [RAID mode] driver: ahci
Drives:
  Local Storage: total: 953.87 GiB used: 134.77 GiB (14.1%)
  ID-1: /dev/nvme0n1 vendor: A-Data model: SX8100NP size: 953.87 GiB
Partition:
  ID-1: / size: 920.82 GiB used: 134.76 GiB (14.6%) fs: ext4
    dev: /dev/nvme0n1p2
  ID-2: /boot/efi size: 299.4 MiB used: 312 KiB (0.1%) fs: vfat
    dev: /dev/nvme0n1p1
Swap:
  ID-1: swap-1 type: partition size: 16.98 GiB used: 0 KiB (0.0%)
    dev: /dev/nvme0n1p3
Sensors:
  System Temperatures: cpu: 54.0 C pch: 48.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Info:
  Processes: 301 Uptime: 1h 8m Memory: 15.43 GiB used: 5.9 GiB (38.2%)
  Shell: Bash inxi: 3.3.19

I made an image comparing the two log files, but I can’t attach it here.

The log files are pretty much the same until the “glamor X acceleration” section. /var/log/Xorg.0.log.old breaks off at the line

“ABI class: XOrg ANSI C Emulation, version 0.4”

whereas /var/log/Xorg.0.log continues with the line

“modeset(0): glamor X acceleration enabled on Mesa Intel(R) UHD Graphics (CML GT2)” and so on
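
A text diff can stand in for the screenshot here. Below is a minimal sketch, assuming every Xorg log line starts with a bracketed timestamp such as "[     5.123] " (stripped first so that only real differences show up). The function names are made up for illustration and are not part of any tool.

```shell
# strip_xorg_timestamps FILE
# Prints FILE with the leading "[  seconds.millis] " prefix removed from each line.
strip_xorg_timestamps() {
    sed -E 's/^\[ *[0-9]+\.[0-9]+\] //' "$1"
}

# compare_xorg_logs OLD NEW
# Diffs two Xorg logs after stripping timestamps.
# Returns diff's exit status (0 = identical, 1 = different).
compare_xorg_logs() {
    local a b status
    a=$(mktemp)
    b=$(mktemp)
    strip_xorg_timestamps "$1" > "$a"
    strip_xorg_timestamps "$2" > "$b"
    status=0
    diff "$a" "$b" || status=$?
    rm -f "$a" "$b"
    return "$status"
}
```

Something like compare_xorg_logs /var/log/Xorg.0.log.old /var/log/Xorg.0.log should then list the lines the good boot has after the point where the hung boot stopped writing.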

Please provide FORMATTED output from:
mhwd -l && mhwd -li
find /etc/X11/ -name "*.conf"
Try switching kernels.
When it happens again, switch to a TTY with Ctrl+Alt+F2 (or any of F1-F6), log in with your username and password, and type:
startx
If that doesn’t work, it creates a log in your home directory; post the FORMATTED output from that log.

Spoiler
> 0000:01:00.0 (0300:10de:1f99) Display controller nVidia Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI
video-hybrid-intel-nvidia-470xx-prime            2021.12.18               false            PCI
          video-nvidia            2021.12.18               false            PCI
    video-nvidia-470xx            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI
     video-modesetting            2020.01.13                true            PCI
            video-vesa            2017.03.12                true            PCI


> 0000:07:00.0 (0200:10ec:8168) Network controller Realtek Semiconductor Co., Ltd.:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
         network-r8168            2016.04.20                true            PCI


> 0000:00:02.0 (0300:8086:9bc4) Display controller Intel Corporation:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI
video-hybrid-intel-nvidia-470xx-prime            2021.12.18               false            PCI
           video-linux            2018.05.04                true            PCI
     video-modesetting            2020.01.13                true            PCI
            video-vesa            2017.03.12                true            PCI


> Installed PCI configs:
--------------------------------------------------------------------------------
                  NAME               VERSION          FREEDRIVER           TYPE
--------------------------------------------------------------------------------
video-hybrid-intel-nvidia-prime            2021.12.18               false            PCI
     video-modesetting            2020.01.13                true            PCI


Warning: No installed USB configs!

Also, when it happens the laptop doesn’t respond to any key; only a hard reset lets me reboot the computer.

find /etc/X11/ -name "*.conf"
/etc/X11/xorg.conf.d/00-keyboard.conf
/etc/X11/xorg.conf.d/30-touchpad.conf
/etc/X11/mhwd.d/nvidia.conf

The output looks OK…
So switch kernels: try 5.18 first and, if it happens again, try 5.10. If it also happens with 5.10, hard reboot and provide logs from the failed boot with:
journalctl -b-1 -p4 --no-pager
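
For reference, -b -1 selects the boot before the current one, -p 4 keeps only messages of priority "warning" and more severe, and --no-pager prints plain text instead of opening a pager. A minimal sketch for saving that output to a file so it can be pasted into a post; the function name and the default file name are only illustrative:

```shell
# save_failed_boot_log [OUTFILE]
# Writes warnings and errors from the previous boot to OUTFILE
# (default: failed-boot.log in the current directory).
save_failed_boot_log() {
    local out="${1:-failed-boot.log}"
    # -b -1       : the boot before the current one
    # -p 4        : priority "warning" and more severe
    # --no-pager  : plain output, no pager
    journalctl -b -1 -p 4 --no-pager > "$out"
}
```

This only captures the hung boot if it is run on the first boot after the hard reset. If the machine has been rebooted more times since then, journalctl --list-boots shows which offset to use instead of -1.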

I switched to kernel 5.10, because 5.18 broke my multimedia keys.

journalctl -b-1 -p4 --no-pager
Jul 23 16:44:51 hp kernel: Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
Jul 23 16:44:51 hp kernel: MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
Jul 23 16:44:51 hp kernel:  #5 #6 #7
Jul 23 16:44:51 hp kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Jul 23 16:44:51 hp kernel: [Firmware Bug]: Invalid critical threshold (0)
Jul 23 16:44:51 hp kernel: hpet_acpi_add: no address or irqs in _CRS
Jul 23 16:44:51 hp kernel: i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
Jul 23 16:44:51 hp kernel: usb: port power management may be unreliable
Jul 23 16:44:51 hp kernel: ACPI BIOS Error (bug): Could not resolve symbol [\DPPP], AE_NOT_FOUND (20200925/psargs-330)
Jul 23 16:44:51 hp kernel: ACPI Error: Aborting method \_SB.IETM.IDSP due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
Jul 23 16:44:51 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 16:44:51 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 16:44:51 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 16:44:51 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 16:44:51 hp kernel: acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:03)
Jul 23 16:44:51 hp kernel: acpi PNP0C14:05: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:03)
Jul 23 16:44:51 hp kernel: wmi_bus wmi_bus-PNP0C14:05: WQBJ data block query control method not found
Jul 23 16:44:52 hp kernel: i2c_hid i2c-ELAN0710:01: supply vdd not found, using dummy regulator
Jul 23 16:44:52 hp kernel: i2c_hid i2c-ELAN0710:01: supply vddl not found, using dummy regulator
Jul 23 16:44:52 hp kernel: r8169 0000:07:00.0: can't disable ASPM; OS doesn't have ASPM control
Jul 23 16:44:52 hp kernel: iwlwifi 0000:00:14.3: api flags index 2 larger than supported by driver
Jul 23 16:44:52 hp kernel: nvidia: loading out-of-tree module taints kernel.
Jul 23 16:44:52 hp kernel: nvidia: module license 'NVIDIA' taints kernel.
Jul 23 16:44:52 hp kernel: Disabling lock debugging due to kernel taint
Jul 23 16:44:52 hp kernel: 
Jul 23 16:44:52 hp kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.57  Wed Jun 22 22:44:07 UTC 2022
Jul 23 16:44:52 hp kernel: [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
Jul 23 16:44:52 hp kernel: thermal thermal_zone9: failed to read out thermal zone (-61)
Jul 23 16:44:53 hp kernel: hp_wmi: query 0x4c returned error 0x6
Jul 23 16:44:54 hp kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
Jul 23 16:44:57 hp kernel: kauditd_printk_skb: 26 callbacks suppressed
Jul 23 16:45:24 hp kernel: ACPI Error: Aborting method \_SB.PCI0.RTEN due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
Jul 23 16:45:24 hp kernel: ACPI Error: Aborting method \_SB.PCI0.PGON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
Jul 23 16:45:24 hp kernel: ACPI Error: Aborting method \_SB.PCI0.PEG0.PG00._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529)
Jul 23 16:45:24 hp kernel: acpi device:00: Failed to change power state to D0
Jul 23 16:45:25 hp kernel: video LNXVIDEO:00: Cannot transition to power state D0 for parent in (unknown)
Jul 23 16:45:25 hp kernel: nvidia 0000:01:00.0: can't change power state from D3cold to D0 (config space inaccessible)
Jul 23 16:45:26 hp kernel: snd_hda_intel 0000:01:00.1: can't change power state from D3cold to D0 (config space inaccessible)

I rebooted my laptop about 20 times in a row and it happened again.

These may be the issue:

can't disable ASPM; OS doesn't have ASPM control
can't change power state from D3cold to D0
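
To pull just these kinds of lines out of a long journal, a simple filter is enough. This is a sketch only: the function name is made up, and the pattern list is an assumption based on the messages seen in this thread, not a complete list of power-management errors.

```shell
# power_state_errors
# Reads log lines on stdin and prints only those that mention
# PCI power-state transitions, ACPI AML loop timeouts, or ASPM.
power_state_errors() {
    grep -E 'power state|AE_AML_LOOP_TIMEOUT|ASPM'
}
```

Usage would look like: journalctl -b -1 -p 4 --no-pager | power_state_errors
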

Open this file:
kate /etc/default/grub
In the GRUB_CMDLINE_LINUX_DEFAULT line, add this parameter inside the quotes:
pcie_aspm=off
Add it to the existing parameters without removing anything, then save the file and run:
sudo update-grub
Reboot and test.
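
The same edit can be scripted instead of done by hand in an editor. A minimal sketch, assuming GNU sed and a GRUB_CMDLINE_LINUX_DEFAULT="..." line written with double quotes on a single line; add_kernel_param is a hypothetical helper, not something shipped with GRUB or Manjaro:

```shell
# add_kernel_param FILE PARAM
# Appends PARAM inside the quotes of GRUB_CMDLINE_LINUX_DEFAULT in FILE,
# keeping all existing parameters. Does nothing if PARAM is already there.
add_kernel_param() {
    local file="$1" param="$2"
    # Already present as a whole word on that line? Leave the file alone.
    if grep -Eq "^GRUB_CMDLINE_LINUX_DEFAULT=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi
    # Insert " PARAM" just before the closing quote.
    sed -i -E "s|^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"|\1 ${param}\"|" "$file"
}
```

It is safer to try it on a copy first (cp /etc/default/grub /tmp/grub.test, then add_kernel_param /tmp/grub.test pcie_aspm=off, then diff /etc/default/grub /tmp/grub.test). Once the diff looks right, copy the result back as root and run sudo update-grub as described above.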

Doesn’t help.

journalctl -b-1 -p4 --no-pager
Jul 23 17:02:20 hp kernel: Spectre V2 : WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks!
Jul 23 17:02:20 hp kernel: MMIO Stale Data CPU bug present and SMT on, data leak possible. See https://www.kernel.org/doc/html/latest/admin-guide/hw-vuln/processor_mmio_stale_data.html for more details.
Jul 23 17:02:20 hp kernel:  #5 #6 #7
Jul 23 17:02:20 hp kernel: ENERGY_PERF_BIAS: Set to 'normal', was 'performance'
Jul 23 17:02:20 hp kernel: [Firmware Bug]: Invalid critical threshold (0)
Jul 23 17:02:20 hp kernel: hpet_acpi_add: no address or irqs in _CRS
Jul 23 17:02:20 hp kernel: i8042: PNP: PS/2 appears to have AUX port disabled, if this is incorrect please boot with i8042.nopnp
Jul 23 17:02:20 hp kernel: usb: port power management may be unreliable
Jul 23 17:02:20 hp kernel: ACPI BIOS Error (bug): Could not resolve symbol [\DPPP], AE_NOT_FOUND (20200925/psargs-330)
Jul 23 17:02:20 hp kernel: ACPI Error: Aborting method \_SB.IETM.IDSP due to previous error (AE_NOT_FOUND) (20200925/psparse-529)
Jul 23 17:02:20 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 17:02:20 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 17:02:20 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 17:02:20 hp kernel: wmi_bus wmi_bus-PNP0C14:01: WQ data block query control method not found
Jul 23 17:02:20 hp kernel: acpi PNP0C14:04: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:03)
Jul 23 17:02:20 hp kernel: acpi PNP0C14:05: duplicate WMI GUID 05901221-D566-11D1-B2F0-00A0C9062910 (first instance was on PNP0C14:03)
Jul 23 17:02:20 hp kernel: wmi_bus wmi_bus-PNP0C14:05: WQBJ data block query control method not found
Jul 23 17:02:20 hp kernel: i2c_hid i2c-ELAN0710:01: supply vdd not found, using dummy regulator
Jul 23 17:02:20 hp kernel: i2c_hid i2c-ELAN0710:01: supply vddl not found, using dummy regulator
Jul 23 17:02:21 hp kernel: iwlwifi 0000:00:14.3: api flags index 2 larger than supported by driver
Jul 23 17:02:21 hp kernel: nvidia: loading out-of-tree module taints kernel.
Jul 23 17:02:21 hp kernel: nvidia: module license 'NVIDIA' taints kernel.
Jul 23 17:02:21 hp kernel: Disabling lock debugging due to kernel taint
Jul 23 17:02:21 hp kernel: 
Jul 23 17:02:21 hp kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  515.57  Wed Jun 22 22:44:07 UTC 2022
Jul 23 17:02:21 hp kernel: [Firmware Bug]: ACPI(PEGP) defines _DOD but not _DOS
Jul 23 17:02:21 hp kernel: thermal thermal_zone9: failed to read out thermal zone (-61)
Jul 23 17:02:22 hp kernel: nvidia_uvm: module uses symbols from proprietary module nvidia, inheriting taint.
Jul 23 17:02:22 hp kernel: hp_wmi: query 0x4c returned error 0x6
Jul 23 17:02:26 hp kernel: kauditd_printk_skb: 26 callbacks suppressed

This time it hangs after “Started Hostname Service”; it sits there for about 10 seconds and then dies.

Nothing in the logs now… so replace the pcie_aspm=off parameter with this one:
pci=nommconf
Update GRUB and reboot.
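
After the reboot it is worth confirming that the kernel actually booted with the new parameter, since a forgotten update-grub leaves the old command line in place. A minimal sketch that checks the running kernel's command line in /proc/cmdline; has_kernel_param is a hypothetical helper name, and the optional second argument exists only so the check can be tried against a test file:

```shell
# has_kernel_param PARAM [CMDLINE_FILE]
# Succeeds (exit 0) if PARAM appears as a whole word on the kernel
# command line. CMDLINE_FILE defaults to /proc/cmdline.
has_kernel_param() {
    local param="$1" cmdline_file="${2:-/proc/cmdline}"
    # One parameter per line, then an exact fixed-string whole-line match.
    tr ' ' '\n' < "$cmdline_file" | grep -Fxq -- "$param"
}
```

For example: has_kernel_param pci=nommconf && echo "active" || echo "not active". As far as I know, pci=nommconf makes the kernel stop using MMCONFIG (memory-mapped access to PCI configuration space), which would fit the earlier "config space inaccessible" messages, but that link is a guess rather than something the logs prove.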

So it seems the problem is gone this time! Thank you!
