My laptop does not boot with the latest Manjaro release, the latest kernel, or the LTS kernel. I have used Linux as my primary OS for more than 3 years. As a temporary measure, after a lot of tries, I am now running 5.13.12-1-MANJARO. I tried EndeavourOS, Manjaro, Ubuntu, etc., and none of them booted successfully; I could not even reach a TTY. The only success I had was when I blacklisted both the proprietary and the open-source GPU drivers (similar to safe graphics mode). I was able to log in, but the system did not recognize any monitor and had a lot of other bugs due to the missing drivers. After many tries I downloaded an old release (from 2021) from the Manjaro mirrors and blacklisted the kernel in pacman, and that is what I am using now. I tried booting other kernels (LTS, latest, etc.), but they always failed; the only one that works is 5.13.12-1-MANJARO.
After digging around for some time, I found that this is due to a page fault in amdgpu. The exact error details are here
These logs are from the latest boot after the latest updates. This has been going on for about 1 to 1.5 months. I thought the next update would fix it, but the latest LTS update on 31 Oct did not.
Please help me out! Are there any other official support channels I should contact?
Hi there! I tried posting links in HTML and Markdown format, but the forum did not allow me to post them. After looking at this post, https://forum.manjaro.org/t/howto-post-screenshots-and-links/16378, I followed the instructions given there.
P.S. Even now it does not allow links unless they are wrapped in backquotes.
Hi @stephane, I have already updated the UEFI to the latest version, but no change is observed. I even tried older updates.
I tried the Linux 6.1.0 kernel. Here is its stack trace:
[drm] Detected VRAM RAM=4096M, BAR=256M
Nov 04 01:39:58.930900 Monarch kernel: [drm] RAM width 128bits GDDR5
Nov 04 01:39:58.930940 Monarch kernel: [drm] amdgpu: 4096M of VRAM memory ready
Nov 04 01:39:58.930981 Monarch kernel: [drm] amdgpu: 7841M of GTT memory ready.
Nov 04 01:39:58.931023 Monarch kernel: [drm] GART: num cpu pages 65536, num gpu pages 65536
Nov 04 01:39:58.931062 Monarch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
Nov 04 01:39:58.931101 Monarch kernel: [drm] Chained IB support enabled!
Nov 04 01:39:58.960289 Monarch kernel: audit: type=1130 audit(1667506195.498:4): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-backlight@leds:asus::kbd_backlight comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 04 01:39:58.960383 Monarch kernel: audit: type=1130 audit(1667506195.574:5): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 04 01:39:58.960436 Monarch kernel: amdgpu: hwmgr_sw_init smu backed is polaris10_smu
Nov 04 01:39:58.960478 Monarch kernel: [drm] Found UVD firmware Version: 1.130 Family ID: 16
Nov 04 01:39:58.960516 Monarch kernel: [drm] Found VCE firmware Version: 53.26 Binary ID: 3
Nov 04 01:39:58.960552 Monarch kernel: audit: type=1130 audit(1667506196.061:6): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-fsck@dev-disk-by\x2duuid-2131\x2d271E comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Nov 04 01:39:58.960593 Monarch kernel: amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring gfx test failed (-110)
Nov 04 01:39:58.960917 Monarch kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <gfx_v8_0> failed -110
Nov 04 01:39:58.960964 Monarch kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu_device_ip_init failed
Nov 04 01:39:58.961250 Monarch kernel: amdgpu 0000:01:00.0: amdgpu: Fatal error during GPU init
Nov 04 01:39:58.961518 Monarch kernel: amdgpu 0000:01:00.0: amdgpu: amdgpu: finishing device.
Nov 04 01:39:58.961816 Monarch kernel: amdgpu: probe of 0000:01:00.0 failed with error -110
Nov 04 01:39:58.961858 Monarch kernel: BUG: kernel NULL pointer dereference, address: 0000000000000090
Nov 04 01:39:58.961967 Monarch kernel: #PF: supervisor write access in kernel mode
Nov 04 01:39:58.962103 Monarch kernel: #PF: error_code(0x0002) - not-present page
Nov 04 01:39:58.962222 Monarch kernel: PGD 0 P4D 0
Nov 04 01:39:59.129350 Monarch kernel: Oops: 0002 [#1] PREEMPT SMP NOPTI
Nov 04 01:39:59.129484 Monarch kernel: CPU: 4 PID: 317 Comm: systemd-udevd Tainted: G OE 6.1.0-1-MANJARO #1 dc22f7a720c32acf7691c378ceb7ed41eb713b14
Nov 04 01:39:59.129544 Monarch kernel: Hardware name: ASUSTeK COMPUTER INC. TUF Gaming FX705DY_FX705DY/FX705DY, BIOS FX705DY.315 03/09/2020
Nov 04 01:39:59.129592 Monarch kernel: RIP: 0010:drm_sched_fini+0x84/0xa0 [gpu_sched]
Nov 04 01:39:59.129638 Monarch kernel: Code: 02 c5 df c7 c6 85 8c 01 00 00 00 5b 5d 41 5c 41 5d e9 74 e3 d3 c8 4c 8d 63 f0 4c 89 e7 e8 f4 d3 96 c8 48 8b 03 48 39 d8 74 0f <c6> 80 90 00 00 00 01 48 8b 00 48 39 d8 75 f1 4c 89 e7 e8 65 d4 96
Nov 04 01:39:59.129684 Monarch kernel: RSP: 0018:ffffb30f405f3b08 EFLAGS: 00010217
Nov 04 01:39:59.129728 Monarch kernel: RAX: 0000000000000000 RBX: ffff8e6d06049ae8 RCX: ffff8e6d0e254000
Nov 04 01:39:59.129779 Monarch kernel: RDX: 0000000000000001 RSI: ffff8e6d0e254028 RDI: ffff8e6d06049ad8
Nov 04 01:39:59.129824 Monarch kernel: RBP: ffff8e6d06049a40 R08: ffffffff89c86ae1 R09: 0000000000000010
Nov 04 01:39:59.129867 Monarch kernel: R10: 0000000000000021 R11: ffff8e6d11137230 R12: ffff8e6d06049ad8
Nov 04 01:39:59.129912 Monarch kernel: R13: ffff8e6d06049a48 R14: ffff8e6d06046618 R15: ffffb30f405f3df0
Nov 04 01:39:59.129956 Monarch kernel: FS: 00007f5efeaeb200(0000) GS:ffff8e701ef00000(0000) knlGS:0000000000000000
Nov 04 01:39:59.130000 Monarch kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 04 01:39:59.130044 Monarch kernel: CR2: 0000000000000090 CR3: 0000000106834000 CR4: 00000000003506e0
Nov 04 01:39:59.130088 Monarch kernel: Call Trace:
Nov 04 01:39:59.130132 Monarch kernel: <TASK>
Nov 04 01:39:59.130187 Monarch kernel: amdgpu_fence_driver_sw_fini+0xc8/0xd0 [amdgpu cface1340c4651f89bc6225f605dac1195c103af]
Nov 04 01:39:59.130237 Monarch kernel: amdgpu_device_fini_sw+0x33/0x3f0 [amdgpu cface1340c4651f89bc6225f605dac1195c103af]
Nov 04 01:39:59.130282 Monarch kernel: amdgpu_driver_release_kms+0x16/0x30 [amdgpu cface1340c4651f89bc6225f605dac1195c103af]
Nov 04 01:39:59.130325 Monarch kernel: devm_drm_dev_init_release+0x49/0x70
Nov 04 01:39:59.130373 Monarch kernel: release_nodes+0x40/0xb0
Nov 04 01:39:59.130413 Monarch kernel: devres_release_all+0x8c/0xc0
Nov 04 01:39:59.130447 Monarch kernel: device_unbind_cleanup+0xe/0x70
Nov 04 01:39:59.130495 Monarch kernel: really_probe+0x242/0x380
Nov 04 01:39:59.130538 Monarch kernel: ? pm_runtime_barrier+0x54/0x90
Nov 04 01:39:59.130594 Monarch kernel: __driver_probe_device+0x78/0x170
Nov 04 01:39:59.130660 Monarch kernel: driver_probe_device+0x1f/0x90
Nov 04 01:39:59.130715 Monarch kernel: __driver_attach+0xd5/0x1d0
Nov 04 01:39:59.130764 Monarch kernel: ? __device_attach_driver+0x110/0x110
Nov 04 01:39:59.130812 Monarch kernel: bus_for_each_dev+0x8b/0xd0
Nov 04 01:39:59.130883 Monarch kernel: bus_add_driver+0x1b2/0x200
Nov 04 01:39:59.130954 Monarch kernel: driver_register+0x8d/0xe0
Nov 04 01:39:59.131019 Monarch kernel: ? 0xffffffffc184c000
Nov 04 01:39:59.131081 Monarch kernel: do_one_initcall+0x5d/0x220
Nov 04 01:39:59.131148 Monarch kernel: do_init_module+0x4a/0x1e0
Nov 04 01:39:59.131224 Monarch kernel: __do_sys_init_module+0x17f/0x1b0
Nov 04 01:39:59.131293 Monarch kernel: do_syscall_64+0x5f/0x90
Nov 04 01:39:59.131359 Monarch kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Nov 04 01:39:59.131428 Monarch kernel: ? do_syscall_64+0x6b/0x90
Nov 04 01:39:59.131498 Monarch kernel: ? syscall_exit_to_user_mode+0x1b/0x40
Nov 04 01:39:59.131579 Monarch kernel: ? do_syscall_64+0x6b/0x90
Nov 04 01:39:59.131659 Monarch kernel: entry_SYSCALL_64_after_hwframe+0x63/0xcd
Nov 04 01:39:59.131724 Monarch kernel: RIP: 0033:0x7f5efef21eae
This time the error is slightly different. Instead of "BUG: page fault", kernel 6.1 reports "BUG: kernel NULL pointer dereference".
GPU init fails in all three kernels, but it does not panic in kernel 5.13. From my experience, I think that kernels up to 5.13 had some sort of fallback mechanism (proper error handling) for when an error occurs, but it has either been removed or does not work in kernels >= 5.15.
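To compare the failure between kernels, it can help to cut each boot log down to just the error lines. Below is a minimal sketch of how that could be done. The marker list in `PATTERNS` and the prefix regex are my own assumptions, based only on the excerpt above, and would need adjusting for other drivers or journal output formats.

```python
import re
import sys

# Substrings that mark the failure lines in the excerpt above.
# Assumption: this list is illustrative, not exhaustive.
PATTERNS = ("*ERROR*", "BUG:", "Oops:", "Fatal error", "failed")

# Timestamp/host prefix as it appears in the excerpt, e.g.
# "Nov 04 01:39:58.960593 Monarch kernel: "
PREFIX = re.compile(r"^\w{3} \d{2} \d{2}:\d{2}:\d{2}\.\d+ \S+ kernel: ")


def key_lines(lines):
    """Return failure-related lines with the timestamp/host prefix removed."""
    hits = []
    for line in lines:
        text = PREFIX.sub("", line.rstrip("\n"))
        if any(marker in text for marker in PATTERNS):
            hits.append(text)
    return hits


if __name__ == "__main__":
    # Read a journal dump from stdin and print only the matching lines.
    for text in key_lines(sys.stdin):
        print(text)
```

If this is saved as, say, `key_lines.py` (a hypothetical file name), the kernel log of the previous boot can be piped through it with `journalctl -k -b -1 -o short-precise | python3 key_lines.py`. Running it once per kernel gives short outputs that can be diffed to see where the 5.13 boot and the later ones diverge.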