Discrete AMD GPU freezes

I have a Dell Inspiron 15-3567 laptop with an Intel Core i3-6006U CPU and a discrete AMD Sun XT (Hainan) GPU.

System:
  Kernel: 5.16.2-1-MANJARO x86_64 bits: 64 compiler: gcc v: 11.1.0
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-5.16-x86_64
    root=UUID=9ea71958-a5ca-4f8a-bb1a-d12b6694393c rw rootflags=subvol=@ quiet
    apparmor=1 security=apparmor udev.log_priority=3
  Desktop: KDE Plasma 5.23.5 tk: Qt 5.15.2 wm: kwin_x11 vt: 1 dm: SDDM
    Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Laptop System: Dell product: Inspiron 15-3567 v: N/A
    serial: <superuser required> Chassis: type: 9 serial: <superuser required>
  Mobo: Dell model: 0JTTXY v: A00 serial: <superuser required> UEFI: Dell
    v: 2.13.0 date: 08/13/2020
Battery:
  ID-1: BAT0 charge: 29.5 Wh (100.0%) condition: 29.5/41.4 Wh (71.2%)
    volts: 16.8 min: 14.8 model: SMP DELL VN3N047 type: Li-ion serial: <filter>
    status: Full
Memory:
  RAM: total: 7.63 GiB used: 2.52 GiB (33.1%)
  RAM Report:
    permissions: Unable to run dmidecode. Root privileges required.
CPU:
  Info: model: Intel Core i3-6006U bits: 64 type: MT MCP arch: Skylake
    family: 6 model-id: 0x4E (78) stepping: 3 microcode: 0xEA
  Topology: cpus: 1x cores: 2 tpc: 2 threads: 4 smt: enabled cache:
    L1: 128 KiB desc: d-2x32 KiB; i-2x32 KiB L2: 512 KiB desc: 2x256 KiB
    L3: 3 MiB desc: 1x3 MiB
  Speed (MHz): avg: 810 high: 1742 min/max: 400/2000 scaling:
    driver: intel_pstate governor: powersave cores: 1: 1742 2: 500 3: 500 4: 500
    bogomips: 16006
  Flags: 3dnowprefetch abm acpi adx aes aperfmperf apic arat arch_perfmon
    art avx avx2 bmi1 bmi2 bts clflush clflushopt cmov constant_tsc cpuid
    cpuid_fault cx16 cx8 de ds_cpl dtes64 dtherm dts epb ept ept_ad erms est
    f16c flexpriority flush_l1d fma fpu fsgsbase fxsr ht hwp hwp_act_window
    hwp_epp hwp_notify ibpb ibrs intel_pt invpcid invpcid_single lahf_lm lm
    mca mce md_clear mmx monitor movbe mpx msr mtrr nonstop_tsc nopl nx pae
    pat pbe pcid pclmulqdq pdcm pdpe1gb pebs pge pln pni popcnt pse pse36 pti
    pts rdrand rdseed rdtscp rep_good sdbg sep smap smep ss ssbd sse sse2
    sse4_1 sse4_2 ssse3 stibp syscall tm tm2 tpr_shadow tsc tsc_adjust
    tsc_deadline_timer vme vmx vnmi vpid x2apic xgetbv1 xsave xsavec xsaveopt
    xsaves xtopology xtpr
  Vulnerabilities:
  Type: itlb_multihit status: KVM: VMX disabled
  Type: l1tf
    mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
  Type: mds mitigation: Clear CPU buffers; SMT vulnerable
  Type: meltdown mitigation: PTI
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional,
    IBRS_FW, STIBP: conditional, RSB filling
  Type: srbds mitigation: Microcode
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel Skylake GT2 [HD Graphics 520] vendor: Dell driver: i915
    v: kernel bus-ID: 00:02.0 chip-ID: 8086:1916 class-ID: 0300
  Device-2: AMD Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330 / M430 /
    Radeon 520 Mobile]
    vendor: Dell driver: radeon v: kernel alternate: amdgpu bus-ID: 01:00.0
    chip-ID: 1002:6660 class-ID: 0380
  Device-3: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo
    bus-ID: 1-5:3 chip-ID: 0c45:6a05 class-ID: 0e02
  Display: x11 server: X.org 1.21.1.3 compositor: kwin_x11 driver:
    loaded: ati,modesetting unloaded: radeon alternate: fbdev,vesa
    resolution: <missing: xdpyinfo>
  OpenGL: renderer: Mesa Intel HD Graphics 520 (SKL GT2) v: 4.6 Mesa 21.3.4
    direct render: Yes
Audio:
  Device-1: Intel Sunrise Point-LP HD Audio vendor: Dell driver: snd_hda_intel
    v: kernel alternate: snd_soc_skl bus-ID: 00:1f.3 chip-ID: 8086:9d70
    class-ID: 0403
  Sound Server-1: ALSA v: k5.16.2-1-MANJARO running: yes
  Sound Server-2: sndio v: N/A running: no
  Sound Server-3: JACK v: 1.9.20 running: no
  Sound Server-4: PulseAudio v: 15.0 running: no
  Sound Server-5: PipeWire v: 0.3.43 running: yes
Network:
  Device-1: Qualcomm Atheros QCA9565 / AR9565 Wireless Network Adapter
    vendor: Dell Vostro 3470 driver: ath9k v: kernel bus-ID: 02:00.0
    chip-ID: 168c:0036 class-ID: 0280
  IF: wlp2s0 state: up mac: <filter>
  IP v4: <filter> type: dynamic noprefixroute scope: global
    broadcast: <filter>
  IP v6: <filter> type: noprefixroute scope: link
  Device-2: Realtek RTL810xE PCI Express Fast Ethernet vendor: Dell
    driver: r8169 v: kernel port: d000 bus-ID: 03:00.0 chip-ID: 10ec:8136
    class-ID: 0200
  IF: enp3s0 state: down mac: <filter>
  WAN IP: <filter>
Bluetooth:
  Device-1: Qualcomm Atheros type: USB driver: btusb v: 0.8 bus-ID: 1-8:6
    chip-ID: 0cf3:e005 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: see --recommends
Logical:
  Message: No logical block device data found.
RAID:
  Message: No RAID data found.
Drives:
  Local Storage: total: 223.57 GiB used: 194.15 GiB (86.8%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Kingston model: SA400S37240G
    size: 223.57 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    type: SSD serial: <filter> rev: 0003 scheme: GPT
  Optical-1: /dev/sr0 vendor: HL-DT-ST model: DVD+-RW GU90N rev: A1C2
    dev-links: cdrom
  Features: speed: 24 multisession: yes audio: yes dvd: yes
    rw: cd-r,cd-rw,dvd-r,dvd-ram state: running
Partition:
  ID-1: / raw-size: 222.98 GiB size: 222.98 GiB (100.00%)
    used: 194.15 GiB (87.1%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: root
    uuid: 9ea71958-a5ca-4f8a-bb1a-d12b6694393c
  ID-2: /boot/efi raw-size: 600 MiB size: 598.8 MiB (99.80%)
    used: 568 KiB (0.1%) fs: vfat dev: /dev/sda1 maj-min: 8:1 label: BOOT
    uuid: 7CCD-197F
  ID-3: /home raw-size: 222.98 GiB size: 222.98 GiB (100.00%)
    used: 194.15 GiB (87.1%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: root
    uuid: 9ea71958-a5ca-4f8a-bb1a-d12b6694393c
  ID-4: /swap raw-size: 222.98 GiB size: 222.98 GiB (100.00%)
    used: 194.15 GiB (87.1%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: root
    uuid: 9ea71958-a5ca-4f8a-bb1a-d12b6694393c
  ID-5: /var/cache raw-size: 222.98 GiB size: 222.98 GiB (100.00%)
    used: 194.15 GiB (87.1%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: root
    uuid: 9ea71958-a5ca-4f8a-bb1a-d12b6694393c
  ID-6: /var/log raw-size: 222.98 GiB size: 222.98 GiB (100.00%)
    used: 194.15 GiB (87.1%) fs: btrfs dev: /dev/sda2 maj-min: 8:2 label: root
    uuid: 9ea71958-a5ca-4f8a-bb1a-d12b6694393c
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: file size: 4 GiB used: 0 KiB (0.0%) priority: -2
    file: /swap/swapfile
  ID-2: swap-2 type: zram size: 16 GiB used: 0 KiB (0.0%) priority: 15
    dev: /dev/zram0
Unmounted:
  Message: No unmounted partitions found.
USB:
  Hub-1: 1-0:1 info: Hi-speed hub with single TT ports: 12 rev: 2.0
    speed: 480 Mb/s chip-ID: 1d6b:0002 class-ID: 0900
  Device-1: 1-3:2 info: USB OPTICAL MOUSE type: Mouse
    driver: hid-generic,usbhid interfaces: 1 rev: 1.1 speed: 1.5 Mb/s
    power: 100mA chip-ID: 0000:3825 class-ID: 0301
  Device-2: 1-5:3 info: Microdia Integrated_Webcam_HD type: Video
    driver: uvcvideo interfaces: 2 rev: 2.0 speed: 480 Mb/s power: 500mA
    chip-ID: 0c45:6a05 class-ID: 0e02
  Device-3: 1-6:4 info: Realtek RTS5129 Card Reader Controller
    type: <vendor specific> driver: rtsx_usb,rtsx_usb_ms,rtsx_usb_sdmmc
    interfaces: 1 rev: 2.0 speed: 480 Mb/s power: 500mA chip-ID: 0bda:0129
    class-ID: ff00 serial: <filter>
  Device-4: 1-8:6 info: Qualcomm Atheros type: Bluetooth driver: btusb
    interfaces: 2 rev: 1.1 speed: 12 Mb/s power: 100mA chip-ID: 0cf3:e005
    class-ID: e001
  Hub-2: 2-0:1 info: Super-speed hub ports: 6 rev: 3.0 speed: 5 Gb/s
    chip-ID: 1d6b:0003 class-ID: 0900
Sensors:
  System Temperatures: cpu: 41.0 C pch: 42.0 C mobo: 42.0 C gpu: radeon
    temp: 33.0 C
  Fan Speeds (RPM): cpu: 0
Info:
  Processes: 231 Uptime: 13m wakeups: 1 Init: systemd v: 250 tool: systemctl
  Compilers: gcc: 11.1.0 clang: 13.0.0 Packages: 1706 apt: 0 pacman: 1679
  lib: 507 flatpak: 27 Shell: Zsh v: 5.8 default: Bash v: 5.1.16
  running-in: konsole inxi: 3.3.12

I recently installed Manjaro on it and tried running Steam a few times, but each time I got a complete system freeze on login.
After testing with glmark2, I came to the conclusion that it must be my discrete card.
I ran glmark2 twice: the first time it froze at the start of the test, and the second time I got an error.

$ DRI_PRIME=1 glmark2
=======================================================
    glmark2 2021.12
=======================================================
    OpenGL Information
    GL_VENDOR:     AMD
    GL_RENDERER:   AMD HAINAN (DRM 2.50.0, 5.16.2-1-MANJARO, LLVM 13.0.0)
    GL_VERSION:    4.5 (Compatibility Profile) Mesa 21.3.4
=======================================================
[build] use-vbo=false:^C

$ DRI_PRIME=1 glmark2
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000100000000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x100000000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000100000000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x100000000
radeonsi: Failed to create a context.
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000100000000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x100000000
radeon: Failed to allocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    alignment : 4096 bytes
radeon:    domains   : 4
radeon:    va        : 0x0000000100000000
radeon: Failed to deallocate virtual address for buffer:
radeon:    size      : 65536 bytes
radeon:    va        : 0x100000000
radeonsi: Failed to create a context.
Error: glXCreateNewContext failed
Error: CanvasGeneric: Invalid EGL state
Error: main: Could not initialize canvas
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  152 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  40
  Current serial number in output stream:  41

My Intel GPU runs fine, so I tried disabling the discrete AMD GPU by putting “options vfio-pci ids=1002:6660 disable_vga=1” in /etc/modprobe.d/vfio-pci.conf, but running Steam still crashes the system.
Only “DRI_PRIME=0 steam” works.
Is there another way to disable my discrete GPU so that games and apps won’t use it?

Also, whether or not I have “options vfio-pci ids=1002:6660 disable_vga=1” in /etc/modprobe.d/vfio-pci.conf, the output of “cat /sys/kernel/debug/vgaswitcheroo/switch” is always:

0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
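For anyone reading along, the fields in that output are index, client type (IGD = integrated, DIS = discrete), a "+" for the currently active GPU, the power state, and the PCI address. Below is a minimal shell sketch that spells each line out. The function name describe_switch_line is made up for illustration, and the two sample lines are simply the ones pasted above. As far as I understand it, the DIS entry is registered by the GPU driver itself, so it still being listed would suggest that radeon, not vfio-pci, is bound to the card, which matches "driver: radeon" in the inxi output.

```shell
#!/usr/bin/env bash
# Describe one line of /sys/kernel/debug/vgaswitcheroo/switch.
# Layout: index:type:active-flag:power-state:PCI-address
describe_switch_line() {
    local idx kind active power addr role use state
    # With IFS set to ":" only, the last variable keeps the rest of the
    # line, so the PCI address (which contains colons) stays in one piece.
    IFS=: read -r idx kind active power addr <<<"$1"

    case "$kind" in
        IGD) role="integrated GPU" ;;
        DIS) role="discrete GPU" ;;
        *)   role="GPU ($kind)" ;;
    esac

    if [ "$active" = "+" ]; then use="active"; else use="inactive"; fi

    case "$power" in
        Pwr)    state="powered on" ;;
        Off)    state="powered off" ;;
        DynPwr) state="runtime-managed, currently powered on" ;;
        DynOff) state="runtime-suspended (powered off until used)" ;;
        *)      state="state $power" ;;
    esac

    printf '%s at %s: %s, %s\n' "$role" "$addr" "$use" "$state"
}

# The two lines from this post; on a live system they could be read
# from the debugfs file instead (root required).
while IFS= read -r line; do
    describe_switch_line "$line"
done <<'EOF'
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynOff:0000:01:00.0
EOF
```

So "DynOff" only means the discrete card is asleep until something asks for it (for example via DRI_PRIME=1); it does not mean the card has been disabled.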

Before doing anything else, make sure that the amdgpu module is loading. See the solution in this thread and then go from there: GDM does not start until I switch tty

I changed MODULES="crc32c-intel" to MODULES=(amdgpu) in /etc/mkinitcpio.conf and ran sudo mkinitcpio -P, but haven’t noticed any difference. I still get a freeze when starting a game or Steam.

You need both modules if you have both graphics cards.
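A sketch of what that could look like in /etc/mkinitcpio.conf. The reply does not name the modules, so reading "both" as i915 (Intel) plus amdgpu (AMD) is an assumption, as is keeping the crc32c-intel entry that was there originally. Note also that the inxi output above shows the card bound to radeon with amdgpu only listed as an alternate. For Southern Islands chips such as Hainan, amdgpu normally only takes over when the kernel is booted with radeon.si_support=0 amdgpu.si_support=1, so loading the module early may not change anything by itself.

```shell
# /etc/mkinitcpio.conf (fragment)
# crc32c-intel: the original entry from this thread
# i915, amdgpu: assumed meaning of "both modules"
MODULES=(crc32c-intel i915 amdgpu)
```

After editing, the initramfs has to be rebuilt with sudo mkinitcpio -P, as was done before.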

I added both modules again, no difference. What pissed me off is that the last crash wiped the configs/settings for a few of my apps: I lost all my qBittorrent torrents and settings, my MEGAsync login, etc.

Sounds like you have some weird thing going on there…

Yeah, I don’t understand the reason for that. Is it because of zram, swapfile, btrfs? I intentionally quit those apps so this doesn’t happen.