I'm close to becoming a crazy kangaroo (my PC locks up)

Greetings to everyone in the group, don’t worry about the title, I’m still sane, I think.

I’m looking for help with a problem I’ve been struggling with for a month and a few days. I would be very grateful if you could offer any ideas or suggestions that might help me find the root cause of the problem, which I’ll describe below:

The problem is as follows:

Under heavy graphics load, the computer does what I believe is a hard lock. The system freezes on a black screen, and I can’t turn off the PC except with the switch on the power supply. Interestingly, the error leaves no log, apparently because it happens so quickly that the system doesn’t have time to write one.

This occurs in slightly more demanding games like Black Mesa, Garry’s Mod, CS 2, etc., but not in lighter games like Minecraft and VRChat.

I tried various troubleshooting steps and solutions. I configured the RAM sticks to run at the same frequency and the correct voltage, disabled CPO in the BIOS, and also disabled CPB for testing purposes. I also looked into changing the processor frequency, but this motherboard’s BIOS has limited options for that.

I’ve also ruled out temperature as a factor. I bought a new tower-type cooler and replaced the thermal paste, and the base temperature is now 30°C, which I think is fine for an AMD processor.

The power supply is a 500W bronze-rated unit, so I shouldn’t have any power issues.

That covers almost all the tests I’ve done. I’ll leave more details about my system here:

System:
  Kernel: 6.18.33-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 16.1.1
    clocksource: tsc avail: hpet,acpi_pm
    parameters: BOOT_IMAGE=/@/boot/vmlinuz-6.18-x86_64
    root=UUID=1d0e8082-edf6-427f-95c2-f32f197bd69c rw rootflags=subvol=@ quiet
    splash resume=UUID=57b33719-3ea6-4cf2-9a82-a06891afd86b
    udev.log_priority=3
  Desktop: KDE Plasma v: 6.6.5 tk: Qt v: N/A info: frameworks v: 6.26.0
    wm: kwin_wayland vt: 1 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
  Type: Desktop Mobo: AMD model: A520 serial: <superuser required>
    uuid: <superuser required> Firmware: UEFI vendor: American Megatrends LLC.
    v: 5.17 date: 12/12/2012
CPU:
  Info: model: AMD Ryzen 5 5600GT with Radeon Graphics bits: 64 type: MT MCP
    arch: Zen 3 gen: 3 level: v3 note: check built: 2021-22
    process: TSMC n7 (7nm) family: 0x19 (25) model-id: 0x50 (80) stepping: 0
    microcode: 0xA500012
  Topology: cpus: 1x dies: 1 clusters: 1 cores: 6 threads: 12 tpc: 2
    smt: enabled cache: L1: 384 KiB desc: d-6x32 KiB; i-6x32 KiB L2: 3 MiB
    desc: 6x512 KiB L3: 16 MiB desc: 1x16 MiB
  Speed (MHz): avg: 2391 min/max: 422/4669 boost: enabled scaling:
    driver: amd-pstate-epp governor: powersave cores: 1: 2391 2: 2391 3: 2391
    4: 2391 5: 2391 6: 2391 7: 2391 8: 2391 9: 2391 10: 2391 11: 2391 12: 2391
    bogomips: 86237
  Flags-basic: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3
    svm
Graphics:
  Device-1: Advanced Micro Devices [AMD/ATI] Cezanne [Radeon Vega Series /
    Radeon Mobile Series] driver: amdgpu v: kernel arch: GCN-5 code: Vega
    process: GF 14nm built: 2017-20 pcie: gen: 3 speed: 8 GT/s lanes: 16 ports:
    active: HDMI-A-1 empty: DP-1,HDMI-A-2 bus-ID: 04:00.0 chip-ID: 1002:1638
    class-ID: 0300 temp: 32.0 C
  Display: wayland server: X.org v: 1.21.1.22 with: Xwayland v: 24.1.11
    compositor: kwin_wayland driver: X: loaded: amdgpu unloaded: modesetting
    alternate: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: 0
  Monitor-1: HDMI-A-1 model: HDMI built: 2025 res: mode: 1920x1080 hz: 100
    scale: 100% (1) dpi: 92 gamma: 1.2 size: 527x296mm (20.75x11.65")
    diag: 604mm (23.8") ratio: 16:9 modes: max: 1920x1080 min: 640x480
  API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
    device: 1 drv: swrast gbm: drv: radeonsi surfaceless: drv: radeonsi wayland:
    drv: radeonsi x11: drv: radeonsi
  API: OpenGL v: 4.6 vendor: amd mesa v: 26.1.1-arch1.2 glx-v: 1.4
    direct-render: yes renderer: AMD Radeon Graphics (radeonsi renoir ACO DRM
    3.64 6.18.33-1-MANJARO) device-ID: 1002:1638 memory: 1.95 GiB unified: yes
    display-ID: :1.0
  API: Vulkan v: 1.4.350 layers: 7 device: 0 type: integrated-gpu name: AMD
    Radeon Graphics (RADV RENOIR) driver: mesa radv v: 26.1.1-arch1.2
    device-ID: 1002:1638 surfaces: N/A
  Info: Tools: api: clinfo, eglinfo, glxinfo, vulkaninfo
    de: kscreen-console,kscreen-doctor wl: wayland-info
    x11: xdpyinfo, xprop, xrandr
Audio:
  Device-1: Advanced Micro Devices [AMD/ATI] Renoir/Cezanne HDMI/DP Audio
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 04:00.1 chip-ID: 1002:1637 class-ID: 0403
  Device-2: Advanced Micro Devices [AMD] Ryzen HD Audio vendor: Realtek
    driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s lanes: 16
    bus-ID: 04:00.6 chip-ID: 1022:15e3 class-ID: 0403
  API: ALSA v: k6.18.33-1-MANJARO status: kernel-api with: aoss
    type: oss-emulator tools: alsactl,alsamixer,amixer
  Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
  Server-2: JACK v: 1.9.22 status: off tools: N/A
  Server-3: PipeWire v: 1.6.5 status: active with: 1: pipewire-pulse
    status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
    tools: pactl,pw-cat,pw-cli,wpctl
Network:
  Device-1: Realtek RTL8111/8168/8211/8411 PCI Express Gigabit Ethernet
    driver: r8169 v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: f000
    bus-ID: 03:00.0 chip-ID: 10ec:8168 class-ID: 0200
  IF: enp3s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: tailscale0 state: unknown speed: -1 duplex: full mac: N/A
  Info: services: NetworkManager,systemd-timesyncd
Bluetooth:
  Device-1: Cambridge Silicon Radio Bluetooth Dongle (HCI mode) driver: btusb
    v: 0.8 type: USB rev: 1.1 speed: 12 Mb/s lanes: 1 mode: 1.1 bus-ID: 1-6:4
    chip-ID: 0a12:0001 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: up address: see --recommends
Drives:
  Local Storage: total: 223.57 GiB used: 81.42 GiB (36.4%)
  SMART Message: Unable to run smartctl. Root privileges required.
  ID-1: /dev/sda maj-min: 8:0 vendor: Mancer Reaper model: MCR-RPRF240
    size: 223.57 GiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
    tech: SSD serial: <filter> fw-rev: 111a scheme: GPT
Partition:
  ID-1: / raw-size: 214.48 GiB size: 214.48 GiB (100.00%)
    used: 81.42 GiB (38.0%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
  ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
    used: 664 KiB (0.2%) fs: vfat dev: /dev/sda1 maj-min: 8:1
  ID-3: /home raw-size: 214.48 GiB size: 214.48 GiB (100.00%)
    used: 81.42 GiB (38.0%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
  ID-4: /var/log raw-size: 214.48 GiB size: 214.48 GiB (100.00%)
    used: 81.42 GiB (38.0%) fs: btrfs dev: /dev/sda2 maj-min: 8:2
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default) zswap: yes
    compressor: zstd max-pool: 20%
  ID-1: swap-1 type: partition size: 8.8 GiB used: 0 KiB (0.0%) priority: -2
    dev: /dev/sda3 maj-min: 8:3
Sensors:
  System Temperatures: cpu: 37.9 C mobo: N/A gpu: amdgpu temp: 33.0 C
  Fan Speeds (rpm): N/A
Info:
  Memory: total: 16 GiB note: est. available: 13.53 GiB used: 3.44 GiB (25.4%)
  Processes: 321 Power: uptime: 37m states: freeze,mem,disk suspend: s2idle
    wakeups: 0 hibernate: platform avail: shutdown, reboot, suspend, test_resume
    image: 5.36 GiB services: org_kde_powerdevil, power-profiles-daemon,
    upowerd Init: systemd v: 260 default: graphical tool: systemctl
  Packages: 1394 pm: pacman pkgs: 1388 libs: 364 tools: pamac,yay pm: flatpak
    pkgs: 6 Compilers: gcc: 16.1.1 Shell: Zsh v: 5.9 running-in: konsole
    inxi: 3.3.40

Note: If this isn’t the correct channel, feel free to move it to a suitable one. I thought you might, because the problem isn’t directly related to Manjaro, or even to Linux: it also happens on Windows 10.

:kangaroo:

jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: > Warning:          Unsupported maximum keycode 709, clipping.
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: >                   X11 cannot support keycodes above 255.
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: > Warning:          Virtual modifier Hyper multiply defined
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: >                   Using 0, ignoring 0
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: > Warning:          Virtual modifier ScrollLock multiply defined
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: >                   Using 0, ignoring 0
jun 09 20:45:24 Mancer-Linux kwin_wayland_wrapper[1085]: Errors from xkbcomp are not fatal to the X server
jun 09 20:45:26 Mancer-Linux kwin_wayland[963]: Failed to register with host portal QDBusError("org.freedesktop.portal.Error.Failed", "Could not register app ID: Unable to open /proc/963/root")
jun 09 20:45:26 Mancer-Linux kwin_wayland[963]: Failed to register with host portal QDBusError("org.freedesktop.portal.Error.Failed", "Could not register app ID: Unable to open /proc/963/root")
jun 09 21:17:23 Mancer-Linux kwin_wayland[963]: QDBusConnection: couldn't handle call to Teardown, no slot matched
jun 09 21:17:23 Mancer-Linux kwin_wayland[963]: QDBusConnection: couldn't handle call to Teardown, no slot matched
jun 09 21:17:23 Mancer-Linux kwin_wayland[963]: Could not find slot Krunner1Adaptor::Teardown
jun 09 21:18:04 Mancer-Linux kwin_wayland[963]: Failed to fetch net.hadess.SensorProxy.HasAccelerometer property: QDBusError("org.freedesktop.DBus.Error.NoReply", "Remote peer disconnected")
jun 09 21:18:04 Mancer-Linux kwin_wayland[963]: Failed to fetch net.hadess.SensorProxy.HasAmbientLight property: QDBusError("org.freedesktop.DBus.Error.NoReply", "Remote peer disconnected")
jun 09 21:26:13 Mancer-Linux kwin_wayland[963]: Libinput: event20 - Wireless Controller Touchpad: kernel bug: Touch jump detected and discarded.
jun 09 21:26:14 Mancer-Linux kwin_wayland[963]: Libinput: event20 - Wireless Controller Touchpad: kernel bug: Touch jump detected and discarded.
jun 09 21:27:09 Mancer-Linux kwin_wayland[963]: Libinput: event20 - Wireless Controller Touchpad: kernel bug: Touch jump detected and discarded.
jun 09 21:44:42 Mancer-Linux kwin_wayland[963]: atomic commit failed: Permiso denegado
jun 09 21:44:42 Mancer-Linux kwin_wayland[963]: PipeWire remote error:  connection error
jun 09 21:44:43 Mancer-Linux systemd[880]: plasma-kwin_wayland.service: Consumed 4min 7.925s CPU time over 59min 19.554s wall clock time, 340.5M memory peak.

Also, here is an additional log that I believe is relevant. It is from the session in which the error occurred.

Sounds like something power related to me: the cable (to the GPU, if discrete) or the power supply. After checking the connection to the video card and maybe changing the power cable, the only thing left will be the power supply. Maybe it’s getting old and no longer delivers those 500 W at peak load.


I’m not sure I would rule out overheating. I had a similar experience with a laptop. I was never able to pinpoint it to overheating, yet during activity where overheating was a possibility it would lock up in a manner similar to what you are describing.


My first guess would be thermal or power related as well. If you have another power supply lying around, you can swap it out and see if it happens.

Also, writing a quick and dirty script to log your temperatures to a file while doing something resource intensive would settle the temperature question one way or the other. Like:

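#!/usr/bin/perl

use strict;
use warnings;
use IO::Handle;    # core module; provides flush() and sync()

# Single quotes do not expand "~", so build the path from $HOME.
my $log_file = "$ENV{HOME}/temp_log.txt";

while (1) {
  my $temps = `sensors`;
  my $stamp = localtime();

  # Reopen, flush and sync on every sample so the last reading reaches
  # the disk before a hard lock.
  open(my $fh, '>>', $log_file) or die "Cannot open $log_file: $!";
  print {$fh} "=== $stamp ===\n$temps\n";
  $fh->flush;
  $fh->sync;
  close($fh);

  sleep(3);
}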

Then check the temp_log.txt file after your computer crashes.
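If the log gets long, reading it by eye is tedious. Here is a rough Python sketch that pulls the readings out of the concatenated `sensors` output and reports the peak and the last value for each label. It assumes `sensors` prints readings in the usual `Label:  +37.9°C` form; the names `parse_sensor_temps` and `summarize` are made up for this example, and labels that repeat across chips (for instance `temp1`) are merged into one series.

```python
import re
import sys

# Matches a reading such as "Tctl:         +37.9°C".
# The label is everything before the first colon.
READING = re.compile(r"^\s*([^:]+):\s*([+-]?\d+(?:\.\d+)?)\s*°C")


def parse_sensor_temps(text):
    """Return {label: [readings in file order]} from concatenated `sensors` output."""
    temps = {}
    for line in text.splitlines():
        # Drop the "(high = ..., crit = ...)" part so limits are not counted as readings.
        match = READING.match(line.split("(", 1)[0])
        if match:
            temps.setdefault(match.group(1).strip(), []).append(float(match.group(2)))
    return temps


def summarize(temps):
    """Return {label: (peak, last)}; the last value is the final sample before the lockup."""
    return {label: (max(values), values[-1]) for label, values in temps.items()}


# Only runs when a log path is passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        report = summarize(parse_sensor_temps(handle.read()))
    for label, (peak, last) in report.items():
        print(f"{label}: peak {peak:.1f} C, last {last:.1f} C")
```

Saved under a name of your choice and run with the log path as its only argument, it prints one line per sensor label. If the last value sits well below the peak, heat building up right before the freeze looks less likely, though a 3-second sampling interval can still miss a short spike.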

Then I would try swapping out the RAM, or take one stick out at a time.

Then I would see if it happens with a live session from USB to see if it’s hard drive related. It might be hard to install one of those games in a live session, so running:

stress --cpu 12

or looking through https://wiki.archlinux.org/title/Stress_testing might trigger the problem.

Hope one of these helps!


Maybe the following info will help you prevent damage to your filesystem:

:footprints:


Typically when the computer locks up as described, and certainly in my case, REISUB and REISUO don’t work. The computer no longer responds to input from peripherals like the keyboard.

If that works in @TheRooJZ’s case, that’s great, because it should then be possible to ssh into the computer.


I would also try one very boring test before chasing more KDE logs: set the monitor to 60 Hz for a while and, if possible, test another HDMI cable/port. The 5600GT iGPU shares system memory, so a marginal RAM setting or display link can look like a GPU crash under heavier games.

Since it happens on Windows too, I would keep the test path mostly hardware/firmware: BIOS update if your board has one, load BIOS defaults, leave RAM at JEDEC speed for a day, then run memtest86+ and a GPU stress test separately. Change only one thing at a time, otherwise the kangaroo stays hard to catch.


The later kernel 7.x provides an improved driver for your graphics.

I have a testing laptop - ThinkPad T495 - which saw a massive improvement when switching to kernel 7.0.

This means that you will have to keep an eye on the kernel releases and keep up.


Either that, or a sudden heat spike. Both can lock up the system at the hardware level, in which case the Magic SysRq keys won’t work anymore.
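To catch a sudden spike, the sampling has to be fast and every sample has to reach the disk before the freeze. Below is a minimal Python sketch of that idea, not a polished tool. It assumes the standard Linux hwmon sysfs layout (`/sys/class/hwmon/hwmon*/temp*_input`, values in millidegrees Celsius); the function names, the log file name and the 0.5-second interval are arbitrary choices for illustration.

```python
import glob
import os
import time


def read_hwmon_temps(root="/sys/class/hwmon"):
    """Return {"hwmonN:chip/tempM": degrees C} for every temp*_input file under root."""
    readings = {}
    for path in sorted(glob.glob(os.path.join(root, "hwmon*", "temp*_input"))):
        chip_dir = os.path.dirname(path)
        try:
            with open(os.path.join(chip_dir, "name")) as handle:
                chip = handle.read().strip()
            with open(path) as handle:
                millidegrees = int(handle.read().strip())
        except (OSError, ValueError):
            continue  # sensor disappeared or returned something unreadable; skip it
        sensor = os.path.basename(path)[: -len("_input")]
        readings[f"{os.path.basename(chip_dir)}:{chip}/{sensor}"] = millidegrees / 1000.0
    return readings


def append_durable(log_path, line):
    """Append one line and fsync it so it is on disk before the next sample."""
    with open(log_path, "a", encoding="utf-8") as handle:
        handle.write(line + "\n")
        handle.flush()
        os.fsync(handle.fileno())


def log_temps(log_path, interval=0.5, samples=None, root="/sys/class/hwmon"):
    """Log all temperatures every `interval` seconds; samples=None runs until interrupted."""
    count = 0
    while samples is None or count < samples:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        fields = " ".join(f"{key}={value:.1f}" for key, value in read_hwmon_temps(root).items())
        append_durable(log_path, f"{stamp} {fields}")
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
```

Calling `log_temps(os.path.expanduser("~/hwmon_temps.log"))` in a terminal before starting the game keeps appending one line per sample. After the forced power-off, the last line in the file is the last state the machine managed to record.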

Mmm okay, the graphics in this type of processor are integrated, but I’ll try changing the power supply cable in this particular case for testing.

In the specific case of the power supply, I wouldn’t consider the possibility of it being old because it’s new; in fact, the PC is a pre-assembled kit with that same power supply, and it’s less than a month old. However, it’s possible the power supply has a manufacturing defect.

Maybe I could check the temperature with MangoHud.

The curious thing is that when I play one of the games I mentioned before, one that doesn’t trigger the error, the temperature reaches 80°C and stays stable there.

As the gentleman @tracyanne mentioned, when the error occurs, it’s a total lock that doesn’t allow me to access the system. I can’t access a TTY, nor can I manually shut it down by pressing the button for 10 to 30 seconds; the only way to turn it off is by switching off the power supply switch.

I don’t think it’s a thermal issue, because usually when a temperature-related shutdown occurs, the PC shuts down completely, whereas here the coolers, the LED fans, and other components keep running.

It’s as if only the video card crashes.


I had this issue too, and it was getting tough to deal with. Then the 7.x kernel came out and the issue was resolved. Have you tried upgrading to a newer kernel? It has been months since I last had a hard lock with 7.x.

if i remember correctly, i had this issue too.
can’t remember the solution, but i will do a quick search…


For your information, @tracyanne is a female :kangaroo:. :wink:


/off

It’s in the nick actually. But i have to admit i also misread it at the beginning. I misread it as Tracy Cane which could be a male name. I guess i have watched too many CSI episodes and Mr Horatio Cane is burned too deep in my mind, which then plays associations :slight_smile:


That’s really weird that it doesn’t shut down by long pressing the power button…

I think that sounds like a motherboard issue. Who is the manufacturer of your motherboard? Is your BIOS up to date?
From your original post I see it’s an AMD A520 with firmware dated 2012. AMD’s website has a chipset and RAID firmware update, and the manufacturer’s website may have more. You might be able to apply them through fwupd in Linux, but chances are you’ll need Windows installed to do it.

It might also be worth pulling out the CMOS battery, holding the power button for 30 seconds, and then putting the battery back in.

So, I would go to your motherboard’s manufacturer’s website, and update everything you can.

Oh, so sorry @tracyanne, it wasn’t my intention.

It is exactly this model from this manufacturer. I also updated the BIOS recently in case that fixed anything, but I didn’t notice any difference.


@oe36 Well, I just installed the kernel yesterday, I’m going to test it today to verify.


Have you actually attempted to ssh into the system from another computer? If it’s only the graphics card, then an ssh server should still be running.


That’s actually a good idea. I could also try pinging the IP address of my PC, which is reserved on my router, and see if the ping comes back. If it’s a total lockup, the ping shouldn’t even respond.


The other thing is, if you can ssh in, you may be able to look through the system and find out what is going on, live logs, for example. Search for processes that should be running, that sort of thing.
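The ping/ssh check can be left running on a second machine so that the moment of the freeze gets a timestamp. This is only a sketch under a few assumptions: it uses a TCP connection attempt to the SSH port instead of an ICMP ping (a raw ping normally needs extra privileges), it assumes sshd is enabled on the gaming PC and listening on port 22, and the address in the usage note is a placeholder. The names `is_port_open` and `watch` are made up for the example.

```python
import socket
import time
from datetime import datetime


def is_port_open(host, port=22, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or name lookup failed
        return False


def watch(host, port=22, interval=5.0, samples=None, log=print):
    """Probe repeatedly and log every state change; samples=None runs until interrupted."""
    previous = None
    count = 0
    while samples is None or count < samples:
        state = is_port_open(host, port)
        if state != previous:
            stamp = datetime.now().isoformat(timespec="seconds")
            log(f"{stamp} {host}:{port} {'reachable' if state else 'UNREACHABLE'}")
            previous = state
        count += 1
        if samples is None or count < samples:
            time.sleep(interval)
    return previous
```

Running `watch("192.168.1.50")` (with the PC's real address in place of the placeholder) on a laptop while gaming prints a line whenever the state flips. If the port stays reachable after the screen goes black, the kernel's network stack is still alive and the lockup is probably limited to graphics; if it turns UNREACHABLE at the same moment, the whole machine has most likely stopped.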
