AMD 5900X & AMD GPU RX 580X & ROCm

Over the weekend I built a new desktop with an AMD Ryzen 9 5900X CPU and an AMD RX 580X GPU. I installed Manjaro on top of Btrfs and tried to set up ROCm support for the GPU, because I want to build some neural networks. The problem is that ever since I installed the machine, performance has been really sluggish, and I am getting some glitches on the AMD GPU. I started with the mesa drivers, but after a while I switched to mesa-git.

I have also tried both linux511 and linux510, and linux510 seems to behave a bit better. I am really exasperated by this underwhelming performance. Sometimes my cursor freezes, or there is a bit of a delay when I execute commands, even though the load is really low.

I presume it is some kind of incompatibility between the CPU, kernel, BIOS version, GPU, and drivers, but I can't really say which of them is the culprit. I have also tried three different BIOS versions and none of them resolves my issues. Currently I am even running the latest beta version, 1.53 (MSI MAG X570 Tomahawk WiFi).

This is the inxi output:

inxi -Fxxxz
System:    Kernel: 5.10.15-1-MANJARO x86_64 bits: 64 compiler: gcc v: 10.2.1 Desktop: i3 4.19.1 info: i3bar 
           dm: LightDM 1.30.0 Distro: Manjaro Linux 
Machine:   Type: Desktop System: Micro-Star product: MS-7C84 v: 1.0 serial: <filter> 
           Mobo: Micro-Star model: MAG X570 TOMAHAWK WIFI (MS-7C84) v: 1.0 serial: <filter> 
           UEFI: American Megatrends LLC. v: 1.53 date: 12/30/2020 
Battery:   Device-1: hidpp_battery_0 model: Logitech MX Keys Wireless Keyboard serial: <filter> 
           charge: 55% (should be ignored) rechargeable: yes status: Discharging 
CPU:       Info: 12-Core model: AMD Ryzen 9 5900X bits: 64 type: MT MCP arch: Zen 3 rev: 0 L2 cache: 6 MiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 177656 
           Speed: 3528 MHz min/max: 2200/3700 MHz boost: enabled Core speeds (MHz): 1: 3528 2: 2874 3: 2147 4: 2195 
           5: 3595 6: 2807 7: 2871 8: 2870 9: 2878 10: 2875 11: 2187 12: 2194 13: 2876 14: 2200 15: 2140 16: 2197 
           17: 2186 18: 2875 19: 2873 20: 2873 21: 2154 22: 2152 23: 2194 24: 2152 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] 
           vendor: Hewlett-Packard driver: amdgpu v: kernel bus ID: 2d:00.0 chip ID: 1002:67df class ID: 0300 
           Display: x11 server: X.Org 1.20.10 compositor: picom v: git-dac85 driver: loaded: amdgpu,ati 
           unloaded: modesetting alternate: fbdev,vesa resolution: 3440x1440~60Hz s-dpi: 96 
           OpenGL: renderer: AMD Radeon RX 480 Graphics (POLARIS10 DRM 3.40.0 5.10.15-1-MANJARO LLVM 11.0.1) 
           v: 4.6 Mesa 21.1.0-devel (git-766538f83c) direct render: Yes 
Audio:     Device-1: AMD Ellesmere HDMI Audio [Radeon RX 470/480 / 570/580/590] vendor: Hewlett-Packard 
           driver: snd_hda_intel v: kernel bus ID: 2d:00.1 chip ID: 1002:aaf0 class ID: 0403 
           Device-2: Advanced Micro Devices [AMD] Starship/Matisse HD Audio vendor: Micro-Star MSI driver: snd_hda_intel 
           v: kernel bus ID: 2f:00.4 chip ID: 1022:1487 class ID: 0403 
           Device-3: C-Media CM108 Audio Controller type: USB driver: hid-generic,snd-usb-audio,usbhid bus ID: 5-4:3 
           chip ID: 0d8c:013c class ID: 0300 
           Sound Server: ALSA v: k5.10.15-1-MANJARO 
Network:   Device-1: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169 v: kernel port: f000 bus ID: 26:00.0 
           chip ID: 10ec:8125 class ID: 0200 
           IF: enp38s0 state: down mac: <filter> 
           Device-2: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel port: f000 bus ID: 28:00.0 chip ID: 8086:2723 
           class ID: 0280 
           IF: wlo1 state: up mac: <filter> 
           IF-ID-1: virbr0 state: down mac: <filter> 
Bluetooth: Device-1: Intel AX200 Bluetooth type: USB driver: btusb v: 0.8 bus ID: 1-4:2 chip ID: 8087:0029 
           class ID: e001 
           Message: Required tool hciconfig not installed. Check --recommends 
Drives:    Local Storage: total: 931.51 GiB used: 444.43 GiB (47.7%) 
           ID-1: /dev/nvme0n1 vendor: Samsung model: SSD 970 EVO Plus 1TB size: 931.51 GiB speed: 31.6 Gb/s lanes: 4 
           rotation: SSD serial: <filter> rev: 2B2QEXM7 temp: 38.9 C scheme: GPT 
Partition: ID-1: / size: 896.01 GiB used: 444.43 GiB (49.6%) fs: btrfs dev: /dev/nvme0n1p3 
           ID-2: /boot/efi size: 511 MiB used: 4.8 MiB (0.9%) fs: vfat dev: /dev/nvme0n1p1 
           ID-3: /home size: 896.01 GiB used: 444.43 GiB (49.6%) fs: btrfs dev: /dev/nvme0n1p3 
Swap:      ID-1: swap-1 type: partition size: 35 GiB used: 0 KiB (0.0%) priority: -2 dev: /dev/nvme0n1p2 
Sensors:   System Temperatures: cpu: 40.8 C mobo: N/A gpu: amdgpu temp: 40.0 C 
           Fan Speeds (RPM): N/A gpu: amdgpu fan: 1534 
Info:      Processes: 470 Uptime: 2h 49m wakeups: 4 Memory: 31.34 GiB used: 5.45 GiB (17.4%) Init: systemd v: 247 
           Compilers: gcc: 10.2.0 clang: 11.0.1 Packages: pacman: 1362 Shell: fish v: 3.1.2 running in: urxvtd 
inxi: 3.3.01 

And these are the errors from journalctl:

sp5100-tco sp5100-tco: Watchdog hardware is disabled
kvm: disabled by bios
[TTM] Failed to find memory space for buffer 0x00000000b5198c49 eviction
amdgpu 0000:2d:00.0: amdgpu: 00000000bbe3d5f8 pin failed
[drm:dm_plane_helper_prepare_fb [amdgpu]] *ERROR* Failed to pin framebuffer with error -12
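For anyone who wants to pull the same kind of messages from their own machine, here is a rough sketch of a filter. The helper name filter_gpu_errors is made up for illustration, and the pattern is only a guess at what is relevant (amdgpu, DRM and TTM lines). For context, error -12 is ENOMEM on Linux, which fits the TTM "Failed to find memory space" line above.

```shell
#!/bin/sh
# filter_gpu_errors is a hypothetical helper, not an existing tool.
# It keeps only the log lines that mention the amdgpu driver, the DRM
# subsystem, or the TTM memory manager, and drops everything else.
filter_gpu_errors() {
    grep -iE 'amdgpu|\[drm|\[TTM\]'
}

# Typical use on a systemd machine: kernel messages from the current
# boot, priority "err" and worse, piped through the filter.
#   journalctl -k -b -p err --no-pager | filter_gpu_errors
```

Unrelated lines such as the sp5100-tco and kvm messages are dropped, so only the GPU-side errors remain.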

Try booting using this parameter:

amdgpu.dpm=0

This disables dynamic power management, which sometimes causes trouble.
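To make the parameter permanent rather than typing it at the boot menu each time, one option is to append it to GRUB_CMDLINE_LINUX_DEFAULT. The following is only a sketch: add_kernel_param is a hypothetical helper name, /etc/default/grub is assumed to be where your GRUB defaults live, and sed -i assumes GNU sed. Back up the file before trying it.

```shell
#!/bin/sh
# Sketch: append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT
# unless it is already there. add_kernel_param is a hypothetical
# helper name; the default path below is an assumption.
add_kernel_param() {
    param="$1"                      # e.g. amdgpu.dpm=0
    file="${2:-/etc/default/grub}"  # assumed GRUB defaults file

    # Extract the current value between the double quotes.
    line=$(grep '^GRUB_CMDLINE_LINUX_DEFAULT=' "$file")
    current=$(printf '%s' "$line" | sed 's/^[^"]*"//; s/"$//')

    # Skip the edit if the parameter is already present as a whole word.
    case " $current " in
        *" $param "*) return 0 ;;
    esac

    # Append the parameter just before the closing quote (edits in place).
    sed -i "s|^GRUB_CMDLINE_LINUX_DEFAULT=\"\(.*\)\"|GRUB_CMDLINE_LINUX_DEFAULT=\"\1 ${param}\"|" "$file"
}

# Example (run from a root shell, since the file is root-owned):
#   add_kernel_param amdgpu.dpm=0
```

Afterwards regenerate the GRUB configuration (on Manjaro, sudo update-grub) and reboot. You can confirm the parameter was picked up with cat /proc/cmdline.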

Also take a look here:

https://wiki.archlinux.org/index.php/ATI


You need to tweak the amdgpu kernel parameters. Then install corectrl to control the performance of the GPU, and disable DPM if that is not enough.