Hi all,
I have a full AMD machine with an MSI motherboard and a Sapphire Nitro (Radeon RX 6700 XT) graphics card.
I can only work for about 15 minutes before the screen goes black.
In dmesg I see:
[ 63.049915] systemd-journald[423]: /var/log/journal/20b860dfa515404eade47678bbfd1b08/user-1000.journal: Journal file uses a different sequence number ID, rotating.
[ 583.747952] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4934, emitted seq=4936
[ 583.748377] amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
[ 583.939798] amdgpu 0000:2f:00.0: amdgpu: MODE1 reset
[ 583.939802] amdgpu 0000:2f:00.0: amdgpu: GPU mode1 reset
[ 583.939873] amdgpu 0000:2f:00.0: amdgpu: GPU smu mode1 reset
[ 595.467444] amdgpu 0000:2f:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 595.467743] [drm] PCIE GART of 512M enabled (table at 0x00000082FEB00000).
[ 595.467808] [drm] VRAM is lost due to GPU reset!
[ 595.467810] amdgpu 0000:2f:00.0: amdgpu: PSP is resuming...
[ 603.042342] [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed.
[ 603.042513] amdgpu 0000:2f:00.0: amdgpu: Failed to process memory training!
[ 603.042515] [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
[ 603.042644] amdgpu 0000:2f:00.0: amdgpu: GPU reset(1) failed
[ 603.158851] snd_hda_intel 0000:2f:00.1: CORB reset timeout#2, CORBRP = 65535
[ 603.160665] amdgpu 0000:2f:00.0: amdgpu: GPU reset end with ret = -62
[ 603.160668] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62
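To make the sequence easier to read, the log can be reduced to its amdgpu/drm lines and the time from the first ring timeout to the final recovery failure measured. This is only a minimal Python sketch, assuming the dmesg output is available as plain text lines in the format shown above; the helper names `gpu_events` and `seconds_between` are made up for illustration and are not part of any existing tool.

```python
import re

# Matches the "[ 583.747952] message" layout used by dmesg.
LINE_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")


def gpu_events(lines):
    """Return (seconds_since_boot, message) for amdgpu/drm lines only."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        message = match.group(2)
        if "amdgpu" in message or "[drm" in message:
            events.append((float(match.group(1)), message))
    return events


def seconds_between(events, first_marker, last_marker):
    """Seconds from the first message containing first_marker
    to the last message containing last_marker."""
    start = next(t for t, msg in events if first_marker in msg)
    end = [t for t, msg in events if last_marker in msg][-1]
    return end - start


# Three lines copied from the log above (hypothetical usage).
sample = [
    "[ 63.049915] systemd-journald[423]: /var/log/journal/20b860dfa515404eade47678bbfd1b08/user-1000.journal: Journal file uses a different sequence number ID, rotating.",
    "[ 583.747952] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=4934, emitted seq=4936",
    "[ 603.160668] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -62",
]

events = gpu_events(sample)
elapsed = seconds_between(events, "timeout", "GPU Recovery Failed")
print(f"{len(events)} GPU lines, {elapsed:.1f} s from timeout to failed recovery")
```

With the three sample lines, the journald line is dropped and the elapsed time comes out at about 19.4 seconds, which matches the timestamps in the log.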
What can I do, try, or look for?
The system information is:
~ LANG=C inxi -Fazi
System:
Kernel: 6.9.2-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 14.1.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/boot/vmlinuz-6.9-x86_64
root=UUID=cab9616a-eb09-42c4-89dc-6480898b9f00 rw quiet splash
udev.log_priority=3
Desktop: KDE Plasma v: 6.0.5 tk: Qt v: N/A info: frameworks v: 6.2.0
wm: kwin_x11 vt: 2 dm: SDDM Distro: Manjaro base: Arch Linux
Machine:
Type: Desktop Mobo: Micro-Star model: MEG X570 ACE (MS-7C35) v: 1.0
serial: <superuser required> uuid: <superuser required> UEFI: American
Megatrends LLC. v: 1.N0 date: 10/23/2023
CPU:
Info: model: AMD Ryzen 7 5800X bits: 64 type: MT MCP arch: Zen 3+ gen: 4
level: v3 note: check built: 2022 process: TSMC n6 (7nm) family: 0x19 (25)
model-id: 0x21 (33) stepping: 0 microcode: 0xA20102B
Topology: cpus: 1x cores: 8 tpc: 2 threads: 16 smt: enabled cache:
L1: 512 KiB desc: d-8x32 KiB; i-8x32 KiB L2: 4 MiB desc: 8x512 KiB
L3: 32 MiB desc: 1x32 MiB
Speed (MHz): avg: 2948 high: 3800 min/max: 2200/4850 boost: enabled
scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 3374 2: 3800
3: 2200 4: 3800 5: 2200 6: 2200 7: 3800 8: 3600 9: 2200 10: 3598 11: 2200
12: 2200 13: 3800 14: 2200 15: 2200 16: 3800 bogomips: 121653
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: reg_file_data_sampling status: Not affected
Type: retbleed status: Not affected
Type: spec_rstack_overflow mitigation: Safe RET
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines; IBPB: conditional; IBRS_FW;
STIBP: always-on; RSB filling; PBRSB-eIBRS: Not affected; BHI: Not
affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: AMD Navi 22 [Radeon RX 6700/6700 XT/6750 XT / 6800M/6850M XT]
vendor: Sapphire driver: amdgpu v: kernel arch: RDNA-2 code: Navi-2x
process: TSMC n7 (7nm) built: 2020-22 pcie: gen: 4 speed: 16 GT/s
lanes: 16 ports: active: HDMI-A-1 empty: DP-1, DP-2, DP-3, Writeback-1
bus-ID: 2f:00.0 chip-ID: 1002:73df class-ID: 0300
Display: x11 server: X.Org v: 21.1.13 with: Xwayland v: 24.1.0
compositor: kwin_x11 driver: X: loaded: amdgpu unloaded: modesetting,radeon
alternate: fbdev,vesa dri: radeonsi gpu: amdgpu display-ID: :0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
s-diag: 582mm (22.93")
Monitor-1: HDMI-A-1 mapped: HDMI-A-0 model: Sony TV serial: <filter>
built: 2014 res: 1920x1080 hz: 60 dpi: 52 gamma: 1.2
size: 930x523mm (36.61x20.59") diag: 1067mm (42") ratio: 16:9 modes:
max: 1920x1080 min: 640x480
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast surfaceless: drv: radeonsi x11: drv: radeonsi
inactive: gbm,wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.0.8-manjaro1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX 6700 XT (radeonsi
navi22 LLVM 17.0.6 DRM 3.57 6.9.2-1-MANJARO) device-ID: 1002:73df
memory: 11.72 GiB unified: no
API: Vulkan v: 1.3.279 layers: N/A device: 0 type: discrete-gpu name: AMD
Radeon RX 6700 XT (RADV NAVI22) driver: mesa radv v: 24.0.8-manjaro1.1
device-ID: 1002:73df surfaces: xcb,xlib
Audio:
Device-1: AMD Navi 21/23 HDMI/DP Audio driver: snd_hda_intel v: kernel pcie:
gen: 4 speed: 16 GT/s lanes: 16 bus-ID: 2f:00.1 chip-ID: 1002:ab28
class-ID: 0403
Device-2: AMD Starship/Matisse HD Audio vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel pcie: gen: 4 speed: 16 GT/s lanes: 16
bus-ID: 31:00.4 chip-ID: 1022:1487 class-ID: 0403
API: ALSA v: k6.9.2-1-MANJARO status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: sndiod v: N/A status: off tools: aucat,midicat,sndioctl
Server-2: JACK v: 1.9.22 status: off tools: N/A
Server-3: PipeWire v: 1.0.7 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel I211 Gigabit Network vendor: Micro-Star MSI driver: igb
v: kernel pcie: gen: 1 speed: 2.5 GT/s lanes: 1 port: e000 bus-ID: 26:00.0
chip-ID: 8086:1539 class-ID: 0200
IF: enp38s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
IP v4: <filter> scope: global broadcast: <filter>
IP v6: <filter> type: noprefixroute scope: link
Device-2: Realtek RTL8125 2.5GbE vendor: Micro-Star MSI driver: r8169
v: kernel pcie: gen: 2 speed: 5 GT/s lanes: 1 port: d000 bus-ID: 27:00.0
chip-ID: 10ec:8125 class-ID: 0200
IF: enp39s0 state: up speed: 2500 Mbps duplex: full mac: <filter>
IF-ID-1: br0 state: up speed: 2500 Mbps duplex: unknown mac: <filter>
IP v4: <filter> scope: global broadcast: <filter>
Info: services: NetworkManager, nfsd, nginx, sshd, systemd-timesyncd
WAN IP: <filter>
Drives:
Local Storage: total: 22.74 TiB used: 11.72 TiB (51.5%)
SMART Message: Unable to run smartctl. Root privileges required.
ID-1: /dev/nvme0n1 maj-min: 259:0 vendor: Sabrent
model: Rocket 4 Plus Gaming size: 931.51 GiB block-size: physical: 512 B
logical: 512 B speed: 63.2 Gb/s lanes: 4 tech: SSD serial: <filter>
fw-rev: R4P47G.1 temp: 41.9 C scheme: GPT
ID-2: /dev/sda maj-min: 8:0 vendor: Seagate model: ST8000NE001-2M7101
size: 7.28 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
tech: HDD rpm: 7200 serial: <filter> fw-rev: EN01 scheme: GPT
ID-3: /dev/sdb maj-min: 8:16 vendor: Seagate model: ST8000VN004-2M2101
size: 7.28 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
tech: HDD rpm: 7200 serial: <filter> fw-rev: SC60 scheme: GPT
ID-4: /dev/sdc maj-min: 8:32 vendor: Seagate model: ST8000VN004-2M2101
size: 7.28 TiB block-size: physical: 4096 B logical: 512 B speed: 6.0 Gb/s
tech: HDD rpm: 7200 serial: <filter> fw-rev: SC60 scheme: GPT
Partition:
ID-1: / raw-size: 931.22 GiB size: 915.53 GiB (98.32%)
used: 29.14 GiB (3.2%) fs: ext4 dev: /dev/nvme0n1p2 maj-min: 259:2
ID-2: /boot/efi raw-size: 300 MiB size: 299.4 MiB (99.80%)
used: 296 KiB (0.1%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
Alert: No swap data was found.
Sensors:
System Temperatures: cpu: 31.0 C mobo: 29.0 C gpu: amdgpu temp: 39.0 C
mem: 32.0 C
Fan Speeds (rpm): fan-1: 0 fan-2: 1037 fan-3: 971 fan-4: 628 fan-5: 644
fan-6: 645 fan-7: 0 gpu: amdgpu fan: 0
Info:
Memory: total: 128 GiB note: est. available: 125.72 GiB
used: 3.35 GiB (2.7%)
Processes: 342 Power: uptime: 9m states: freeze,mem,disk suspend: deep
avail: s2idle wakeups: 0 hibernate: platform avail: shutdown, reboot,
suspend, test_resume image: 50.27 GiB services: org_kde_powerdevil,
power-profiles-daemon, upowerd Init: systemd v: 255 default: graphical
tool: systemctl
Packages: pm: pacman pkgs: 1417 libs: 375 tools: pamac pm: flatpak pkgs: 0
Compilers: clang: 17.0.6 gcc: 14.1.1 alt: 13 Shell: Zsh v: 5.9 default: Bash
v: 5.2.26 running-in: konsole inxi: 3.3.34