### Name and Version
```shell
❯ llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:… no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon RX 7900 XT, gfx1100 (0x1100), VMM: no, Wave Size: 32
version: 6297 (1cf123a3)
built with clang version 19.0.0git (/srcdest/rocm-llvm d366fa84f3fdcbd4b10847ebd5db572ae12a34fb) for x86_64-pc-linux-gnu
```
The following environment variables have been set to configure `llama-server`:
```sh
export LLAMA_ARG_BATCH=2048
export LLAMA_ARG_UBATCH=2048
export LLAMA_ARG_SWA_FULL=false
export LLAMA_ARG_KV_SPLIT=false
export LLAMA_SET_ROWS=1 # for ARG_KV_SPLIT=false to work
export LLAMA_ARG_FLASH_ATTN=true
export LLAMA_ARG_MLOCK=true
export LLAMA_ARG_NO_MMAP=false
export LLAMA_ARG_N_GPU_LAYERS=999
export LLAMA_OFFLINE=false
export LLAMA_ARG_ENDPOINT_SLOTS=true
export LLAMA_ARG_ENDPOINT_PROPS=true
```
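For reference, this is roughly how the server is launched with the configuration above; the env file name is hypothetical, and the `llama-server` invocation matches the one visible at the end of the valgrind log below:
```sh
# Export the variables above (stored in a hypothetical env file), then start the server.
. ./llama-server.env
llama-server --model ~/LLMs/gpt-oss-20b.auto.gguf --jinja --temp 1.0
```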
### Operating systems
Linux
### GGML backends
HIP
### Hardware
Ryzen 9 5900X + Radeon RX 7900 XT
### Models
GPT-OSS 20B
Mistral-Small-3.2-24B-Instruct-2506-UD-Q4_K_XL.gguf (Unsloth quant)
### Problem description & steps to reproduce
Running llama-server built with the latest ROCm version on Arch Linux (6.4.3-1 as of 2025-08-27) crashes when trying to generate a completion.
To reproduce: run GPT-OSS-20B with `llama-server` and request a completion, for example via the web UI (a curl sketch is included below). The server crashes immediately, without any additional message in non-verbose mode.
I ran `llama-server` under `valgrind` to see where the issue comes from. The crash appears to originate inside ROCm, but since it happens in destructors it may also be improperly handled destruction triggered by an underlying issue elsewhere.
I've checked whether other models also trigger this issue; Qwen3-4B works without problems.
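For reproduction without the web UI, the same request can be sent directly. A minimal sketch, assuming the OpenAI-compatible `/v1/chat/completions` endpoint and the port visible in the logs below (adjust to your own setup):
```sh
# Minimal reproduction sketch: ask the running server for a streamed chat completion.
# Port 51536 matches the server logs below; replace it with your own --port value.
curl http://127.0.0.1:51536/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages":[{"role":"user","content":"test"}],"stream":true}'
```
With GPT-OSS 20B loaded, the server segfaults right after the prompt batch is decoded (see the valgrind trace below).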
### First Bad Commit
Probably unrelated to a specific commit.
### Relevant log output
```shell
main: server is listening on http://0.0.0.0:51536 - starting the main loop
srv update_slots: all slots are idle
srv log_server_r: request: GET / 127.0.0.1 200
Unauthorized: Invalid API Key
srv log_server_r: request: GET /favicon.ico 127.0.0.1 401
srv log_server_r: request: GET /props 127.0.0.1 200
srv params_from_: Chat format: GPT-OSS
slot launch_slot_: id 0 | task 0 | processing task
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 4096, n_keep = 0, n_prompt_tokens = 68
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 68, n_tokens = 68, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 68, n_tokens = 68
==17652== Invalid read of size 8
==17652== at 0x1584F640: _M_begin (hashtable.h:440)
==17652== by 0x1584F640: begin (hashtable.h:642)
==17652== by 0x1584F640: begin (unordered_map.h:388)
==17652== by 0x1584F640: amd::device::Program::runInitFiniKernel(amd::device::Program::kernel_kind_t) const (devprogram.cpp:2958)
==17652== by 0x15882BC1: amd::Program::unload() (program.cpp:95)
==17652== by 0x15546C11: hip::FatBinaryDeviceInfo::~FatBinaryDeviceInfo() (hip_fatbin.cpp:98)
==17652== by 0x15547908: hip::FatBinaryInfo::~FatBinaryInfo() (hip_fatbin.cpp:129)
==17652== by 0x154E21B1: hip::DynCO::~DynCO() (hip_code_object.cpp:1152)
==17652== by 0x154E2355: hip::DynCO::~DynCO() (hip_code_object.cpp:1153)
==17652== by 0x1575BAC5: hip::PlatformState::loadModule(ihipModule_t**, char const*, void const*) (hip_platform.cpp:759)
==17652== by 0x15716EAE: hip::hipModuleLoadData(ihipModule_t**, void const*) (hip_module.cpp:61)
==17652== by 0xB898649: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xB899DB3: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xAE4AB6A: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xAE80C49: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== Address 0x28 is not stack'd, malloc'd or (recently) free'd
==17652==
==17652==
==17652== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==17652== Access not within mapped region at address 0x28
==17652== at 0x1584F640: _M_begin (hashtable.h:440)
==17652== by 0x1584F640: begin (hashtable.h:642)
==17652== by 0x1584F640: begin (unordered_map.h:388)
==17652== by 0x1584F640: amd::device::Program::runInitFiniKernel(amd::device::Program::kernel_kind_t) const (devprogram.cpp:2958)
==17652== by 0x15882BC1: amd::Program::unload() (program.cpp:95)
==17652== by 0x15546C11: hip::FatBinaryDeviceInfo::~FatBinaryDeviceInfo() (hip_fatbin.cpp:98)
==17652== by 0x15547908: hip::FatBinaryInfo::~FatBinaryInfo() (hip_fatbin.cpp:129)
==17652== by 0x154E21B1: hip::DynCO::~DynCO() (hip_code_object.cpp:1152)
==17652== by 0x154E2355: hip::DynCO::~DynCO() (hip_code_object.cpp:1153)
==17652== by 0x1575BAC5: hip::PlatformState::loadModule(ihipModule_t**, char const*, void const*) (hip_platform.cpp:759)
==17652== by 0x15716EAE: hip::hipModuleLoadData(ihipModule_t**, void const*) (hip_module.cpp:61)
==17652== by 0xB898649: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xB899DB3: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xAE4AB6A: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== by 0xAE80C49: ??? (in /opt/rocm/lib/librocblas.so.4.4)
==17652== If you believe this happened as a result of a stack
==17652== overflow in your program's main thread (unlikely but
==17652== possible), you can try to increase the size of the
==17652== main thread stack using the --main-stacksize= flag.
==17652== The main thread stack size used in this run was 8388608.
==17652==
==17652== HEAP SUMMARY:
==17652== in use at exit: 1,278,763,386 bytes in 994,033 blocks
==17652== total heap usage: 3,326,026 allocs, 2,331,993 frees, 32,436,809,144 bytes allocated
==17652==
==17652== LEAK SUMMARY:
==17652== definitely lost: 0 bytes in 0 blocks
==17652== indirectly lost: 0 bytes in 0 blocks
==17652== possibly lost: 97,304 bytes in 231 blocks
==17652== still reachable: 1,278,665,458 bytes in 993,801 blocks
==17652== of which reachable via heuristic:
==17652== multipleinheritance: 1,664 bytes in 20 blocks
==17652== suppressed: 624 bytes in 1 blocks
==17652== Rerun with --leak-check=full to see details of leaked memory
==17652==
==17652== For lists of detected and suppressed errors, rerun with: -s
==17652== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0)
[1] 17652 segmentation fault (core dumped) valgrind llama-server --model ~/LLMs/gpt-oss-20b.auto.gguf --jinja --temp 1.0
```
This is the output with the `-v` flag and without valgrind:
```
main: server is listening on http://0.0.0.0:51536 - starting the main loop
que start_loop: processing new tasks
que start_loop: update slots
srv update_slots: all slots are idle
srv kv_cache_cle: clearing KV cache
que start_loop: waiting for new tasks
request: {"messages":[{"role":"user","content":"test"}],"stream":true,"cache_prompt":true,"reasoning_format":"none","samplers":"edkypmxt","temperature":0.8,"dynatemp_range":0,"dynatemp_exponent":1,"top_k":40,"top_p":0.95,"min_p":0.05,"typical_p":1,"xtc_probability":0,"xtc_threshold":0.1,"repeat_last_n":64,"repeat_penalty":1,"presence_penalty":0,"frequency_penalty":0,"dry_multiplier":0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":-1,"max_tokens":-1,"timings_per_token":false}
srv params_from_: Grammar:
srv params_from_: Grammar lazy: false
srv params_from_: Chat format: GPT-OSS
srv params_from_: Preserved token: 200005
srv params_from_: Preserved token: 200003
srv params_from_: Preserved token: 200008
srv params_from_: Preserved token: 200006
srv params_from_: Preserved token: 200007
srv add_waiting_: add task 0 to waiting list. current waiting = 0 (before add)
que post: new task, id = 0/1, front = 0
que start_loop: processing new tasks
que start_loop: processing task, id = 0
slot get_availabl: id 0 | task -1 | selected slot by lru, t_last = -1
slot reset: id 0 | task -1 |
slot launch_slot_: id 0 | task 0 | launching slot : {"id":0,"id_task":0,"n_ctx":131072,"speculative":false,"is_processing":false,"params":{"n_predict":-1,"seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"top_n_sigma":-1.0,"xtc_probability":0.0,"xtc_threshold":0.10000000149011612,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"dry_multiplier":0.0,"dry_base":1.75,"dry_allowed_length":2,"dry_penalty_last_n":131072,"dry_sequence_breakers":["\n",":","\"","*"],"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"stop":[],"max_tokens":-1,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":true,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","grammar_lazy":false,"grammar_triggers":[],"preserved_tokens":[200003,200005,200006,200007,200008],"chat_format":"GPT-OSS","reasoning_format":"none","reasoning_in_content":false,"thinking_forced_open":false,"samplers":["penalties","dry","top_k","typ_p","top_p","min_p","xtc","temperature"],"speculative.n_max":16,"speculative.n_min":0,"speculative.p_min":0.75,"timings_per_token":false,"post_sampling_probs":false,"lora":[]},"prompt":"<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.\nKnowledge cutoff: 2024-06\nCurrent date: 2025-08-27\n\nReasoning: medium\n\n# Valid channels: analysis, commentary, final. Channel must be included for every message.<|end|><|start|>user<|message|>test<|end|><|start|>assistant","next_token":{"has_next_token":true,"has_new_line":false,"n_remain":-1,"n_decoded":0,"stopping_word":""}}
slot launch_slot_: id 0 | task 0 | processing task
que start_loop: update slots
srv update_slots: posting NEXT_RESPONSE
que post: new task, id = 1, front = 0
slot update_slots: id 0 | task 0 | new prompt, n_ctx_slot = 131072, n_keep = 0, n_prompt_tokens = 68
slot update_slots: id 0 | task 0 | kv cache rm [0, end)
slot update_slots: id 0 | task 0 | prompt processing progress, n_past = 68, n_tokens = 68, progress = 1.000000
slot update_slots: id 0 | task 0 | prompt done, n_past = 68, n_tokens = 68
srv update_slots: decoding batch, n_tokens = 68
clear_adapter_lora: call
set_embeddings: value = 0
```