ROCm 6.4.3-1 has broken Ollama LLM inferencing on both Linux & Windows. (GTT?) | AMD Developer Community | Page 1

sick matrix Aug 28, 2025, 12:25 AM


Before I get into it: for context, I daily-drive Linux and can confirm the issue myself, and I've found multiple threads from Windows users hitting a similar issue.

The latest public version of ROCm (6.4.3) is causing substantial issues with the Ollama LLM backend, which now uses its own engine with llama.cpp as a fallback. I've confirmed this is caused directly by ROCm 6.4.3: downgrading the Ollama package doesn't fix the issue, but downgrading ROCm does. However, that's not a real fix, especially for those of us on rolling-release Linux distros. Partial upgrades = no-no. I've attached a few images showing some of the Ollama server behaviour. It appears that something related to GTT is broken, as ROCm regularly tries to allocate more memory than even exists, at least in this context with Ollama. I haven't directly tried llama.cpp (either Vulkan or ROCm) for this specific issue yet, but I wouldn't be surprised if it's also affected.

Multiple threads are reporting the same issue on several generations of AMD hardware. It's likely others are impacted, but I haven't directly observed it.
The issue at hand:
When running `ollama serve`, it starts fine until you actually try to load a model (regardless of size; even a 270M model triggers this). At that point it loads the model into VRAM, then "inferences" absolutely nothing. Not a single token. It then immediately aborts the inference altogether, to the point of unloading the model from VRAM. **This is not a system resource issue.** I've confirmed that by testing multiple model sizes and architectures, and some of the threads linked below mention it too.
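To check whether a given setup is affected, here is a minimal reproduction sketch. It assumes Ollama's default REST endpoint (`http://localhost:11434/api/generate`) and uses `gemma3:270m` purely as a hypothetical small model tag; substitute any model you have pulled. A healthy run comes back with `done: true` and a positive `eval_count`; going by the behaviour described above, an affected install would presumably error out or return no tokens.

```python
"""Repro sketch: ask a running `ollama serve` for a short completion and
report whether any tokens were generated.
Assumptions: default port 11434, hypothetical model tag "gemma3:270m"."""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama endpoint (assumed unchanged)


def build_request(model: str, prompt: str) -> urllib.request.Request:
    # Non-streaming request, so the whole result arrives as one JSON object.
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


def produced_tokens(result: dict) -> bool:
    # A healthy run reports done=True and a positive eval_count (generated tokens).
    return bool(result.get("done")) and result.get("eval_count", 0) > 0


def main() -> None:
    req = build_request("gemma3:270m", "Say hello.")  # hypothetical model tag
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            result = json.load(resp)
    except Exception as exc:  # URLError/HTTPError/timeout, e.g. when the runner crashes
        print(f"request failed: {exc}")
        return
    status = "OK" if produced_tokens(result) else "no tokens generated"
    print(status, "eval_count =", result.get("eval_count"))


if __name__ == "__main__":
    main()
```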

It seems to be related to ROCm allocating too much GTT, exceeding my VRAM (24 GB, yet Ollama's debug log shows GTT usage attempting to reach ~30 GB). This is not an issue with any of my models or parameters; it only began happening with the new ROCm revision. The result is a segfault: it attempts to map to 0x28, which I assume is a reserved address space (see screenshot 2).
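One way to sanity-check these numbers is to compare Ollama's debug figures with what the amdgpu kernel driver itself reports. The sketch below reads the driver's `mem_info_vram_total`, `mem_info_vram_used`, `mem_info_gtt_total` and `mem_info_gtt_used` sysfs files (byte counts). Which card index is the discrete GPU, and the requested size you pass to the hypothetical `exceeds_vram` helper, are inputs you take from your own system and log.

```python
"""Sketch: print the VRAM/GTT sizes amdgpu reports via sysfs, and check a
requested allocation (e.g. a figure from Ollama's debug log) against VRAM."""
from pathlib import Path

GIB = 1024 ** 3


def read_mem_info(device_dir: Path) -> dict:
    # amdgpu exposes these counters in bytes under /sys/class/drm/cardN/device/.
    info = {}
    for key in ("vram_total", "vram_used", "gtt_total", "gtt_used"):
        path = device_dir / f"mem_info_{key}"
        if path.exists():
            info[key] = int(path.read_text().strip())
    return info


def exceeds_vram(requested_bytes: int, info: dict) -> bool:
    # True when a request is larger than the card's dedicated VRAM.
    return requested_bytes > info.get("vram_total", 0)


if __name__ == "__main__":
    for card in sorted(Path("/sys/class/drm").glob("card*")):
        if not card.name[4:].isdigit():
            continue  # skip connector entries such as card0-DP-1
        info = read_mem_info(card / "device")
        if info:
            print(card.name, {k: f"{v / GIB:.2f} GiB" for k, v in info.items()})
```

For example, `exceeds_vram(30 * GIB, read_mem_info(Path("/sys/class/drm/card1/device")))` would flag a ~30 GB request on a 24 GB card; `card1` is only a placeholder index.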


Here are some other reports regarding this issue, spanning multiple hardware generations:

https://github.com/ollama/ollama/issues/11975
https://github.com/ollama/ollama/issues/12062
https://github.com/ollama/ollama/issues/12071
There are plenty more to find if you just look at the issues page.

I haven't seen a report of this occurring on NVIDIA, although it's not impossible that this is actually an issue with Ollama in some form.


I don't have Nitro, so the images didn't fit in the original post. Here are the images I was referring to.


It seems that Ollama is reading the total memory of AMD systems as entirely VRAM: https://github.com/ollama/ollama/issues/12062. I should also note that forcing environment variables doesn't work. I run Navi 31 (gfx1100), and setting HSA_OVERRIDE_GFX_VERSION=11.0.0 doesn't help; other people trying this with their respective gfx versions are getting the same result: nada.
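For reference, `HSA_OVERRIDE_GFX_VERSION` takes a `major.minor.stepping` string. The sketch below derives it from an LLVM gfx target name, assuming the usual `gfx<major><minor><stepping>` layout where minor and stepping are single hex digits, and launches `ollama serve` with the override applied only to that child process. For gfx1100 it yields `11.0.0`, the card's own native version, so on Navi 31 the override would presumably be a no-op, which is consistent with it changing nothing here.

```python
"""Sketch: map a gfx target name to an HSA_OVERRIDE_GFX_VERSION value and
start `ollama serve` with it. Assumes the gfx<major><minor><stepping> naming
convention (minor/stepping are single hex digits)."""
import os
import re
import subprocess


def gfx_to_override(target: str) -> str:
    # e.g. gfx1100 -> 11.0.0, gfx1030 -> 10.3.0, gfx90a -> 9.0.10
    m = re.fullmatch(r"gfx(\d{1,2})([0-9a-f])([0-9a-f])", target.lower())
    if not m:
        raise ValueError(f"unrecognised gfx target: {target!r}")
    major, minor, stepping = m.groups()
    return f"{int(major)}.{int(minor, 16)}.{int(stepping, 16)}"


def serve_with_override(target: str) -> subprocess.Popen:
    # Set the override only in the child's environment, not the whole session.
    env = dict(os.environ, HSA_OVERRIDE_GFX_VERSION=gfx_to_override(target))
    return subprocess.Popen(["ollama", "serve"], env=env)
```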


All of these issues affect Ollama v0.11.4 through v0.11.7. Downgrading ROCm seems to be the only viable workaround right now. I'm on Arch Linux, kernel 6.16.3-zen, with a 7800X3D, a 7900 XTX, 32 GB of system RAM, and an X870E board.

warm walrus Aug 28, 2025, 8:01 PM


Thanks for flagging this in detail, @sick matrix. We’ve got someone looking into your query. We’ll circle back once we have more clarity.

sick matrix Aug 29, 2025, 7:39 AM



Thank you very much! I've got some additional information after more testing. I tested llama.cpp directly with ROCm 6.4.3 and with Vulkan, and it has the same problem. Vulkan "worked" in a sense, but it offloaded to my CPU despite parameters dictating that layers be offloaded to the GPU. ROCm with llama.cpp didn't work at all; it threw the same issue I was having with Ollama. My 7800X3D has its iGPU completely disabled, by the way.

I tested Ollama again and it still doesn't run inference, but I did notice this while running some debugging tests:

```
load_tensors: offloading 48 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 49/49 layers to GPU
load_tensors:        ROCm0 model buffer size = 16722.37 MiB
load_tensors:          CPU model buffer size =   166.92 MiB
load_all_data: using async uploads for device ROCm0, buffer type ROCm0, backend ROCm0
[...generic model loading output...]
load_all_data: no device found for buffer type CPU for async uploads
```
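The buffer sizes in that log can be totalled per device to check them against the card's VRAM. A minimal sketch, assuming the `load_tensors: <device> model buffer size = <n> MiB` line format shown in the log above (other llama.cpp/Ollama versions may word it differently):

```python
"""Sketch: sum per-device model buffer sizes from load_tensors log lines."""
import re
from collections import defaultdict

# Assumed line format: "load_tensors:  <device> model buffer size = <n> MiB"
BUFFER_RE = re.compile(r"load_tensors:\s+(\S+) model buffer size =\s+([\d.]+) MiB")


def buffer_sizes(log_text: str) -> dict:
    # Returns {device: total MiB}, e.g. {"ROCm0": ..., "CPU": ...}.
    totals = defaultdict(float)
    for m in BUFFER_RE.finditer(log_text):
        totals[m.group(1)] += float(m.group(2))
    return dict(totals)
```

Fed the log above, it gives 16722.37 MiB (roughly 16.3 GiB) for `ROCm0` and 166.92 MiB for `CPU`, comfortably under the 7900 XTX's 24 GB, which fits the point that this is not a resource shortage.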

sick matrix Aug 30, 2025, 8:27 AM


Other people are starting to confirm that downgrading ROCm solves the problem regardless of OS: https://github.com/ollama/ollama/issues/11421

sick matrix Aug 30, 2025, 2:05 PM


@warm walrus Hey there! The issue is resolved! An update was pushed to rocm-device-libs, rocm-llvm, and rocblas that has completely resolved the issue, at least on my end (Arch Linux, 7800X3D + 7900 XTX). Appreciate the quick patch! Thank you, and if you get the chance, let the ROCm team know I'm appreciative too.

Finally able to get back to work! <3
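On Arch, a quick way to see which versions of the updated packages are installed is `pacman -Q`, which prints `name version` for each installed package. This sketch only reports the installed versions; the thread doesn't say which package release contains the fix, so compare against your distro's package history rather than a hard-coded number.

```python
"""Sketch: report installed versions of the ROCm packages named above via
`pacman -Q` (Arch-based systems only)."""
import subprocess

PACKAGES = ("rocm-device-libs", "rocm-llvm", "rocblas")


def parse_pacman_q(output: str) -> dict:
    # Each line of `pacman -Q` output is "name version".
    versions = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            versions[parts[0]] = parts[1]
    return versions


def installed_versions(packages=PACKAGES) -> dict:
    # pacman exits non-zero if a package is missing; keep whatever it printed.
    proc = subprocess.run(["pacman", "-Q", *packages], capture_output=True, text=True)
    return parse_pacman_q(proc.stdout)


if __name__ == "__main__":
    try:
        print(installed_versions())
    except FileNotFoundError:
        print("pacman not found (not an Arch-based system?)")
```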
