HIP support for threadfence/atomics with memory ordering | AMD Developer Community | Page 1

eager lance Jul 14, 2025, 11:40 AM

#

Implementing some data structures and algorithms becomes easier if atomics with memory ordering are supported. Currently we have to add a full threadfence whenever ordering is required, but in some cases a weaker fence would suffice, say acquire-only or release-only.
I saw some references to __builtin_amdgcn_fence in the LLVM repos, but I'm not sure what its status is.
So my question is: is it possible to use weaker threadfences, and if so, what is the recommended way to do this?

white veldt Jul 14, 2025, 3:04 PM

#

Hey, it might be helpful to share a minimal CUDA equivalent of this, to understand exactly what's missing on the HIP side.

misty tulip Jul 14, 2025, 4:41 PM

#

We have support for the __atomic builtins, which take a memory ordering parameter; for an example in the wild, see https://github.com/vllm-project/vllm/blob/6d0cf239c66936ff52582042698fd1aeb2a73bb6/csrc/custom_all_reduce.cuh#L250. I think these are the GCC built-ins, or at least the syntax is the same. I'm trying to find some documentation for them.

#

Actually, if you look at the source for the HIP atomic* functions, they call our own builtins as well: https://github.com/ROCm/clr/blob/amd-staging/hipamd/include/hip/amd_detail/amd_hip_atomic.h. For example, atomicAdd(address, val) resolves to __hip_atomic_fetch_add(address, val, __ATOMIC_RELAXED, __HIP_MEMORY_SCOPE_AGENT); the atomicAdd_system variant passes __HIP_MEMORY_SCOPE_SYSTEM instead.

#

so you should be able to call these directly with your preferred memory ordering
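
For reference, a minimal host-side sketch of the release/acquire pattern using the GCC-style __atomic builtins mentioned above. This is plain C++ for GCC/Clang, not HIP device code, and the names payload, ready, produce_*, consume_* and run_round are hypothetical. The assumption is that a kernel would use the same orderings, with the __hip_atomic_* builtins taking an extra scope argument such as __HIP_MEMORY_SCOPE_AGENT. The second variant shows the weaker-fence form, again only on the host.

```cpp
#include <cassert>
#include <thread>

// Hypothetical names for illustration; none of them come from the thread.
static int payload = 0;  // plain data published by the producer
static int ready = 0;    // flag, only accessed through __atomic builtins

// Variant 1: the ordering is attached to the atomic operations themselves.
void produce_release(int value) {
    payload = value;                                // ordinary write
    __atomic_store_n(&ready, 1, __ATOMIC_RELEASE);  // publishes payload
}

int consume_acquire() {
    while (__atomic_load_n(&ready, __ATOMIC_ACQUIRE) == 0) {
        std::this_thread::yield();
    }
    return payload;  // ordered after the producer's write to payload
}

// Variant 2: relaxed atomics plus one-directional fences instead of a
// full (sequentially consistent) fence.
void produce_fence(int value) {
    payload = value;
    __atomic_thread_fence(__ATOMIC_RELEASE);        // release-only fence
    __atomic_store_n(&ready, 1, __ATOMIC_RELAXED);
}

int consume_fence() {
    while (__atomic_load_n(&ready, __ATOMIC_RELAXED) == 0) {
        std::this_thread::yield();
    }
    __atomic_thread_fence(__ATOMIC_ACQUIRE);        // acquire-only fence
    return payload;
}

// Resets the shared state, runs one producer and one consumer thread,
// and returns the value the consumer observed.
template <typename Produce, typename Consume>
int run_round(Produce produce, Consume consume, int value) {
    payload = 0;
    __atomic_store_n(&ready, 0, __ATOMIC_RELAXED);
    int seen = -1;
    std::thread consumer([&] { seen = consume(); });
    std::thread producer([&] { produce(value); });
    producer.join();
    consumer.join();
    return seen;
}
```

Calling run_round(produce_release, consume_acquire, 42) or run_round(produce_fence, consume_fence, 7) from main should return the published value in both cases. Whether the fence variant maps to the cheapest instruction sequence on a given GPU target is not something this host sketch can show.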

eager lance Jul 15, 2025, 8:58 AM

#

Oh this is perfect
