[clang] [llvm] [AMDGPU] Introduce asyncmark/wait intrinsics (PR #173259)
Nicolai Hähnle via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 2 07:53:17 PST 2026
================
@@ -6594,12 +6594,18 @@ operations.
``buffer/global/flat_load/store/atomic`` instructions to global memory are
termed vector memory operations.
+.. _amdgpu-lds-dma-bits:
+
``global_load_lds`` or ``buffer/global_load`` instructions with the `lds` flag
are LDS DMA loads. They interact with caches as if the loaded data were
being loaded to registers and not to LDS, and so therefore support the same
-cache modifiers. They cannot be performed atomically. They implement volatile
-(via aux/cpol bit 31) and nontemporal (via metadata) as if they were loads
-from the global address space.
+cache modifiers. They cannot be performed atomically. They can be performed with
+asynchronous, volatile and nontemporal semantics as if they were loads
+from the global address space:
+
+- asynchronous: aux bit u0x40000000
+- volatile: aux bit u0x20
+- nontemporal: metadata
----------------
nhaehnle wrote:
Looks like you changed your mind at some point and made the asynchronous bit 0x20. But then it clashes with the NV bit?
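
For illustration, the kind of clash in question could be caught by a compile-time check along these lines. This is only a sketch: the constant names and the `overlaps` helper are hypothetical, and the mask values are copied from the numbers mentioned in this thread rather than from the actual LLVM headers.

```cpp
#include <cstdint>

// Hypothetical constants for illustration; the values are the ones quoted
// in this thread and are assumptions, not the definitions used by LLVM.
constexpr uint32_t AsyncBitInPatch = 0x20;       // value the comment attributes to the async bit
constexpr uint32_t AsyncBitInDocs = 0x40000000;  // value listed in the doc hunk above
constexpr uint32_t NVBit = 0x20;                 // value assumed here for the NV bit

// Two masks clash if they share at least one set bit.
constexpr bool overlaps(uint32_t A, uint32_t B) { return (A & B) != 0; }

// Under the assumed values, 0x20 collides with NV while 0x40000000 does not.
static_assert(overlaps(AsyncBitInPatch, NVBit),
              "0x20 shares a bit with the assumed NV mask");
static_assert(!overlaps(AsyncBitInDocs, NVBit),
              "0x40000000 is disjoint from the assumed NV mask");
```

A guard of this shape placed next to the real bit definitions would turn a reused aux bit into a build failure instead of a review finding.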
https://github.com/llvm/llvm-project/pull/173259