[clang] [llvm] [AMDGPU] Introduce asyncmark/wait intrinsics (PR #173259)

Fri Jan 2 07:53:17 PST 2026

================
@@ -6594,12 +6594,18 @@ operations.
 ``buffer/global/flat_load/store/atomic`` instructions to global memory are
 termed vector memory operations.
 
+.. _amdgpu-lds-dma-bits:
+
 ``global_load_lds`` or ``buffer/global_load`` instructions with the `lds` flag
 are LDS DMA loads. They interact with caches as if the loaded data were
 being loaded to registers and not to LDS, and so therefore support the same
-cache modifiers. They cannot be performed atomically. They implement volatile
-(via aux/cpol bit 31) and nontemporal (via metadata) as if they were loads
-from the global address space.
+cache modifiers. They cannot be performed atomically. They can be performed with
+asynchronous, volatile and nontemporal semantics as if they were loads
+from the global address space:
+
+- asynchronous: aux bit u0x40000000
+- volatile: aux bit u0x20
+- nontemporal: metadata
----------------
nhaehnle wrote:

Looks like you changed your mind at some point and made the asynchronous bit 0x20. But then it clashes with the NV bit?

https://github.com/llvm/llvm-project/pull/173259