[llvm-branch-commits] [mlir] [mlir][AMDGPU] Update gather_to_lds with explicit-async support (PR #181082)
Zhuoran Yin via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Mon Feb 16 11:54:39 PST 2026
================
@@ -1099,14 +1100,21 @@ def AMDGPU_GatherToLDSOp :
* `$transferType`: type of the data to be transferred by each thread. This is used to determine
the size of the data to be transferred and the number of threads in the subgroup.
The transfer type must be a scalar type or a vector type with a single element type.
+ * If `$async` is set, the compiler will not attempt to infer the
+ memory waits needed to ensure that the DMA operation has succeeded
+ before a load that might access the stored-to LDS is performed.
+ Instead, the `rocdl.asyncmark` and `rocdl.wait.asyncmark N`
+ operations must be used to explicitly indicate the desired completion
+ behavior. This enables more precise calculation of these waits at the
+ cost of requiring user management of asynchrony.
The `$dst`, along with its indices, points to the memory location the subgroup of this thread
will write to.
Note: only supported on gfx9 and gfx10.
}];
let assemblyFormat = [{
- $src `[` $srcIndices `]` `,` $dst `[` $dstIndices `]` attr-dict `:` $transferType `,` type($src) `,` type($dst)
+ (`async` $async^)? $src `[` $srcIndices `]` `,` $dst `[` $dstIndices `]` attr-dict `:` $transferType `,` type($src) `,` type($dst)
----------------
jerryyin wrote:
Okay thanks for clarifying, this sounds reasonable to me.
https://github.com/llvm/llvm-project/pull/181082