[Mlir-commits] [mlir] [mlir][AMDGPU] Add wrappers for in-memory barriers on gfx1250 (PR #180112)

Fri Feb 6 08:20:18 PST 2026

================
@@ -1541,4 +1541,220 @@ def AMDGPU_TensorStoreFromLDSOp :
   }];
 }
 
+//===----------------------------------------------------------------------===//
+// In-LDS Barrier Operations
+//
+// General note: any of these operations that impact memory have read and write
+// effects as a crude model of their atomic nature - we don't want "reads"
+// being hoisted out of loops.
+//===----------------------------------------------------------------------===//
+
+def AMDGPU_DsBarrierInitOp :
+    AMDGPU_Op<"ds_barrier_init">,
+    Arguments<(ins Arg<MemRefOf<[AMDGPU_DsBarrierStateType]>, "barrier(s)",
+                       [MemRead, MemWrite]>:$base,
+                   Variadic<Index>:$indices,
+                   I32:$participants)> {
+  let summary = "Initialize an in-LDS barrier.";
+  let description = [{
+    Given the location `!amdgpu.ds_barrier_state` in LDS (as specified by `base` and `indices`),
+    initialize the barrier structure so that the pending and init counts are equal to
+    `participants - 1`, which will have its high bits masked off, and its phase is equal to 0.
+
+    Note that we subtract 1 from `participants` when constructing the barrier state
+    to provide clearer high-level semantics.
+
+    The subtraction means that the phase will change when the `participants`-th arrival occurs.
+    In practical terms, this means that you can use (for example) the number of subgroups or
+    waves per workgroup as `participants`, instead of manually needing to remove one.
+
+    While the write of the initial state will be performed atomically, no synchronization
+    between waves will be performed by this operation.
+
+    Example:
+    ```mlir
+    amdgpu.ds_barrier_init %barrier[], %c32 : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>, i32
+    ```
+
+    This operation is only available on gfx1250+.
+  }];
+
+  let assemblyFormat = [{
+    $base `[` $indices `]` `,` $participants attr-dict `:` type($base) `,` type($participants)
+  }];
+
+  let hasVerifier = 1;
+}
+
+def AMDGPU_DsBarrierPollStateOp :
+    AMDGPU_Op<"ds_barrier_poll_state">,
+    Arguments<(ins Arg<MemRefOf<[AMDGPU_DsBarrierStateType]>, "barrier(s)",
+                       [MemRead, MemWrite]>:$base,
+                 Variadic<Index>:$indices)>,
+    Results<(outs AMDGPU_DsBarrierStateType:$out)> {
+  let summary = "Atomically read the state of an in-LDS barrier.";
+  let description = [{
+    Atomically read and return the state of the barrier at `base[indices...]`.
+
+    This will ultimately act like a `memref.load`, but this operation will ensure
+    that appropriate atomic orderings and syncscopes are set.
+
+    Example:
+    ```mlir
+    %state = amdgpu.ds_barrier_poll_state %barrier[] : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>> -> !amdgpu.ds_barrier_state
+    ```
+
+    This operation is only available on gfx1250+.
+  }];
+
+  let assemblyFormat = [{
+    $base `[` $indices `]` attr-dict `:` type($base) `->` type($out)
+  }];
+
+  let hasVerifier = 1;
+}
+
+def AMDGPU_DsAsyncBarrierArriveOp :
+    AMDGPU_Op<"ds_async_barrier_arrive">,
+    Arguments<(ins Arg<MemRefOf<[AMDGPU_DsBarrierStateType]>, "barrier(s)",
+                       [MemRead, MemWrite]>:$base,
+                 Variadic<Index>:$indices)> {
+  let summary = "Asynchronously arrive at an in-LDS barrier.";
+  let description = [{
+    Add an arrival to the LDS barrier at `base[indices]` to the sequence of pending
+    asynchronous memory operations.
+
+    This will add an "asynchronous memory operation" to the in-order list of pending
+    asynchronous loads from global memory to LDS. When the queue of such operations
+    issued before this operation is complete, the specified barrier will be arrived at,
+    decrementing the pending count by 1 **per lane that executes it** and rolling
+    over the phase if applicable.
+
+    This operation does not return the old barrier state.
+
+    Example:
+    ```mlir
+    amdgpu.ds_async_barrier_arrive %barrier[] : memref<!amdgpu.ds_barrier_state, #gpu.address_space<workgroup>>
+    ```
+
+    This operation is only available on gfx1250+.
+  }];
+
+  let assemblyFormat = [{
+    $base `[` $indices `]` attr-dict `:` type($base)
+  }];
+
+  let hasVerifier = 1;
+}
+
+def AMDGPU_DsBarrierArriveOp :
+    AMDGPU_Op<"ds_barrier_arrive">,
+    Arguments<(ins Arg<MemRefOf<[AMDGPU_DsBarrierStateType]>, "barrier(s)",
+                       [MemRead, MemWrite]>:$base,
+                 Variadic<Index>:$indices,
+                 I64:$count)>,
----------------
PMylon wrote:

nit: just curious. I know that the LLVM intrinsic expects an `i64_ty` for count (https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L3691), but according to the ISA docs count cannot exceed 32 bits. Is using `I64` here intentional (e.g. to be future-proof), or not?
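
To make the concern concrete, here is a minimal Python sketch of the range check that an `I64` operand would leave unexpressed in the type. It is not part of the PR. `check_count_fits_u32` is a hypothetical helper, and the unsigned 32-bit bound is an assumption based on my reading of the ISA docs, not something the patch states.

```python
# Assumption: the hardware only honors the low 32 bits of `count`,
# interpreted as an unsigned value.
U32_MAX = (1 << 32) - 1  # largest value representable in 32 unsigned bits


def check_count_fits_u32(count: int) -> int:
    """Return `count` unchanged if it fits in 32 unsigned bits.

    Raises ValueError otherwise. This mirrors a check that a verifier or
    lowering could perform on a constant count when the operand is i64.
    """
    if not 0 <= count <= U32_MAX:
        raise ValueError(f"count {count} does not fit in 32 bits")
    return count
```

With an `I32` operand such a check would be unnecessary, at the cost of an extension when lowering to the i64 intrinsic argument.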

https://github.com/llvm/llvm-project/pull/180112