[Mlir-commits] [mlir] [mlir][gpu] Add gpu split barrier ops (PR #178894)

Fri Jan 30 06:22:24 PST 2026

https://github.com/Hardcode84 created https://github.com/llvm/llvm-project/pull/178894

Split barriers enable more efficient execution: each work item can perform independent computation between the signal and wait phases while the remaining work items converge on the barrier.

Split barriers are supported by multiple vendors: AMDGPU `s_barrier_signal`/`s_barrier_wait` and Intel/SPIR-V `SPV_INTEL_split_barrier` (https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_split_barrier.html).
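
As a rough illustration of the intended usage pattern (not taken from the patch), the sketch below stores to workgroup memory, signals arrival, overlaps unrelated arithmetic, and then waits before reading another work item's store. The module, function, and value names (`@kernels`, `@split_barrier_example`, `%buf`, `%out`) and the buffer shapes are hypothetical, and it assumes the ops parse exactly as shown in the tests added by this patch.

```mlir
// Hypothetical kernel; names and shapes are illustrative only.
gpu.module @kernels {
  gpu.func @split_barrier_example(
      %buf: memref<64xf32, #gpu.address_space<workgroup>>,
      %out: memref<64xf32>, %v: f32) kernel {
    %tid = gpu.thread_id x
    %c0 = arith.constant 0 : index

    // Publish this work item's value to workgroup memory.
    memref.store %v, %buf[%tid] : memref<64xf32, #gpu.address_space<workgroup>>

    // First half: signal arrival without waiting for other work items.
    gpu.barrier_arrive memfence [#gpu.address_space<workgroup>]

    // Independent computation that does not depend on %buf can overlap here.
    %sq = arith.mulf %v, %v : f32

    // Second half: wait until every work item in the workgroup has arrived.
    gpu.barrier_wait memfence [#gpu.address_space<workgroup>]

    // Stores made by other work items before their arrive are visible now.
    %first = memref.load %buf[%c0] : memref<64xf32, #gpu.address_space<workgroup>>
    %sum = arith.addf %sq, %first : f32
    memref.store %sum, %out[%tid] : memref<64xf32>
    gpu.return
  }
}
```

Both halves use the same `memfence` list here, on the assumption that a matching arrive/wait pair should agree on which address spaces are fenced.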

From eab530980c82f23902e29ef3ab2d256e6780fc41 Mon Sep 17 00:00:00 2001
From: Ivan Butygin <ivan.butygin at gmail.com>
Date: Fri, 30 Jan 2026 14:58:03 +0100
Subject: [PATCH] [mlir][gpu] Add gpu split barrier ops

Split barriers enable more efficient execution: each work item can perform independent computation between the signal and wait phases while the remaining work items converge on the barrier.

Split barriers are supported by multiple vendors: AMDGPU s_barrier_signal/s_barrier_wait and Intel/SPIR-V SPV_INTEL_split_barrier (https://github.khronos.org/SPIRV-Registry/extensions/INTEL/SPV_INTEL_split_barrier.html).
---
 mlir/include/mlir/Dialect/GPU/IR/GPUOps.td | 75 ++++++++++++++++++++++
 mlir/test/Dialect/GPU/ops.mlir             | 14 ++++
 2 files changed, 89 insertions(+)

diff --git a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
index 7891cf19ac921..4343881dd8a4d 100644
--- a/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
+++ b/mlir/include/mlir/Dialect/GPU/IR/GPUOps.td
@@ -1478,6 +1478,81 @@ def GPU_BarrierOp : GPU_Op<"barrier">,
                   OpBuilder<(ins "Value":$memrefToFence)>];
 }
 
+def GPU_BarrierArriveOp : GPU_Op<"barrier_arrive">,
+    Arguments<(ins OptionalAttr<GPU_AddressSpaceAttrArray> :$address_spaces)> {
+  let summary = "Signals arrival at a workgroup barrier.";
+  let description = [{
+    The `barrier_arrive` op signals that the current work item has arrived at
+    a synchronization point. This is the first half of a split barrier: it
+    signals arrival without waiting for other work items.
+
+    ```mlir
+    gpu.barrier_arrive
+    ```
+
+    This operation only signals arrival and does not wait. It must be paired
+    with a corresponding `gpu.barrier_wait` operation to complete the
+    synchronization.
+
+    The `memfence` attribute controls memory visibility. Memory accesses to
+    the specified address spaces that occur before `barrier_arrive` in program
+    order will be visible to other work items after they execute the
+    corresponding `barrier_wait`. If `memfence` is not specified, all memory
+    accesses are included. Specifying `memfence []` creates a pure
+    synchronization barrier with no memory ordering guarantees.
+
+    ```mlir
+    // All memory accesses before arrive visible after wait.
+    gpu.barrier_arrive
+    // Only workgroup address space accesses visible.
+    gpu.barrier_arrive memfence [#gpu.address_space<workgroup>]
+    // No memory visibility guarantees (pure synchronization).
+    gpu.barrier_arrive memfence []
+    ```
+
+    Either none or all work items of a workgroup need to execute this op
+    in convergence.
+  }];
+  let assemblyFormat = "(`memfence` $address_spaces^)? attr-dict";
+}
+
+def GPU_BarrierWaitOp : GPU_Op<"barrier_wait">,
+    Arguments<(ins OptionalAttr<GPU_AddressSpaceAttrArray> :$address_spaces)> {
+  let summary = "Waits for all work items to arrive at a workgroup barrier.";
+  let description = [{
+    The `barrier_wait` op waits until all work items in the workgroup have
+    signaled arrival at the barrier. This is the second half of a split
+    barrier.
+
+    ```mlir
+    gpu.barrier_wait
+    ```
+
+    This operation waits for all work items to arrive. It must be paired
+    with a corresponding `gpu.barrier_arrive` operation that signals arrival.
+
+    The `memfence` attribute controls memory visibility. After `barrier_wait`
+    completes, memory accesses to the specified address spaces that occurred
+    before `barrier_arrive` in other work items will be visible to the
+    current work item. If `memfence` is not specified, all memory accesses
+    are included. Specifying `memfence []` creates a pure synchronization
+    barrier with no memory ordering guarantees.
+
+    ```mlir
+    // All memory accesses before arrive visible after wait.
+    gpu.barrier_wait
+    // Only workgroup address space accesses visible.
+    gpu.barrier_wait memfence [#gpu.address_space<workgroup>]
+    // No memory visibility guarantees (pure synchronization).
+    gpu.barrier_wait memfence []
+    ```
+
+    Either none or all work items of a workgroup need to execute this op
+    in convergence.
+  }];
+  let assemblyFormat = "(`memfence` $address_spaces^)? attr-dict";
+}
+
 def GPU_GPUModuleOp : GPU_Op<"module", [
       IsolatedFromAbove, DataLayoutOpInterface, HasDefaultDLTIDataLayout,
       NoRegionArguments, SymbolTable, Symbol] # GraphRegionNoTerminator.traits> {
diff --git a/mlir/test/Dialect/GPU/ops.mlir b/mlir/test/Dialect/GPU/ops.mlir
index 1d05268ed4475..4dc0ccc03eab2 100644
--- a/mlir/test/Dialect/GPU/ops.mlir
+++ b/mlir/test/Dialect/GPU/ops.mlir
@@ -188,6 +188,20 @@ module attributes {gpu.container_module} {
       gpu.barrier memfence [#gpu.address_space<private>]
       gpu.barrier memfence []
 
+      // CHECK: gpu.barrier_arrive
+      // CHECK: gpu.barrier_arrive memfence [#gpu.address_space<workgroup>]
+      // CHECK: gpu.barrier_arrive memfence []
+      gpu.barrier_arrive
+      gpu.barrier_arrive memfence [#gpu.address_space<workgroup>]
+      gpu.barrier_arrive memfence []
+
+      // CHECK: gpu.barrier_wait
+      // CHECK: gpu.barrier_wait memfence [#gpu.address_space<workgroup>]
+      // CHECK: gpu.barrier_wait memfence []
+      gpu.barrier_wait
+      gpu.barrier_wait memfence [#gpu.address_space<workgroup>]
+      gpu.barrier_wait memfence []
+
       "some_op"(%bIdX, %tIdX) : (index, index) -> ()
       %42 = memref.load %arg1[%bIdX] : memref<?xf32, 1>
       gpu.return