[Mlir-commits] [mlir] [mlir][GPU] Extend gpu.barrier with scope and named-barrier support (PR #195692)

Mon May 4 12:56:17 PDT 2026

================
@@ -1443,35 +1451,92 @@ def GPU_BarrierOp : GPU_Op<"barrier">,
     accessing the same memory can be avoided by synchronizing work items
     in-between these accesses.
 
-    If the `memfence` attribute is specified, the set of memory accesses that must
-    by completed after the barrier resolves is limited to only those accesses that
-    read from or write to the specified address spaces (though accesses to other
-    address spaces may be completed as well, especially if a particular combination
-    of address spaces is not supported on a given backend). In particular,
-    specifying `memfence []` creates a barrier that is not required to affect
-    the visibility of any memory operations and is purely used for synchronizing
-    work items.
+    The `scope` attribute controls the execution scope of the barrier:
 
     ```mlir
-    // Only workgroup address spaces accesses required to be visible.
+    // Synchronize within a subgroup (warp/wavefront).
+    gpu.barrier scope <subgroup>
+    // Synchronize across the entire device.
+    gpu.barrier scope <device>
+    ```
+
+    A `named` barrier allows synchronizing a specific subset of subgroups
+    that have been associated with a named barrier handle. Named barriers
+    require workgroup scope.
+
+    ```mlir
+    // Initialize a named barrier for 4 participating members.
+    %nb = gpu.initialize_named_barrier %c4 : i32 -> !gpu.named_barrier
+    // Wait on the named barrier.
+    gpu.barrier named(%nb : !gpu.named_barrier)
+    ```
+
+    If the `memfence` attribute is specified, the set of memory accesses that
+    must be completed after the barrier resolves is limited to only those
+    accesses that read from or write to the specified address spaces. In
+    particular, specifying `memfence []` creates a barrier that is not required
+    to affect the visibility of any memory operations and is purely used for
+    synchronizing work items.
+
+    ```mlir
+    // Only workgroup address space accesses required to be visible.
     gpu.barrier memfence [#gpu.address_space<workgroup>]
     // No memory accesses required to be visible.
     gpu.barrier memfence []
     // All memory accesses required to be visible.
     gpu.barrier
     ```
 
-    Either none or all work items of a workgroup need to execute this op
-    in convergence.
+    The three clauses can be combined in any order, but not all combinations may
+    be supported on a given target:
+
+    ```mlir
+    // Named barrier with a workgroup-only memory fence.
+    gpu.barrier named(%nb : !gpu.named_barrier) memfence [#gpu.address_space<workgroup>]
+    // Subgroup barrier with a global fence.
+    gpu.barrier memfence [#gpu.address_space<global>] scope <subgroup>
+    ```
+
+    Once one thread of execution in a given scope (say, thread in a workgroup)
+    has executed a particular dynamic instance of `gpu.barrier`, all other threads
+    in that scope must execute the same dynamic instance of `gpu.barrier` before
+    executing any other instance of it.
+  }];
+  let assemblyFormat = [{
+    oilist(
+        `named` `(` $named_barrier `:` type($named_barrier) `)`
+      | `memfence` $address_spaces
+      | `scope` $scope
+    ) attr-dict
   }];
-  let assemblyFormat = "(`memfence` $address_spaces^)? attr-dict";
   let hasCanonicalizer = 1;
+  let hasVerifier = 1;
   let builders = [OpBuilder<(
                       ins CArg<"std::optional<::mlir::gpu::AddressSpace>",
                                "std::nullopt">:$addressSpace)>,
                   OpBuilder<(ins "Value":$memrefToFence)>];
 }
 
+def GPU_InitializeNamedBarrierOp
----------------
grypp wrote:

PTX has had named barriers via `bar.sync a, b` for a long time, and the `nvvm.barrier` op already exposes them through its `barrierId` and `numberOfThreads` operands. Has lowering `gpu.barrier named(%nb)` to that op been considered?

What would `gpu.initialize_named_barrier` lower to on NVVM, given there's no init op in PTX?
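
For reference, a minimal sketch of the existing NVVM form that such a lowering could target. The function name and the `%id` and `%n` values are hypothetical, the assembly syntax is assumed from the op's current definition and may differ between LLVM versions, and nothing here is taken from the PR itself:

```mlir
// Sketch only: @named_barrier_wait, %id and %n are illustrative names.
// %id selects one of the hardware barrier resources and %n is the number of
// participating threads, which PTX requires to be a multiple of the warp size.
llvm.func @named_barrier_wait(%id : i32, %n : i32) {
  // Assumed to correspond to PTX `bar.sync a, b`; there is no separate
  // initialization step on the PTX side.
  nvvm.barrier id = %id number_of_threads = %n
  llvm.return
}
```

In this form the participant count travels with every wait, whereas the proposed `gpu.initialize_named_barrier` fixes it once when the handle is created.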

https://github.com/llvm/llvm-project/pull/195692