[Mlir-commits] [mlir] [mlir][gpu] Add `gpu.subgroup_uniform` op (PR #157743)

Krzysztof Drewniak llvmlistbot at llvm.org
Wed Sep 10 11:48:54 PDT 2025


================
@@ -3255,4 +3255,37 @@ def GPU_SubgroupBroadcastOp : GPU_Op<"subgroup_broadcast",
   let hasVerifier = 1;
 }
 
+def GPU_SubgroupUniformOp : GPU_Op<"subgroup_uniform",
+    [Pure, AllTypesMatch<["result", "src"]>,
+    DeclareOpInterfaceMethods<InferIntRangeInterface, ["inferResultRanges"]>] #
+    ElementwiseMappable.traits>,
+  Arguments<(ins AnyType:$src)> {
+  let summary = "Assumes a value is uniform across the lanes in a subgroup";
+  let description = [{
+    The "subgroup_uniform" op assumes that the value is uniform across all lanes
+    in a subgroup. This means that all active lanes in the subgroup are expected
+    to have the same value.
+
+    This op can be used to inform the compiler that a value is uniform across
+    the subgroup, enabling optimizations. The result is poison if the value
+    is not actually uniform.
+
+    This op is functionally a no-op: removing it must not change the
+    semantics of any valid program. Backends can choose to ignore it or use
+    it to drive optimizations (e.g. placing the value in scalar registers).
+
+    This op can be freely speculated across structured control flow, as the
+    parent's active mask is always a superset of the current mask: if the
+    input computation can be hoisted, the operation itself can be hoisted too.
+
+    Example:
+
+    ```mlir
+    %1 = gpu.subgroup_uniform %0 : f32
+    ```
+  }];
+  let results = (outs AnyType:$result);
+  let assemblyFormat = "$src attr-dict `:` type($result)";
+}
----------------
krzysz00 wrote:

```mlir
%u0 = gpu.subgroup_uniform %v0 : index
%u1 = gpu.subgroup_uniform %v1 : index
// ...
%v2 = arith.addi %v0, %v1 : index
%v3 = arith.addi %u0, %u1 : index
```

Has the exact same one-of-these-is-underannotated semantics as
```mlir
// %m : memref<?xf32>
%m1 = memref.assume_alignment %m, 16 : memref<?xf32>
// ...
%l1 = memref.load %m1[%c0] : memref<?xf32>
%l2 = memref.load %m[%c0] : memref<?xf32>
```

Here, `%l1` can be analyzed as a load with 16-byte alignment, while `%l2` need not be.

One could, of course, have a rewrite that maximizes assumptions: replacing `%m` with `%m1` in `%l2` is correct, but not required (sketched below).
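For the memref case, such a hypothetical assumption-maximizing rewrite (not something this PR adds) would leave the IR as:

```mlir
%m1 = memref.assume_alignment %m, 16 : memref<?xf32>
// ...
%l1 = memref.load %m1[%c0] : memref<?xf32>
// %m replaced with %m1: this load is now also known 16-byte aligned.
%l2 = memref.load %m1[%c0] : memref<?xf32>
```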

In terms of analysis, in your example, the value `%v0` is uniform, but may not be _known_ to be uniform. An analyzer can (probably should) conclude that `%v2` is not known to be uniform, because it has no information about `%v0`. `%v3`, however, can be proven uniform, because it uses `%u0` and `%u1`, which are annotated.
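Spelling that out on the example above, with the analysis facts as comments (the IR itself is unchanged):

```mlir
%u0 = gpu.subgroup_uniform %v0 : index // known uniform by construction
%u1 = gpu.subgroup_uniform %v1 : index // known uniform by construction
%v2 = arith.addi %v0, %v1 : index      // uniform in fact, but not provably so
%v3 = arith.addi %u0, %u1 : index      // provably uniform: both operands annotated
```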

That same interchangeability logic from `memref.assume_alignment` applies: you can "freely" swap between `%v0` / `%u0`, gaining and losing assumptions as you go.
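So, mirroring the memref rewrite, both of these hypothetical edits are correct, and neither is required:

```mlir
// Strengthening: propagate the assumption into %v2.
%v2 = arith.addi %u0, %u1 : index
// Weakening: drop the assumption from %v3.
%v3 = arith.addi %v0, %v1 : index
```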

(Operationally, `gpu.subgroup_uniform` is expected to lower to material changes in register class on targets where that's a thing, but, semantically, that's not all that relevant)

https://github.com/llvm/llvm-project/pull/157743

