[Mlir-commits] [mlir] [mlir][AMDGPU] Add scaled wmma ops for gfx1250 (PR #169854)

Thu Nov 27 13:37:25 PST 2025

================
@@ -1218,6 +1226,54 @@ def AMDGPU_ScaledMFMAOp :
   let hasCanonicalizer = 1;
 }
 
+def AMDGPU_ScaledWMMAOp
+    : AMDGPU_Op<"scaled_wmma", [AllTypesMatch<["destC", "destD"]>, Pure]>,
+      Arguments<(ins ConfinedAttr<I32Attr, [IntIsOneOf<[16, 32]>]>:$m,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[16]>]>:$n,
+          ConfinedAttr<I32Attr, [IntIsOneOf<[128]>]>:$k,
+          ScaledWMMAInTypes:$sourceA, ScaledWMMAInTypes:$sourceB,
+          ScaledWMMAOutTypes:$destC, AnyTypeOf<[I32, I64]>:$scaleA,
+          AnyTypeOf<[I32, I64]>:$scaleB,
+          DefaultValuedAttr<I32Attr, "0">:$scaleAType,
+          DefaultValuedAttr<I32Attr, "0">:$fmtScaleA,
+          DefaultValuedAttr<I32Attr, "0">:$scaleBType,
+          DefaultValuedAttr<I32Attr, "0">:$fmtScaleB)>,
+      Results<(outs ScaledWMMAOutTypes:$destD)> {
+  let summary = "MLIR wrapper for RDNA scaled wmma instructions";
+  let description = [{
+    The `amdgpu.scaled_wmma` op is an MLIR wrapper around intrinsics for scaled
+    `wmma` instructions in the RDNA architecture. These instructions perform
+    matrix multiplication with per-block scaling of inputs, supporting fp4, fp6,
+    and fp8 data formats.
+
+    The scale instructions support two tile sizes:
+    - 16x16x128 with mixed f8/f6/f4 formats (output: vector<4xf32>)
+    - 32x16x128 with f4 format only (output: vector<8xf32>)
+
+    The `scaleA` and `scaleB` parameters are scale exponents that can be either
+    i32 (for wmma.scale) or i64 (for wmma.scale16) to support per-block scaling.
----------------
Muzammiluddin-Syed-ECE wrote:

```suggestion
    The scale instructions support two block sizes:
    - Block size of 16 where `scaleA` and `scaleB` parameters are of type i32 (wmma.scale)
    - Block size of 32 where `scaleA` and `scaleB` parameters are of type i64 (wmma.scale16)
```
nit: It might be worth stating explicitly that wmma.scale and wmma.scale16 are distinguished by their block size.
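
For readers unfamiliar with per-block scaling, here is a rough Python sketch of the idea, not of the hardware semantics. It assumes each block of `block_size` consecutive elements along K shares one power-of-two scale exponent, uses plain floats rather than fp4/fp6/fp8, and does not model how the exponents are packed into the i32/i64 `scaleA`/`scaleB` operands. The helper name `scaled_dot` is hypothetical.

```python
def scaled_dot(a, b, scale_a, scale_b, block_size):
    """Dot product over K with one shared scale exponent per block.

    Illustrative only: scale_a[i] and scale_b[i] are assumed to be integer
    exponents applied as 2**exponent to block i of a and b respectively.
    """
    assert len(a) == len(b)
    k = len(a)
    assert k % block_size == 0
    num_blocks = k // block_size
    assert len(scale_a) == len(scale_b) == num_blocks

    total = 0.0
    for blk in range(num_blocks):
        lo = blk * block_size
        hi = lo + block_size
        # Unscaled partial dot product for this block.
        partial = sum(x * y for x, y in zip(a[lo:hi], b[lo:hi]))
        # Both operands' scales multiply the block's contribution.
        total += partial * 2.0 ** (scale_a[blk] + scale_b[blk])
    return total


K = 128  # the only K the op accepts in this patch
a = [1.0] * K
b = [1.0] * K

# Block size 32 gives 128 / 32 = 4 scale exponents per operand.
print(scaled_dot(a, b, [0] * 4, [0] * 4, 32))  # 128.0
# Block size 16 gives 128 / 16 = 8 scale exponents per operand.
print(scaled_dot(a, b, [1] * 8, [0] * 8, 16))  # 256.0
```

The point is only that halving the block size doubles the number of scale exponents needed for the same K, which is why the two variants take differently sized scale operands.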

https://github.com/llvm/llvm-project/pull/169854