[Mlir-commits] [mlir] [mlir][AMDGPU] Add scaled wmma ops for gfx1250 (PR #169854)
Krzysztof Drewniak
llvmlistbot at llvm.org
Mon Dec 1 09:34:28 PST 2025
================
@@ -1218,6 +1227,56 @@ def AMDGPU_ScaledMFMAOp :
let hasCanonicalizer = 1;
}
+def AMDGPU_ScaledWMMAOp
+ : AMDGPU_Op<"scaled_wmma", [AllTypesMatch<["destC", "destD"]>, Pure]>,
+ Arguments<(ins ConfinedAttr<I32Attr, [IntIsOneOf<[16, 32]>]>:$m,
+ ConfinedAttr<I32Attr, [IntIsOneOf<[16]>]>:$n,
+ ConfinedAttr<I32Attr, [IntIsOneOf<[128]>]>:$k,
+ ScaledWMMAInTypes:$sourceA, ScaledWMMAInTypes:$sourceB,
+ ScaledWMMAOutTypes:$destC,
+ VectorOfLengthAndType<[4, 8], [F8E8M0FNU, F8E4M3FN]>:$scaleA,
+ ConfinedAttr<I32Attr, [IntIsOneOf<[0, 1]>]>:$scaleAIdx,
+ VectorOfLengthAndType<[4, 8], [F8E8M0FNU, F8E4M3FN]>:$scaleB,
+ ConfinedAttr<I32Attr, [IntIsOneOf<[0, 1]>]>:$scaleBIdx)>,
+ Results<(outs ScaledWMMAOutTypes:$destD)> {
+ let summary = "MLIR wrapper for scaled wmma instructions";
+ let description = [{
+ The `amdgpu.scaled_wmma` op is an MLIR wrapper around intrinsics for scaled
+ `wmma` instructions. These instructions perform matrix multiplication with
+ per-block scaling of inputs, supporting fp4, fp6, and fp8 data formats.
+
+ The scale instructions support two tile sizes:
+ - 16x16x128 with mixed f8/f6/f4 formats (output: vector<4xf32>)
+ - 32x16x128 with f4 format only (output: vector<8xf32>)
+
+ Scale parameters (`scaleA`, `scaleB`) are small vectors of f8 scale values
+ (either f8E8M0FNU, or f8E4M3FN). The index attributes (`scaleAIdx`, `scaleBIdx`)
+ select which element from the scale vector to use for scaling. During lowering,
----------------
krzysz00 wrote:
I don't think this is correct.
Remember that the layout of the scales starts (at least for one of the values of this bit) like this:
```
lane 0: (m=0, kOuter=0) (0, 1), (0, 2), (0, 3) ...
```
and we should really write the formulas out.
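As a starting point for those formulas, here is a rough reference sketch of generic per-block scaling. It is not the hardware definition. Several things in it are assumptions rather than anything taken from the PR: the block size of 32 along K, the `scale[row][kOuter]` indexing, and the helper names `decode_e8m0` and `scaled_matmul_reference`. It does not model the per-lane register layout or what `scaleAIdx`/`scaleBIdx` select.

```python
def decode_e8m0(byte):
    """Decode an f8E8M0FNU scale: 8 exponent bits, bias 127, 0xFF encodes NaN."""
    if byte == 0xFF:
        return float("nan")
    return 2.0 ** (byte - 127)


def scaled_matmul_reference(a, b, c, scale_a, scale_b, block=32):
    """Hypothetical reference for a block-scaled matmul.

    D[m][n] = C[m][n] + sum over kOuter of
        scale_a[m][kOuter] * scale_b[n][kOuter]
        * sum over kInner of A[m][k] * B[k][n],  with k = kOuter * block + kInner.

    `block` (elements sharing one scale) is an assumption, not taken from the PR.
    """
    m_size, k_size, n_size = len(a), len(a[0]), len(b[0])
    assert k_size % block == 0, "K must be a multiple of the block size"
    d = [row[:] for row in c]  # start from the accumulator C
    for m in range(m_size):
        for n in range(n_size):
            for k_outer in range(k_size // block):
                partial = 0.0
                for k_inner in range(block):
                    k = k_outer * block + k_inner
                    partial += a[m][k] * b[k][n]
                # One scale per (row, kOuter) block for A and per (column, kOuter) block for B.
                d[m][n] += scale_a[m][k_outer] * scale_b[n][k_outer] * partial
    return d
```

With K = 128 and the assumed block size of 32, each row of A (and each column of B) would carry four scales, `kOuter = 0..3`, which lines up with the `(m=0, kOuter=0) (0, 1), (0, 2), (0, 3)` pattern above. Whether that matches the actual lane assignment still needs checking against the ISA documentation.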
https://github.com/llvm/llvm-project/pull/169854