[Mlir-commits] [mlir] [mlir][amdgpu] Add explicit intrinsic shape to wmma (PR #164920)
Kunwar Grover
llvmlistbot at llvm.org
Thu Oct 23 23:12:58 PDT 2025
================
@@ -990,28 +999,32 @@ def AMDGPU_WMMAOp :
UnitAttr:$unsignedB,
UnitAttr:$clamp)>,
Results<(outs WMMAOutTypes: $destD)> {
- let summary = "MLIR wrapper for RDNA3 wmma instructions";
+ let summary = "MLIR wrapper for wmma instructions";
let description = [{
- The `amdgpu.wmma` op is an MLIR wrapper around intrinsics
- for various `wmma` instructions in the RDNA3 or RDNA4 architecture, which
- perform a 16x16 * 16x16 matrix multiplication for different data types.
- Note that in gfx12/RDNA4, there is also a 16x32 * 32x16 instruction for 4-bit
- integer inputs.
+ The `amdgpu.wmma` op is an MLIR wrapper around intrinsics for various `wmma`
+ instructions in the AMDGPU architecture, which perform matrix multiplication.
+ Note that all wmma intrinsics have M=N=16 dimensions but vary in their allowed
+ K dimensions.
On gfx11/RDNA3, when emitting f16->f16 (or bf16->bf16) wmma, the output is a 16xf16
(or 16xbf16) vector containing only 8 valid values:
- If `subwordOffset` is 0, then the output is stored at indices 0, 2, 4, ..., 14.
- If `subwordOffset` is 1, then the output is stored at indices 1, 3, 5, ..., 15.
- On gfx12/RDNA4, the result is instead returned as a vector<8 x f16/bf16> where
- all values are valid and the `subwordOffset` must be `0`, as it cannot be used.
+ On gfx12/RDNA4 and gfx1250, the result is instead returned as a vector where all
+ the values are valid and the `subwordOffset` must be `0`, as it cannot be used.
`unsignedA` and `unsignedB` flag that the `int8` LLVM inputs are unsigned.
- The `clamp` flag is used to saturate the output of type T to numeric_limits<T>::max()
+ The `clamp` flag is used to saturate the output of type T to `numeric_limits<T>::max()`
in case of overflow.
+
+ Example:
+ ```mlir
+ %0 = amdgpu.wmma 16x16x16 %matA * %matB + %matC : vector<16xf16>, vector<16xf16>, vector<8xf16>
----------------
Groverkss wrote:
While the syntax is okay, it is weird that the mfma instructions encode the intrinsic shape in an attribute dict while wmma does it with a custom parser.
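The `16x16x16` in the example above spells out the MxNxK shape of the multiply, with M=N=16 fixed and only K varying between intrinsics. A minimal Python sketch of what the op computes (D = A * B + C), assuming plain row-major nested lists rather than the per-lane register layout the hardware actually uses. `parse_wmma_shape` and `wmma_reference` are hypothetical helpers for illustration, not MLIR APIs, and the set of K values each target supports is not enumerated here.

```python
def parse_wmma_shape(shape: str) -> tuple[int, int, int]:
    """Split an 'MxNxK' string such as '16x16x16' into (M, N, K)."""
    m, n, k = (int(part) for part in shape.split("x"))
    # Per the op description, every wmma intrinsic has M = N = 16.
    if m != 16 or n != 16:
        raise ValueError(f"wmma requires M = N = 16, got {shape}")
    return m, n, k


def wmma_reference(shape: str, a, b, c):
    """Compute D = A * B + C on row-major nested lists.

    a is M x K, b is K x N, c is M x N. This models only the math,
    not how operands are distributed across lanes and vector registers.
    """
    m, n, k = parse_wmma_shape(shape)
    if len(a) != m or any(len(row) != k for row in a):
        raise ValueError("A must be M x K")
    if len(b) != k or any(len(row) != n for row in b):
        raise ValueError("B must be K x N")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("C must be M x N")
    return [
        [c[i][j] + sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
        for i in range(m)
    ]
```

Only K changes between shapes, so a `16x16x32` call simply takes a 16x32 A and a 32x16 B.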
https://github.com/llvm/llvm-project/pull/164920
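The gfx11/RDNA3 `subwordOffset` rule in the op description (8 valid f16/bf16 results packed into a 16-element vector at even or odd indices) can be sketched in Python as follows. This is an illustrative model only: `pack_gfx11_half_result` and `unpack_gfx11_half_result` are hypothetical names, and `None` stands in for the lanes that do not hold a valid result.

```python
def pack_gfx11_half_result(values, subword_offset):
    """Place 8 valid results into a 16-element vector, gfx11-style."""
    if len(values) != 8:
        raise ValueError("expected exactly 8 valid results")
    if subword_offset not in (0, 1):
        raise ValueError("subwordOffset must be 0 or 1")
    out = [None] * 16  # None marks lanes with no valid result
    for i, v in enumerate(values):
        # offset 0 -> indices 0, 2, ..., 14; offset 1 -> 1, 3, ..., 15
        out[2 * i + subword_offset] = v
    return out


def unpack_gfx11_half_result(vec, subword_offset):
    """Recover the 8 valid results from a packed 16-element vector."""
    return vec[subword_offset::2]
```

On gfx12/RDNA4 and gfx1250 no such step applies: every returned value is valid and `subwordOffset` must be `0`.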