[Mlir-commits] [mlir] [mlir][AMDGPU] Add int4 intrinsics, mixed-type fp8 to handle gfx12 (PR #128963)
Daniel Hernandez-Juarez
llvmlistbot at llvm.org
Thu Feb 27 09:05:14 PST 2025
================
@@ -629,13 +633,17 @@ def AMDGPU_WMMAOp :
let summary = "MLIR wrapper for RDNA3 wmma instructions";
let description = [{
The `amdgpu.wmma` op is an MLIR wrapper around intrinsics
- for various `wmma` instructions in the RDNA3 architecture, which perform
- a 16x16 matrix multiplication for different data types.
+ for various `wmma` instructions in the RDNA3 or RDNA4 architecture, which
+ perform a 16x16 * 16x16 matrix multiplication for different data types.
+ Note that in gfx12/RDNA4, there is also a 16x32 * 32x16 instruction for 4-bit
+ integer inputs.
- When emitting f16->f16 (or bf16->bf16) wmma the output is a 16xf16 (or 16xbf16) vector
- containing only 8 valid values:
+ On gfx11/RDNA3, when emitting f16->f16 (or bf16->bf16) wmma, the output is a
+ 16xf16 (or 16xbf16) vector containing only 8 valid values:
- If `subwordOffset` is 0, then the output is stored at indices 0, 2, 4, ..., 14.
- If `subwordOffset` is 1, then the output is stored at indices 1, 3, 5, ..., 15.
+ On gfx12/RDNA4, the result is instead returned as a vector<8 x f16/bf16> where
----------------
dhernandez0 wrote:
I think the output can be f32 or i32 as well?
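For readers following the thread, a minimal sketch of what the op looks like in IR. This is an assumption pieced together from the op description quoted above, not taken from the PR itself; the exact assembly format and the per-target (gfx11 vs. gfx12) result shapes should be checked against the AMDGPU dialect docs.

```mlir
// Hedged sketch: a 16x16 * 16x16 wmma with f16 inputs and an f32 accumulator.
// The vector shapes here are assumptions based on the quoted description,
// not verified against this PR's final form.
%d = amdgpu.wmma %a * %b + %c
  : vector<16xf16>, vector<16xf16>, vector<8xf32>
```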
https://github.com/llvm/llvm-project/pull/128963