[Mlir-commits] [mlir] [MLIR][NVVM] Add tcgen05.mma MLIR Ops (PR #164356)
Guray Ozen
llvmlistbot at llvm.org
Mon Oct 20 22:59:10 PDT 2025
================
@@ -4537,6 +4537,645 @@ def NVVM_ClusterLaunchControlQueryCancelOp
}];
}
+//===----------------------------------------------------------------------===//
+// NVVM tcgen05.mma attributes
+//===----------------------------------------------------------------------===//
+
+def Tcgen05MMAKindF16 : I32EnumAttrCase<"F16", 0, "f16">;
+def Tcgen05MMAKindTF32 : I32EnumAttrCase<"TF32", 1, "tf32">;
+def Tcgen05MMAKindF8F6F4 : I32EnumAttrCase<"F8F6F4", 2, "f8f6f4">;
+def Tcgen05MMAKindINT8 : I32EnumAttrCase<"I8", 3, "i8">;
+
+def Tcgen05MMAKind : I32EnumAttr<
+ "Tcgen05MMAKind",
+ "tcgen05 MMA Supported Types",
+ [Tcgen05MMAKindF8F6F4, Tcgen05MMAKindINT8, Tcgen05MMAKindF16,
+ Tcgen05MMAKindTF32]> {
+ let cppNamespace = "::mlir::NVVM";
+ let genSpecializedAttr = 0;
+}
+
+def Tcgen05MMAKindAttr : EnumAttr<NVVM_Dialect, Tcgen05MMAKind, "tcgen05_mma_kind"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+def Tcgen05MMACollectorOpDiscard : I32EnumAttrCase<"DISCARD", 0, "discard">;
+def Tcgen05MMACollectorOpLastUse : I32EnumAttrCase<"LASTUSE", 1, "lastuse">;
+def Tcgen05MMACollectorOpFill : I32EnumAttrCase<"FILL", 2, "fill">;
+def Tcgen05MMACollectorOpUse : I32EnumAttrCase<"USE", 3, "use">;
+
+def Tcgen05MMACollectorOp : I32EnumAttr<
+ "Tcgen05MMACollectorOp",
+ "tcgen05.mma Collector Buffer Operation",
+ [Tcgen05MMACollectorOpDiscard,
+ Tcgen05MMACollectorOpLastUse,
+ Tcgen05MMACollectorOpFill,
+ Tcgen05MMACollectorOpUse]> {
+ let cppNamespace = "::mlir::NVVM";
+ let genSpecializedAttr = 0;
+}
+
+def Tcgen05MMACollectorOpAttr : EnumAttr<NVVM_Dialect, Tcgen05MMACollectorOp, "tcgen05_mma_collectorop"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+//===----------------------------------------------------------------------===//
+// NVVM tcgen05.mma Ops.
+//===----------------------------------------------------------------------===//
+
+def NVVM_Tcgen05MMAOp : NVVM_Op<"tcgen05.mma", [AttrSizedOperandSegments]> {
+
+ let summary = "Performs MMA operation on 5th-gen tensor cores";
+
+ let arguments = (ins
+ // Attributes
+ Tcgen05MMAKindAttr:$kind,
+ CTAGroupKindAttr:$ctaGroup,
+ DefaultValuedAttr<Tcgen05MMACollectorOpAttr,
+ "Tcgen05MMACollectorOp::DISCARD">:$collectorOp,
+ UnitAttr:$ashift,
+ // Arguments
+ LLVM_PointerTensor:$d,
+ AnyTypeOf<[LLVM_PointerTensor, I64]>:$a,
+ I64:$b,
+ I32:$idesc,
+ I1:$enableInputD,
+ // Optional arguments
+ Optional<I64>:$scaleInputD,
+ Optional<FixedVectorOfLengthAndType<[4, 8], [I32]>>:$disableOutputLane
+ );
+
+ let description = [{
+ The `tcgen05.mma` op is an asynchronous op that performs matrix multiplication
+ and accumulation using 5th-generation tensor cores:
+
+ ```
+ D = A * B + (D * 2^ -scaleInputD) // if `scaleInputD` is provided
+ D = A * B // if `enableInputD` is false
+ D = A * B + D // otherwise
+ ```
+
+ where:
+ - A is an `M x K` matrix in tensor memory or described using shared memory descriptor
+ - B is a `K x N` matrix described using shared memory descriptor
+ - D is an `M x N` accumulator matrix in tensor memory
+
+ A `shared memory descriptor` is a 64-bit value that describes the properties
+ of the multiplicand matrix in shared memory, including its location in the
+ shared memory of the current CTA. For more details, please refer to the
+ [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-shared-memory-descriptor).
+
+ - idesc is a 32-bit value representing the [Instruction Descriptor](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-instruction-descriptor)
+
+ Optional Operands:
+ - `scaleInputD` is an Immediate value operand used for scaling D matrix by 2 ^ (-scaleInputD). The valid range is [0, 15]
+
+ - `disableOutputLane` is a vector mask for selective output
+ * vector<4 x i32> when ctaGroup is CTA_1
+ * vector<8 x i32> when ctaGroup is CTA_2
+
+ Required Attributes:
+ - `kind` specifies the computation data type and precision
+ * f16 : 16-bit floating point (half precision)
+ * tf32 : Tensor Float 32 (truncated 32-bit float)
+ * f8f6f4 : Mixed precision FP8/FP6/FP4
+ * i8 : 8-bit integer operations
+
+ - `ctaGroup` specifies CTA group configuration
+ * cta_1: MMA will be performed on the current thread's CTA
+ * cta_2: MMA will be performed on the current thread's CTA and its peer CTA
+
+ Default Attributes:
+ - `collectorOp` specifies the collector buffer operation for matrix A
+ * discard : Release buffer after use (default)
+ * lastuse : Mark buffer for last use
+ * fill : Fill buffer
+ * use : Use buffer without modification
+
+ - `ashift` shifts the rows of the A matrix down by one row
+
+ [For more information, see PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/#tcgen05-mma-instructions-mma)
+ }];
+
+ let assemblyFormat = [{
+ $d `,` $a `,` $b `,` $idesc `,` $enableInputD (`scale` `=` $scaleInputD^)?
+ (`mask` `=` $disableOutputLane^)? attr-dict `:` `(` type(operands) `)`
+ }];
+
+ let extraClassDeclaration = [{
+ static mlir::NVVM::IDArgPair getIntrinsicIDAndArgs(
+ Operation &op, LLVM::ModuleTranslation &mt,
+ llvm::IRBuilderBase &builder);
+ }];
+
+ let llvmBuilder = [{
+ auto [ID, args] = NVVM::Tcgen05MMAOp::getIntrinsicIDAndArgs(
+ *op, moduleTranslation, builder);
+ createIntrinsicCall(builder, ID, args);
+ }];
+
+ let hasVerifier = true;
+}
+
+def NVVM_Tcgen05MMASpOp : NVVM_Op<"tcgen05.mma.sp", [AttrSizedOperandSegments]> {
+
+ let summary = "Performs MMA operation with sparse A matrix on 5th-gen tensor cores";
+
+ let arguments = (ins
+ // Attributes
+ Tcgen05MMAKindAttr:$kind,
+ CTAGroupKindAttr:$ctaGroup,
+ DefaultValuedAttr<Tcgen05MMACollectorOpAttr,
+ "Tcgen05MMACollectorOp::DISCARD">:$collectorOp,
+ UnitAttr:$ashift,
+ // Arguments
----------------
grypp wrote:
Remove these comments; it is clear which operands are attributes and which are arguments.
https://github.com/llvm/llvm-project/pull/164356
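
For readers checking the accumulation rules in the op description, here is a minimal reference sketch in Python. It is not the lowering or the hardware behavior, only a scalar model of the three documented cases. The function name `tcgen05_mma_reference` is hypothetical, and it assumes that `enableInputD == false` takes precedence over `scaleInputD`, which the description does not state explicitly.

```python
def tcgen05_mma_reference(a, b, d, enable_input_d, scale_input_d=None):
    """Scalar model of the documented D = A * B (+ D) semantics.

    a is M x K, b is K x N, d is M x N, all as nested lists.
    Assumption: when enable_input_d is False, D is ignored even if
    scale_input_d is given.
    """
    if scale_input_d is not None and not 0 <= scale_input_d <= 15:
        # The description gives [0, 15] as the valid range.
        raise ValueError("scale_input_d must be in [0, 15]")

    m, k, n = len(a), len(b), len(b[0])
    out = []
    for i in range(m):
        row = []
        for j in range(n):
            # Plain matrix product term: (A * B)[i][j]
            acc = sum(a[i][p] * b[p][j] for p in range(k))
            if enable_input_d:
                prev = d[i][j]
                if scale_input_d is not None:
                    # D is scaled by 2^(-scaleInputD) before accumulation.
                    prev = prev * 2.0 ** (-scale_input_d)
                acc += prev
            row.append(acc)
        out.append(row)
    return out


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
d = [[16, 16], [16, 16]]

no_accumulate = tcgen05_mma_reference(a, b, d, enable_input_d=False)
accumulate = tcgen05_mma_reference(a, b, d, enable_input_d=True)
scaled = tcgen05_mma_reference(a, b, d, enable_input_d=True, scale_input_d=4)
```

With `scale_input_d=4`, each element of D contributes `16 * 2^-4 = 1` to the result, which makes the three cases easy to tell apart by inspection.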