[Mlir-commits] [mlir] [MLIR][NVVM] Add Op for TMA Store with reduction (PR #118853)
Guray Ozen
llvmlistbot at llvm.org
Wed Dec 11 05:32:50 PST 2024
================
@@ -2029,6 +2029,107 @@ def NVVM_CpAsyncBulkTensorPrefetchOp :
}];
}
+// List of modes supported for TMA Store and Reduction Ops
+def TMAStoreModeTile : I32EnumAttrCase<"TILE", 0, "tile">;
+def TMAStoreModeIm2Col : I32EnumAttrCase<"IM2COL", 1, "im2col">;
+
+def TMAStoreMode : I32EnumAttr<"TMAStoreMode", "NVVM TMA Store Mode",
+ [TMAStoreModeTile, TMAStoreModeIm2Col]> {
+ let genSpecializedAttr = 0;
+ let cppNamespace = "::mlir::NVVM";
+}
+def TMAStoreModeAttr : EnumAttr<NVVM_Dialect, TMAStoreMode, "tma_store_mode"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+// List of Reduction Ops supported with TMA Store
+def TMAReduxKindAdd : I32EnumAttrCase<"ADD", 0, "add">;
+def TMAReduxKindMin : I32EnumAttrCase<"MIN", 1, "min">;
+def TMAReduxKindMax : I32EnumAttrCase<"MAX", 2, "max">;
+def TMAReduxKindInc : I32EnumAttrCase<"INC", 3, "inc">;
+def TMAReduxKindDec : I32EnumAttrCase<"DEC", 4, "dec">;
+def TMAReduxKindAnd : I32EnumAttrCase<"AND", 5, "and">;
+def TMAReduxKindOr : I32EnumAttrCase<"OR", 6, "or">;
+def TMAReduxKindXor : I32EnumAttrCase<"XOR", 7, "xor">;
+
+def TMAReduxKind : I32EnumAttr<"TMAReduxKind", "NVVM TMA redux kind",
+ [TMAReduxKindAdd, TMAReduxKindMax, TMAReduxKindMin,
+ TMAReduxKindInc, TMAReduxKindDec, TMAReduxKindAnd,
+ TMAReduxKindOr, TMAReduxKindXor]> {
+ let genSpecializedAttr = 0;
+ let cppNamespace = "::mlir::NVVM";
+}
+def TMAReduxKindAttr : EnumAttr<NVVM_Dialect, TMAReduxKind, "tma_redux_kind"> {
+ let assemblyFormat = "`<` $value `>`";
+}
+
+def NVVM_CpAsyncBulkTensorReduceOp :
----------------
grypp wrote:
The design philosophy for `NVGPU` and `NVVM` has long been clear and practical:
`NVGPU` was created to serve as a bridge between high-level dialects like `memref` and `vector` and the NVVM dialect.
`NVVM` was designed to generate PTX code, but this was never a 1:1 mapping. It follows a `1:N` mapping, where the `N` lowerings are always the same PTX instruction with different traits. We have relied on this extensively.
`NVGPU` can generate `N` NVVM ops, where the `N` ops map to distinct PTX instructions, e.g., the tensor core `fence + mma + commit + wait` chain.
`NVGPU` is also a good place for things that don't fit neatly into the GPU dialect, such as NVIDIA-specific driver calls.
[[RFC] Add NV-GPU dialect](https://discourse.llvm.org/t/rfc-add-nv-gpu-dialect-hw-specific-extension-of-gpu-dialect-for-nvidia-gpus/61466)
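As a rough sketch of the `1:N` distinction described above (op names follow the existing warpgroup MMA ops, but the exact syntax, types, and operands are abbreviated and illustrative, not copied from the dialect definitions):

```mlir
// A single NVGPU-level op acts as the bridge from high-level dialects...
%res = nvgpu.warpgroup.mma %descA, %descB, %acc { ... }

// ...and can expand into a chain of *distinct* NVVM ops, while each NVVM op
// still corresponds to one PTX instruction (with trait variations):
nvvm.wgmma.fence.aligned
%0 = nvvm.wgmma.mma_async %descA, %descB, %acc, ...   // wgmma.mma_async.*
nvvm.wgmma.commit.group.sync.aligned                  // wgmma.commit_group.*
nvvm.wgmma.wait.group.sync.aligned 0                  // wgmma.wait_group.*
```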
https://github.com/llvm/llvm-project/pull/118853