[Mlir-commits] [mlir] [mlir][x86vector] AVX512-BF16 Dot op (PR #124800)

Kunwar Grover llvmlistbot at llvm.org
Tue Jan 28 14:15:23 PST 2025


================
@@ -271,6 +271,94 @@ def Vp2IntersectQIntrOp : AVX512_IntrOp<"vp2intersect.q.512", 2, [
                    VectorOfLengthAndType<[8], [I64]>:$b);
 }
 
+//===----------------------------------------------------------------------===//
+// AVX512-BF16 op definitions
+//===----------------------------------------------------------------------===//
+
+// Operation that is part of the input dialect.
+class AVX512BF16_Op<string mnemonic, list<Trait> traits = []> :
+  Op<X86Vector_Dialect, "avx512bf16." # mnemonic, traits> {}
+
+// Intrinsic operation used during lowering to LLVM IR.
+class AVX512BF16_IntrOp<string mnemonic, int numResults, list<Trait> traits = []> :
+  LLVM_IntrOpBase<X86Vector_Dialect, "avx512bf16.intr." # mnemonic,
+                  "x86_avx512bf16_" # !subst(".", "_", mnemonic),
+                  [], [], traits, numResults>;
+
+// Defined by first result overload. May have to be extended for other
+// instructions in the future.
+class AVX512BF16_IntrOverloadedOp<string mnemonic,
+                              list<Trait> traits = []> :
+  LLVM_IntrOpBase<X86Vector_Dialect, "avx512bf16.intr." # mnemonic,
+                  "x86_avx512bf16_" # !subst(".", "_", mnemonic),
+                  /*list<int> overloadedResults=*/[0],
+                  /*list<int> overloadedOperands=*/[],
+                  traits, /*numResults=*/1>;
+
+//----------------------------------------------------------------------------//
+// AVX512-BF16 Dot
+//----------------------------------------------------------------------------//
+
+def DotBF16Op : AVX512BF16_Op<"dot", [Pure,
+  AllTypesMatch<["a", "b"]>,
+  AllTypesMatch<["src", "dst"]>,
+  TypesMatchWith<"`a` has twice as many elements as `src`",
+                 "src", "a",
+                 "VectorType::get({::llvm::cast<VectorType>($_self).getShape()[0] * 2}, "
+                 "BFloat16Type::get($_self.getContext()))">]> {
+  let summary = "Dot BF16 op";
----------------
Groverkss wrote:

Usually, on AMDGPU and NVGPU, we use two separate dialects for high-level operations and intrinsic operations. It maintains a clear distinction between what is a mirror of an LLVM intrinsic and what is a wrapper op meant to be used directly. Example: the AMDGPU and ROCDL dialects:

https://mlir.llvm.org/docs/Dialects/AMDGPU/ (Wrappers)
https://mlir.llvm.org/docs/Dialects/ROCDLDialect/ (LLVM intrinsics)

I would recommend thinking about having a separate wrapper dialect in the future. I get it's a lot of effort, but from experience, it's useful. When lowering from higher-abstraction targets, we try using the amdgpu dialect, but when using a lower-level assembly/LLVM emitter, we use rocdl.

If you don't want to do that, I would still recommend creating a separate doc group in tablegen, so that when you open the dialect page on mlir, you know which ops are wrappers and which are LLVM intrinsics. It's also an easier separation to make.

Not blocking on a separate dialect, but I would ask for a separate doc group for wrapper ops and LLVM mirror intrinsic ops.
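The distinction described above is already latent in the base classes from the diff: wrapper ops derive from `AVX512BF16_Op` while intrinsic mirrors derive from `AVX512BF16_IntrOp`. A minimal sketch of how the two kinds of definitions could sit side by side (op names and type constraints below are illustrative, not taken from the PR):

```tablegen
// Hypothetical wrapper op: the user-facing surface op, built on the
// AVX512BF16_Op base class from the diff above.
def DotBF16WrapperOp : AVX512BF16_Op<"dot", [Pure]> {
  let summary = "Dot BF16 op (wrapper)";
}

// Hypothetical mirror of an LLVM intrinsic, built on AVX512BF16_IntrOp.
// The mnemonic maps to an x86_avx512bf16_* intrinsic name via !subst.
def DotBF16IntrOp : AVX512BF16_IntrOp<"dpbf16ps.512", 1, [Pure]> {
  let arguments = (ins VectorOfLengthAndType<[16], [F32]>:$src,
                       VectorOfLengthAndType<[32], [BF16]>:$a,
                       VectorOfLengthAndType<[32], [BF16]>:$b);
}
```

Grouping the two kinds of definitions under distinct headers (or distinct generated-doc sections) would make the wrapper/intrinsic split visible on the dialect documentation page without requiring a whole new dialect.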

https://github.com/llvm/llvm-project/pull/124800


More information about the Mlir-commits mailing list