[Mlir-commits] [mlir] [mlir][x86vector] AVX512-BF16 Dot op (PR #124800)

Renato Golin llvmlistbot at llvm.org
Tue Jan 28 14:33:59 PST 2025


================
@@ -271,6 +271,94 @@ def Vp2IntersectQIntrOp : AVX512_IntrOp<"vp2intersect.q.512", 2, [
                    VectorOfLengthAndType<[8], [I64]>:$b);
 }
 
+//===----------------------------------------------------------------------===//
+// AVX512-BF16 op definitions
+//===----------------------------------------------------------------------===//
+
+// Operation that is part of the input dialect.
+class AVX512BF16_Op<string mnemonic, list<Trait> traits = []> :
+  Op<X86Vector_Dialect, "avx512bf16." # mnemonic, traits> {}
+
+// Intrinsic operation used during lowering to LLVM IR.
+class AVX512BF16_IntrOp<string mnemonic, int numResults, list<Trait> traits = []> :
+  LLVM_IntrOpBase<X86Vector_Dialect, "avx512bf16.intr." # mnemonic,
+                  "x86_avx512bf16_" # !subst(".", "_", mnemonic),
+                  [], [], traits, numResults>;
+
+// Defined by first result overload. May have to be extended for other
+// instructions in the future.
+class AVX512BF16_IntrOverloadedOp<string mnemonic,
+                              list<Trait> traits = []> :
+  LLVM_IntrOpBase<X86Vector_Dialect, "avx512bf16.intr." # mnemonic,
+                  "x86_avx512bf16_" # !subst(".", "_", mnemonic),
+                  /*list<int> overloadedResults=*/[0],
+                  /*list<int> overloadedOperands=*/[],
+                  traits, /*numResults=*/1>;
+
+//----------------------------------------------------------------------------//
+// AVX512-BF16 Dot
+//----------------------------------------------------------------------------//
+
+def DotBF16Op : AVX512BF16_Op<"dot", [Pure,
+  AllTypesMatch<["a", "b"]>,
+  AllTypesMatch<["src", "dst"]>,
+  TypesMatchWith<"`a` has twice as many elements as `src`",
+                 "src", "a",
+                 "VectorType::get({::llvm::cast<VectorType>($_self).getShape()[0] * 2}, "
+                 "BFloat16Type::get($_self.getContext()))">]> {
+  let summary = "Dot BF16 op";
----------------
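
For context, a minimal usage sketch of how the DotBF16Op defined above might appear in MLIR IR, assuming the stated constraints: `src`/`dst` are f32 accumulator vectors and `a`/`b` are bf16 vectors with twice as many elements. The exact assembly format is set by the op's declarative syntax in the patch and may differ:

  // Hypothetical 512-bit variant: 16 f32 accumulators, 32 bf16 elements in a/b.
  %dst = x86vector.avx512bf16.dot %src, %a, %b
           : vector<32xbf16> -> vector<16xf32>

On lowering to LLVM IR, the AVX512BF16_IntrOp/AVX512BF16_IntrOverloadedOp classes above map the intrinsic mnemonic onto `llvm.x86.avx512bf16.*` intrinsics via the `!subst(".", "_", mnemonic)` substitution (presumably the dpbf16ps family for this op).
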
rengolin wrote:

CPU extension dialects are different from the GPU dialects in that respect. I agree we need to design those better, but I'm not sure different dialects are the right path here. I also recently raised with @banach-space the idea of joining the existing Arm dialects into one to avoid a dialect zoo.

I'd ask that we avoid trying to fix the `x86vector` dialect, or CPU dialects in general, until we have something that we know works. No one really uses it right now and we're trying to resurrect it. So let's just follow the patterns that are already baked in, see what comes out, get some real usage out of it, and then form a plan to fix the design.

Right now, I don't think it even makes sense to try to shape the CPU dialects after the GPU ones.

https://github.com/llvm/llvm-project/pull/124800

