[Mlir-commits] [mlir] [mlir][x86vector] AVX Convert/Broadcast BF16 to F32 instructions (PR #135143)

Mon Apr 14 00:46:47 PDT 2025

================
@@ -408,4 +408,110 @@ def DotOp : AVX_LowOp<"dot", [Pure,
   }];
 }
 
+
+//----------------------------------------------------------------------------//
+// AVX: Convert packed BF16 even-indexed/odd-indexed elements into packed F32
+//----------------------------------------------------------------------------//
+
+def CvtPackedEvenIndexedBF16ToF32Op : AVX_Op<"cvt.packed.even.indexed.bf16_to_f32", [Pure,
+  DeclareOpInterfaceMethods<OneToOneIntrinsicOpInterface>]> {
+  let summary = "AVX: Convert packed BF16 even-indexed elements into packed F32 Data.";
+  let description = [{
+    #### From the Intel Intrinsics Guide:
+
+    Convert packed BF16 (16-bit) floating-point even-indexed elements stored at
+    memory locations starting at location `__A` to packed single-precision
+    (32-bit) floating-point elements, and store the results in `dst`.
+
+    Example:
+    ```mlir
+    %dst = x86vector.avx.cvt.packed.even.indexed.bf16_to_f32 %a : !llvm.ptr -> vector<8xf32>
+    ```
+  }];
+  let arguments = (ins LLVM_AnyPointer:$a);
+  let results = (outs VectorOfLengthAndType<[4, 8], [F32]>:$dst);
+  let assemblyFormat =
+    "$a  attr-dict`:` type($a)`->` type($dst)";
+
+  let extraClassDefinition = [{
+    std::string $cppClass::getIntrinsicName() {
+      std::string intr = "llvm.x86.vcvtneebf162ps";
+      VectorType vecType = getDst().getType();
+      unsigned elemBitWidth = vecType.getElementTypeBitWidth();
+      unsigned opBitWidth = vecType.getShape()[0] * elemBitWidth;
+      intr += std::to_string(opBitWidth);
+      return intr;
+    }
+  }];
+}
+
+def CvtPackedOddIndexedBF16ToF32Op : AVX_Op<"cvt.packed.odd.indexed.bf16_to_f32", [Pure,
+  DeclareOpInterfaceMethods<OneToOneIntrinsicOpInterface>]> {
+  let summary = "AVX: Convert packed BF16 odd-indexed elements into packed F32 Data.";
+  let description = [{
+    #### From the Intel Intrinsics Guide:
+
+    Convert packed BF16 (16-bit) floating-point odd-indexed elements stored at
+    memory locations starting at location `__A` to packed single-precision
+    (32-bit) floating-point elements, and store the results in `dst`.
+
+    Example:
+    ```mlir
+    %dst = x86vector.avx.cvt.packed.odd.indexed.bf16_to_f32 %a : !llvm.ptr -> vector<8xf32>
+    ```
+  }];
+  let arguments = (ins LLVM_AnyPointer:$a);
----------------
adam-smnk wrote:

Using LLVM types at this level is not really composable with anything - let's switch to memrefs.

Then it'd be best to make these ops easier to use i.e., guide users by design.
I'd suggest requiring memrefs with an exact shape to force users to ensure that memory accesses are safe to do. Although, if it's not convenient in practice, a simple base address could be used too (e.g., like in `amx.tile_load`).

https://github.com/llvm/llvm-project/pull/135143