[clang] [llvm] [X86][AMX] Support AMX-FP8 (PR #113850)

Tue Oct 29 19:03:12 PDT 2024

================
@@ -0,0 +1,83 @@
+/*===------------- amxfp8intrin.h - AMX intrinsics -*- C++ -*----------------===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *===------------------------------------------------------------------------===
+ */
+
+#ifndef __IMMINTRIN_H
+#error "Never use <amxfp8intrin.h> directly; include <immintrin.h> instead."
+#endif /* __IMMINTRIN_H */
+
+#ifndef __AMXFP8INTRIN_H
+#define __AMXFP8INTRIN_H
+#ifdef __x86_64__
+
+
+/// Compute dot-product of brain-float8 (BF8) or hybrid-float8 (HF8)
+///    floating-point pairs in tiles \a a and \a b, accumulating the
+///    intermediate single-precision (32-bit) floating-point elements with
+///    elements in \a dst, and store the 32-bit result back to tile \a dst.
+///
+/// \headerfile <immintrin.h>
+///
+/// \code
+/// void _tile_dpbf8ps (__tile dst, __tile a, __tile b)
+/// \endcode
+///
+/// This intrinsic corresponds to the \c TDPBF8PS instruction.
+///
+/// \param dst
+///    The destination tile. Max size is 1024 Bytes.
+/// \param a
+///    The 1st source tile. Max size is 1024 Bytes.
+/// \param b
+///    The 2nd source tile. Max size is 1024 Bytes.
+#define _tile_dpbf8ps __builtin_ia32_tdpbf8ps
----------------
phoebewang wrote:

`#define _tile_dpbf8ps(dst, a, b) __builtin_ia32_tdpbf8ps((dst), (a), (b))`
The same applies to the other macros below.

https://github.com/llvm/llvm-project/pull/113850
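For readers checking the doc comment's description, here is a minimal scalar sketch of the arithmetic for the BF8 x BF8 case only. It rests on several assumptions. BF8 is taken to be the E5M2 encoding. Four FP8 values are assumed to be packed per 32-bit column of `b`, as in the existing AMX-INT8 dot products. Products are accumulated one at a time in `float`, so rounding, denormal, and NaN behavior may differ from the real `TDPBF8PS` instruction. `bf8_to_float` and `ref_dpbf8ps` are hypothetical helpers, not part of `amxfp8intrin.h`. The sketch uses plain arrays instead of the `__tile` type, so it runs without AMX-FP8 hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Decode one BF8 value, assumed to use the E5M2 layout:
   1 sign bit, 5 exponent bits (bias 15), 2 mantissa bits. */
static float bf8_to_float(uint8_t v) {
    uint32_t sign = (uint32_t)(v & 0x80u) << 24;
    uint32_t exp = (v >> 2) & 0x1Fu;
    uint32_t mant = v & 0x3u;
    uint32_t bits;
    float f;

    if (exp == 0) {
        /* Zero or subnormal: value is mant * 2^-16. */
        f = (float)mant / 65536.0f;
        return (v & 0x80u) ? -f : f;
    }
    if (exp == 0x1Fu) {
        /* Infinity (mant == 0) or NaN (mant != 0). */
        bits = sign | 0x7F800000u | (mant << 21);
    } else {
        /* Normal: rebias the exponent from 15 to 127 and move the
           2 mantissa bits to the top of the 23-bit float mantissa. */
        bits = sign | ((exp + 112u) << 23) | (mant << 21);
    }
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Hypothetical scalar model of dst += a * b for BF8 tiles:
   a:   m rows of 4*k bytes (BF8 values)
   b:   k rows of 4*n bytes (four BF8 values per 32-bit column)
   dst: m rows of n floats, accumulated in place */
static void ref_dpbf8ps(float *dst, const uint8_t *a, const uint8_t *b,
                        int m, int n, int k) {
    assert(m >= 0 && n >= 0 && k >= 0);
    for (int row = 0; row < m; ++row)
        for (int kk = 0; kk < k; ++kk)
            for (int col = 0; col < n; ++col)
                for (int i = 0; i < 4; ++i)
                    dst[row * n + col] +=
                        bf8_to_float(a[row * 4 * k + 4 * kk + i]) *
                        bf8_to_float(b[kk * 4 * n + 4 * col + i]);
}
```

For example, with `a = {0x3C, 0x40, 0x3E, 0xBC}` (1.0, 2.0, 1.5, -1.0) and every byte of `b` set to `0x40` (2.0), one call with `m = n = k = 1` adds 2 + 4 + 3 - 2 = 7.0 to `dst[0]`.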