[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

Phoebe Wang via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Oct 19 06:39:11 PDT 2022


pengfei added inline comments.


================
Comment at: clang/lib/Headers/avx512bf16intrin.h:13
 
+#ifdef __SSE2__
+
----------------
LuoYuanke wrote:
> What is this macro check used for?
`__bf16` is not available without SSE2. This is to make sure no error generated if user include <immintrin.h> 


================
Comment at: clang/test/CodeGen/X86/avx512bf16-error.c:14
+__bfloat16 bar(__bfloat16 a, __bfloat16 b) {
+  return a + b;
+}
----------------
LuoYuanke wrote:
> Need test for other operations (-, *, /) as well?
I don't think so. This is to check `__bfloat16` is deprecated. We should check them when enabling `__bf16` on SSE2.


================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4928
               Intrinsic<[llvm_v4f32_ty],
               [llvm_v4f32_ty, llvm_v4i32_ty, llvm_v4i32_ty], [IntrNoMem]>;
   def int_x86_avx512bf16_dpbf16ps_256:
----------------
LuoYuanke wrote:
> It seems we still use i32 to represent <2 x bf16>, but we don't have a better way since 1 bit mask cover a pair of bf16 elements.
I think mask is not an issue because both the passthru and dst are <4 x float>.


================
Comment at: llvm/lib/IR/AutoUpgrade.cpp:4095
+        Intrinsic::x86_avx512bf16_mask_cvtneps2bf16_128)
+      Args[1] = Builder.CreateBitCast(
+          Args[1], FixedVectorType::get(Builder.getBFloatTy(), NumElts));
----------------
LuoYuanke wrote:
> Why there is no bitcast for the input for the other intrinsics? I expect to see the bitcast from vXi16 to vXbf16.
Others don't have vXbf16 in inputs.


================
Comment at: llvm/lib/Target/X86/X86InstrAVX512.td:3916
+multiclass mask_move_lowering_f16_bf16<AVX512VLVectorVTInfo _> {
 let Predicates = [HasBWI] in {
+  def : Pat<(_.info512.VT (vselect VK32WM:$mask, (_.info512.VT VR512:$src1), (_.info512.VT VR512:$src0))),
----------------
LuoYuanke wrote:
> Not sure the indent is correct or not.
The format is chaos in td files, at least we have code using in this way :)


================
Comment at: llvm/test/CodeGen/X86/bfloat.ll:32
+; BF16-NEXT:    shll $16, %eax
+; BF16-NEXT:    vmovd %eax, %xmm1
+; BF16-NEXT:    vaddss %xmm0, %xmm1, %xmm0
----------------
LuoYuanke wrote:
> It seems the difference between SSE2 and BF16 is using SSE instruction or AVX instruction. What do we expect to test for BF16?
This is to make sure the scalar type works under AVX512-BF16. We may optimize it with `vcvtneps2bf16` in future.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329



More information about the cfe-commits mailing list