[PATCH] D132329: [X86][RFC] Using `__bf16` for AVX512_BF16 intrinsics

Fri Aug 26 07:41:22 PDT 2022

pengfei added inline comments.

================
Comment at: clang/lib/AST/MicrosoftMangle.cpp:2472

+  case BuiltinType::BFloat16:
+    mangleArtificialTagType(TTK_Struct, "__bf16", {"__clang"});
----------------
LuoYuanke wrote:
> This looks irrelative to the patch.
The use of `__bf16` in intrinsics leads to new lit fails due to no mangling support on Windows. But I can do it in a separate patch.

================
Comment at: clang/test/CodeGen/X86/avx512bf16-builtins.c:7

-float test_mm_cvtsbh_ss(__bfloat16 A) {
-  // CHECK-LABEL: @test_mm_cvtsbh_ss
-  // CHECK: zext i16 %{{.*}} to i32
-  // CHECK: shl i32 %{{.*}}, 16
-  // CHECK: bitcast i32 %{{.*}} to float
+float test_mm_cvtsbh_ss(__bf16 A) {
+  // CHECK: fpext bfloat %{{.*}} to float
----------------
LuoYuanke wrote:
> Add a test case for `__bfloat16` to test compatibility?
GCC folks prefer to not providing `__bfloat16`, but I'd like to deprecate it first. So we don't need test for it.

================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:4904
               ClangBuiltin<"__builtin_ia32_cvtne2ps2bf16_128">,
-              Intrinsic<[llvm_v8i16_ty], [llvm_v4f32_ty, llvm_v4f32_ty],
+              Intrinsic<[llvm_v8bf16_ty], [llvm_v4f32_ty, llvm_v4f32_ty],
               [IntrNoMem]>;
----------------
LuoYuanke wrote:
> Probably need to upgrade the old intrinsics to new version for IR compatibility or we can keep IR unchanged and just generate bitcast from bfloat16 to i16.
Good suggestion!

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:2180
+  if (!Subtarget.useSoftFloat() && Subtarget.hasBF16()) {
+    addRegisterClass(MVT::bf16, &X86::FR16XRegClass);
+    addRegisterClass(MVT::v8bf16, &X86::VR128XRegClass);
----------------
LuoYuanke wrote:
> Not sure about this. Does it make bf16 legal type?
Good catch! I made it legal to lower `BUILD_VECTOR`. But yes, it results in the scalar lowering failing with AVX512BF16.
I fixed the problem by adding customized code. It works for both scalar lowering and AVX512BF16 intrinsics lowering now.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132329/new/

https://reviews.llvm.org/D132329