[PATCH] D105268: [X86] AVX512FP16 instructions enabling 5/6

Thu Aug 19 08:23:07 PDT 2021

craig.topper added inline comments.

================
Comment at: clang/include/clang/Basic/BuiltinsX86.def:2014
+
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_mask, "V8xV8xV8xV8xUcIi", "ncV:128:", "avx512fp16")
+TARGET_BUILTIN(__builtin_ia32_vfmaddsh3_maskz, "V8xV8xV8xV8xUcIi", "ncV:128:", "avx512fp16")
----------------
LuoYuanke wrote:
> What does "3" stand for?
The 3 is there because AMD's 4 operand fma used vfmaddss/vfmaddsd. So Intel's 3 operand used vfmaddss3/vfmaddsd3. That naming is being carried forward here.

================
Comment at: clang/lib/Headers/avx512vlfp16intrin.h:1385
+                                                                  __m128h __C) {
+  return (__m128h)__builtin_ia32_selectph_128(
+      (__mmask8)__U,
----------------
LuoYuanke wrote:
> Sorry, I'm confused sometimes we use mask builtin, sometimes we use select builtin. Any guideline on it?
Ideally FP should never use select because it doesn't convey that exceptions should be masked for strictfp. But the mistake was already made for add/sub/mul/div/fma/etc years ago before strictfp support existed in llvm. fp16 is intentionally following float/double for consistency.

================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5709
+
+  def int_x86_avx512fp16_vfmadd_ph_512
+      : Intrinsic<[ llvm_v32f16_ty ],
----------------
LuoYuanke wrote:
> I notice there is no builtin bound to this intrinsic. What is it used for?
It is manually selected in CGBuiltin.cpp

================
Comment at: llvm/include/llvm/IR/IntrinsicsX86.td:5727
+                  [ IntrNoMem, ImmArg<ArgIndex<3>> ]>;
+  def int_x86_avx512fp16_vfmadd_f16
+      : Intrinsic<[ llvm_half_ty ],
----------------
LuoYuanke wrote:
> ph?
It's scalar so it shouldn't be ph. This matches int_x86_avx512_vfmadd_f32 and int_x86_avx512_vfmadd_f64. They don't use ss/sd because the ss/sd names are usually used for intrinsics that 128-bit operands and only modify the lower element. int_x86_avx512_vfmadd_f32 and int_x86_avx512_vfmadd_f64 have float/double inputs and produce float/double results.

================
Comment at: llvm/test/CodeGen/X86/avx512fp16-fma-commute.ll:9
+
+define half @fma_123_f16(half %x, half %y, half %z) {
+; CHECK-LABEL: fma_123_f16:
----------------
LuoYuanke wrote:
> The name 123 is not the same with the generated instruction (213sh). Is it expected?
123 represents how the 3 arguments to the fucntion are mapped to the 3 intrinsic arguments that it calls. There are 6 possible permutations which are all tested here, but only 3 instruction mnemonics.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D105268/new/

https://reviews.llvm.org/D105268