[PATCH] D109658: [X86][FP16] Change the order of the operands in complex FMA intrinsics to allow swap between the mul operands.

Mon Sep 13 07:11:07 PDT 2021

pengfei added a comment.

In D109658#2996767 <https://reviews.llvm.org/D109658#2996767>, @craig.topper wrote:

> In D109658#2996714 <https://reviews.llvm.org/D109658#2996714>, @pengfei wrote:
>
>> In D109658#2996412 <https://reviews.llvm.org/D109658#2996412>, @craig.topper wrote:
>>
>>> Does gcc use the same builtin name? Our general policy is to have the same interface as gcc if we have a builtin. So if gcc has these builtins the should work the same way.
>>
>> No. We don't sync with GCC on the builtin name during the development. We had a disscussion and decided to not keep them aligned due to 1) target specific builtins are compiler private names that no need to keep it compatible with other compilers; and 2) we already differentiate the target builtins with GCC long ago on the naming, masking etc. Currently, regardless the name, GCC uses the same C, A, B order with our existing implementation. https://gitlab.com/x86-gcc/gcc/-/blob/users/intel/liuhongt/independentfp16_wip/gcc/config/i386/avx512fp16intrin.h#L6672
>
> I thought we were pretty consistent on names with gcc for most of sse and avx and most of avx512. The names aren't completely private occasionally users due try to use them. If we happen to have the same name we should have the same behavior to avoid confusion.

I'm not so optimistic. I had a coarse-grained statistics on the use of x86 builtins in Clang and GCC. It shows Clang only defines 2/3 of GCC's builtins and 1/4 of Clang builtins have different names with GCC's. Command below:

  cat gcc/config/i386/*.h | grep -o "\b__builtin_ia32_\w\+" |sort|uniq|tee gcc.txt|wc -l
  2788

  ls clang/lib/Headers/*.h |grep -v fp16 |xargs cat |grep -o "\b__builtin_ia32_\w\+" |sort|uniq|tee clang.txt|wc -l
  1808

  comm -12 gcc.txt clang.txt |wc -l
  1347

Regarding this case, we already have a different name with GCC, I think it worthwhile to use a different order for the swapping optimization.
With a bit research on AVX512IFMA, I found:

1. The use of C, A, B order in GCC is not consistent on its AVX512IFMA builtins. It supposes GCC should change to A, B, C order if considering consistency;
2. We aren't consistent on AVX512IFMA builtins with GCC either due to the use of select.

By the way, GCC folks told me GCC has ability to specify arbitrary operands that can be commutative. But I found both SDNode and MI only have ability on the first 2 operands, which is insufficient for instruction like CFMA. Do you know if we have other mechanism for commutable operands?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D109658/new/

https://reviews.llvm.org/D109658