[PATCH] D116015: [PowerPC] Add generic fnmsub intrinsic

Tue Feb 22 01:45:11 PST 2022

qiucf added a comment.

In D116015#3326148 <https://reviews.llvm.org/D116015#3326148>, @shchenz wrote:

>> hiding the semantics from the optimizer is sometimes a good thing and sometimes a bad thing).
>
> Agree. Imagining a case when the neg and fma (from fnmsub) can both be CSE-ed with another neg and fma, so we can totally eliminate the fnmsub. But after we convert it to an intrinsic, we may lose the opportunity to CSE the fnmsub.
>
>> Here's a pretty simple case: vector float foo(vector float a, vector float b, vector float c, vector float d) { return __builtin_vsx_xvnmsubasp(c, d, a*b); }
>> It current produces xvnegsp+xvmulsp+xvnmaddasp, after this patch it produces xvmulsp+xvnmsubasp. In some complicated cases, we can see much more unexpected instructions generated.
>
> This is narrowed down from a real-world case. After CSE some part of the fnmsub, it is hard to optimize it back to a single hardware fnmsub instruction as normally we check the use number of a register and if the user number is not 1, we may exit the combine.
>
> Is it possible to get some perf data for some float workloads with this patch? @qiucf

Thanks. I did not see performance change in some common benchmarks.

================
Comment at: llvm/include/llvm/IR/IntrinsicsPowerPC.td:1737
+                  [LLVMMatchType<0>, LLVMMatchType<0>, LLVMMatchType<0>],
+                  [IntrNoMem]>;
   def int_ppc_fre
----------------
shchenz wrote:
> When `llvm_anyfloat_ty` is `f32` or `f64`, we will generate two intrinsics with same semantic. `llvm.ppc.nmsub.f32` + `llvm.ppc.fnmsubs` and `llvm.ppc.nmsub.f64` + `llvm.ppc.fnmsub`. At first glance, we seems can not delete the `int_ppc_fnmsub` and `int_ppc_fnmsubs`, because they are for XL compatibility and XL has seperated fnmsub for float and double and we need to map them 1 by 1. Better to check if it is possible to replace `int_ppc_fnmsub` and `int_ppc_fnmsubs` with `int_ppc_nmsub`. And if it can be replaced, we can use a meaningful name like `int_ppc_fnmsub` for the new intrinsic.
We can do that, but that requires more work and seems beyond this patch's scope. See D105930, we'll need to handle the builtin in Clang. And the builtin explicitly generates type-M VSX instructions (I guess to reduce copy in simple cases).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116015/new/

https://reviews.llvm.org/D116015