[PATCH] D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB

Thu Jan 5 02:21:58 PST 2017

v_klochkov added a comment.

> Does it make sense to convert 512-bit operation ADDSUB to FMADDSUB with all-ones multiplier? What ICC does?

It is interesting idea, but it is unlikely that it can be very beneficial.
ICC does not generate 512-bit FMADDSUB(a, 1, b) when it does not have ADDSUB(a,b) because it just does not need plain 512-bit ADDSUB operations.

ADDSUBs are usually needed for Complex MUL and DIV operations and there they are always accompanied by float/double MUL operations, i.e. ICC generates FMADDSUBs and it never needs plain ADDSUBs (at least for targets with AVX512, where FMADDSUB is available).

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:32057
+  const TargetOptions &Options = DAG.getTarget().Options;
+  bool AllowFusion =
+      (Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);
----------------
delena wrote:
> I see more checks in visitFADDForFMACombine. Can we take all these checks to a separate function? Something like isFMALegalAndProfitable().
Having such function is a good thing to have, but such function might be debatable and I do not want to complicate this change-set.
Perhaps such function should be added in a separate change-set.

Also, the function visitFADDForFMACombine() is shared for various target architectures, thus it needs much more checks.
The check here (in X86 part) seems sufficient.

================
Comment at: llvm/test/CodeGen/X86/fmaddsub-combine.ll:19
+  %Addsub = shufflevector <2 x double> %Sub, <2 x double> %Add, <2 x i32> <i32 0, i32 3>
+  ret <2 x double> %Addsub
+}
----------------
delena wrote:
> I don't understand the dependency you show in the test.
> I suppose that FMADDSUB test should be like this:
> 
>   %AB = fmul <2 x double> %A, %B
>   %Sub = fsub <2 x double> %AB, %C
>   %Add = fadd <2 x double> %AB, %C
>   %Addsub = shufflevector <2 x double> %Sub, <2 x double> %Add, <2 x i32> <i32 0, i32 3>
> 
> Do you check the "fast" attribute of the operation itself? Or you rely on global options only?
> 
This code relies on global options only. 
This is my first experience of work on SDNodes..., please fix me if I am wrong..., but SDNodes do not have FastMathFlags, right?

https://reviews.llvm.org/D28087