[PATCH] D28087: X86 instr selection: combine ADDSUB + MUL to FMADDSUB
Vyacheslav Klochkov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Jan 5 02:21:58 PST 2017
v_klochkov added a comment.
> Does it make sense to convert 512-bit operation ADDSUB to FMADDSUB with all-ones multiplier? What ICC does?
It is interesting idea, but it is unlikely that it can be very beneficial.
ICC does not generate 512-bit FMADDSUB(a, 1, b) when it does not have ADDSUB(a,b) because it just does not need plain 512-bit ADDSUB operations.
ADDSUBs are usually needed for Complex MUL and DIV operations and there they are always accompanied by float/double MUL operations, i.e. ICC generates FMADDSUBs and it never needs plain ADDSUBs (at least for targets with AVX512, where FMADDSUB is available).
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:32057
+ const TargetOptions &Options = DAG.getTarget().Options;
+ bool AllowFusion =
+ (Options.AllowFPOpFusion == FPOpFusion::Fast || Options.UnsafeFPMath);
----------------
delena wrote:
> I see more checks in visitFADDForFMACombine. Can we take all these checks to a separate function? Something like isFMALegalAndProfitable().
Having such function is a good thing to have, but such function might be debatable and I do not want to complicate this change-set.
Perhaps such function should be added in a separate change-set.
Also, the function visitFADDForFMACombine() is shared for various target architectures, thus it needs much more checks.
The check here (in X86 part) seems sufficient.
================
Comment at: llvm/test/CodeGen/X86/fmaddsub-combine.ll:19
+ %Addsub = shufflevector <2 x double> %Sub, <2 x double> %Add, <2 x i32> <i32 0, i32 3>
+ ret <2 x double> %Addsub
+}
----------------
delena wrote:
> I don't understand the dependency you show in the test.
> I suppose that FMADDSUB test should be like this:
>
> %AB = fmul <2 x double> %A, %B
> %Sub = fsub <2 x double> %AB, %C
> %Add = fadd <2 x double> %AB, %C
> %Addsub = shufflevector <2 x double> %Sub, <2 x double> %Add, <2 x i32> <i32 0, i32 3>
>
> Do you check the "fast" attribute of the operation itself? Or you rely on global options only?
>
This code relies on global options only.
This is my first experience of work on SDNodes..., please fix me if I am wrong..., but SDNodes do not have FastMathFlags, right?
https://reviews.llvm.org/D28087
More information about the llvm-commits
mailing list