[PATCH] D18751: [MachineCombiner] Support for floating-point FMA on ARM64

Mon Apr 4 23:45:58 PDT 2016

> On Apr 4, 2016, at 9:01 PM, Junmo Park <junmoz.park at samsung.com> wrote:
> 
> flyingforyou added a comment.
> 
>> Sure, sorry I missed that. I looked at this too long, I guess :-). It is principally the same 'better ILP' story as for integers. The prototypical idea is this: imagine two fmuls feeding the fadd. When the two fmuls can execute in parallel, it can be faster to issue fmul, fmul, fadd rather than fmul, fmadd.
> 
> 
> I think this optimization's effect depends on the uarchitecture implementation. If an OoO uarchitecture can split fmadd into smaller uops like fmul and fadd, this optimization is not worthwhile for that kind of uarchitecture. (It's also not good for code size, which means more instruction-fetch overhead.)
The optimization does not kick in when code size is the objective.
> 
> How about adding a flag to control this optimization, set per uarch or core?
So far I have seen gains on at least 3 uArchs. As far as I can tell there is no reason to be concerned about compile-time or performance losses. But we can always add a flag when the need arises.
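
For reference, the ILP argument for a*b + c*d can be sketched with a toy critical-path model. This is only an illustration: the Latencies struct, the function names, and any cycle counts plugged into it are hypothetical and not taken from the patch or from any real core. It also assumes the core can issue both fmuls in parallel and that fmadd needs all of its operands before it starts.

```cpp
// Hypothetical per-instruction latencies in cycles (not from any real core).
struct Latencies {
  int fmul;
  int fadd;
  int fmadd;
};

// Critical path of: t1 = fmul a,b ; t2 = fmul c,d ; r = fadd t1,t2
// The two fmuls are independent, so (by assumption) they overlap and
// only one fmul latency sits on the critical path.
constexpr int separateMulAdd(const Latencies &L) {
  return L.fmul + L.fadd;
}

// Critical path of: t1 = fmul a,b ; r = fmadd c,d,t1
// The fmadd waits for t1 (by assumption), so the latencies add serially.
constexpr int fusedMulAdd(const Latencies &L) {
  return L.fmul + L.fmadd;
}
```

Under this model the three-instruction sequence wins whenever the fmadd latency exceeds the fadd latency. With made-up values fmul=5, fadd=4, fmadd=9 the critical paths are 9 versus 14 cycles.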
> 
> 
> http://reviews.llvm.org/D18751
> 
> 
>