[PATCH] D18751: [MachineCombiner] Support for floating-point FMA on ARM64

Mon Apr 4 21:01:55 PDT 2016

flyingforyou added a comment.

> sure, sorry I missed that. I looked at this too long, I guess :-). It is principally the same 'better ILP' story as for integers. The prototypical idea is this: imagine two fmul operands feeding the fadd. When the two fmul can execute in parallel it can be faster to issue fmul, fmul, fadd rather than fmul, fmadd.

I think the effect of this optimization depends on the microarchitecture implementation. If an OoO microarchitecture can split fmadd into smaller uops such as fmul and fadd, this optimization is not worthwhile on that kind of microarchitecture. (It is also bad for code size, which means more instruction-fetch overhead.)

How about adding a flag to control this optimization, so that it can be enabled or disabled per uarch or core?
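
To make the trade-off concrete, here is a rough sketch of the critical-path reasoning for a*b + c*d. It is not code from the patch and does not use any LLVM API: the CoreModel struct, its field names, the SplitsFMAddIntoUops flag, and the latency numbers in the usage note are all hypothetical and only illustrate the argument.

```cpp
#include <algorithm>

// Hypothetical per-core description; the field names and the values used
// with it are illustrative assumptions, not data for any real core.
struct CoreModel {
  int FMulLatency;          // cycles for fmul
  int FAddLatency;          // cycles for fadd
  int FMAddLatency;         // cycles for fmadd issued as a single operation
  bool SplitsFMAddIntoUops; // assumed: core cracks fmadd into fmul + fadd uops
};

// fmul, fmul, fadd: the two multiplies are independent and can overlap,
// so the critical path is one multiply followed by the add.
int depthUnfused(const CoreModel &C) {
  return std::max(C.FMulLatency, C.FMulLatency) + C.FAddLatency;
}

// fmul, fmadd: the fmadd needs the fmul result as its addend.
int depthFused(const CoreModel &C) {
  if (C.SplitsFMAddIntoUops)
    // The multiply uop of the fmadd overlaps with the fmul; only the add
    // uop waits, so the depth matches the unfused sequence.
    return std::max(C.FMulLatency, C.FMulLatency) + C.FAddLatency;
  // Otherwise the whole fmadd is serialized behind the fmul.
  return C.FMulLatency + C.FMAddLatency;
}

// Unfusing costs an extra instruction, so only prefer it when it
// strictly shortens the critical path.
bool preferUnfused(const CoreModel &C) {
  return depthUnfused(C) < depthFused(C);
}
```

With made-up latencies of 4 (fmul), 4 (fadd) and 8 (fmadd), a core that does not split fmadd gets a depth of 8 for the unfused sequence versus 12 for the fused one, so unfusing helps. For a core that does split fmadd, both depths are 8 and the extra instruction only costs code size, which is the case a per-core flag would cover.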

http://reviews.llvm.org/D18751