[PATCH] D18751: [MachineCombiner] Support for floating-point FMA on ARM64
Gerolf Hoflehner via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 4 13:26:26 PDT 2016
Hi James,
sure, sorry I missed that. I looked at this too long, I guess :-). It is principally the same 'better ILP' story as for the integer patterns. The prototypical idea is this: imagine two fmuls feeding an fadd. When the two fmuls can execute in parallel, it can be faster to issue fmul, fmul, fadd rather than fmul, fmadd, because the fmadd has to wait for the first fmul's result.
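To make that concrete, here is a minimal sketch of the critical-path arithmetic for r = a*b + c*d. The latencies are made-up illustrative numbers, not real ARM64 figures:

```python
# Critical-path comparison for r = a*b + c*d.
# Latencies below are hypothetical, chosen only to illustrate the trade-off.
FMUL_LAT = 3   # assumed fmul latency in cycles
FADD_LAT = 3   # assumed fadd latency in cycles
FMADD_LAT = 5  # assumed fused multiply-add latency in cycles

# Separate schedule: the two fmuls are independent and can issue in
# parallel, so only one fmul latency sits on the critical path.
separate = FMUL_LAT + FADD_LAT   # fmul || fmul, then fadd

# Fused schedule: the fmadd consumes the first fmul's result, so the
# two instructions are serialized on the critical path.
fused = FMUL_LAT + FMADD_LAT     # fmul, then fmadd

print(separate, fused)  # 6 8 -> the unfused sequence wins here
```

With these (assumed) numbers the unfused sequence finishes in 6 cycles versus 8 for the fused one, even though the fused sequence uses one fewer instruction; the break-even depends on the target's actual latencies and issue width.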
Cheers
Gerolf
> On Apr 4, 2016, at 4:58 AM, James Molloy <james.molloy at arm.com> wrote:
>
> jmolloy added a subscriber: jmolloy.
> jmolloy added a comment.
>
> Hi Gerolf,
>
> At a high level, could you please explain in what situations you expect *not* combining FMUL+FADD->FMA is a benefit? They use the same resource types on every chip I know of, and FMA is shorter in latency in every chip I know of than FMUL+FADD.
>
> Cheers,
>
> James
>
>
> ================
> Comment at: include/llvm/CodeGen/MachineCombinerPattern.h:42
> @@ +41,3 @@
> + MULSUBXI_OP1,
> + // Floating Point
> + FMULADDS_OP1,
> ----------------
> For the future: the pattern list is starting to grow quite large. I wonder if in the future we should consider moving the MachineCombinerPatterns to be table-generated?
>
>
> http://reviews.llvm.org/D18751
>
>
>