[PATCH] D25020: [ARM] Fix 26% performance regression on Cortex-A9 caused by not using VMLA/VMLS

Wed Oct 12 06:31:43 PDT 2016

rovka added inline comments.

================
Comment at: lib/Target/ARM/ARMISelDAGToDAG.cpp:443
+    break;
+  }
+
----------------
eastig wrote:
> eastig wrote:
> > rovka wrote:
> > > This is now checking only that the node can be lowered to VMLx. What happened to the part checking if forwarding can be used (i.e. mac following multiply or mac) and all the other checks?
> > I removed FMA checks because of the following reasons:
> > # FMA is lowered either VFMA or a library call if a target does not support VFPv4.
> > # I have not found any information about accumulator forwarding for VFMA.
> > 
> > I removed other checks because I could not write tests for them. Are there cases when they are false?
> Another point is that ARMDAGToDAGISel::hasNoVMLxHazardUse is only called instructions are combined into VMLx:
> 
> ARMInstrInfo.td:
> 
> ```
> // An 'fadd' node which checks for single non-hazardous use.
> def fadd_mlx : PatFrag<(ops node:$lhs, node:$rhs),(fadd node:$lhs, node:$rhs),[{
>   return hasNoVMLxHazardUse(N);
> }]>;
> 
> // An 'fsub' node which checks for single non-hazardous use.
> def fsub_mlx : PatFrag<(ops node:$lhs, node:$rhs),(fsub node:$lhs, node:$rhs),[{
>   return hasNoVMLxHazardUse(N);
> }]>;
> ```
Ok, the FMA thing sounds reasonable.
Some of the other checks are ok to remove, but from what I understand now you're only checking that you have a FADD/FSUB with a FMUL as its operand - this will be lowered to a VMLx, but in order to care about accumulator forwarding you need another VMUL/VMLA. The way it is written now, it will return true from hasNoVMLxHazardUse without actually looking at the uses, which may be bad if the use is some other NEON fp instruction (unless I'm missing something).

https://reviews.llvm.org/D25020