[PATCH] D25020: [ARM] Fix 26% performance regression on Cortex-A9 caused by not using VMLA/VMLS

Evgeny Astigeevich via llvm-commits llvm-commits at lists.llvm.org
Wed Oct 12 05:08:59 PDT 2016


eastig added a comment.

> What about the vector support?

Currently, vector VMLx instructions are expanded by MLxExpansionPass. I am running the LNT test suite to check whether there is any performance gain when they are not expanded. I don't know whether accumulator forwarding applies to vector VMLx: the note that exists for VFP VMLx is absent for the vector forms.



================
Comment at: lib/Target/ARM/ARMISelDAGToDAG.cpp:443
+    break;
+  }
+
----------------
rovka wrote:
> This is now checking only that the node can be lowered to VMLx. What happened to the part checking if forwarding can be used (i.e. mac following multiply or mac) and all the other checks?
I removed the FMA checks for the following reasons:
# FMA is lowered either to VFMA or, if the target does not support VFPv4, to a library call.
# I have not found any information about accumulator forwarding for VFMA.

I removed the other checks because I could not write tests for them. Are there cases where they are false?


https://reviews.llvm.org/D25020


