[PATCH] D39976: [AArch64] Consider the cost model when folding loads and stores

Tue Feb 6 12:54:09 PST 2018

evandro added a comment.

In https://reviews.llvm.org/D39976#998081, @gberry wrote:

> I've thought about this some more and tested it out on Falkor.  As currently written this change causes SIMD store instructions to not have pre/post increments folded into them, causing minor performance regressions.

I see that they're modeled with a latency of 0 and 4 uops.  Are the units they need, ST and VSD, really used for 0 cycles?

> I have the following general reservations as well:
> 
> - does using the max latency of the load/store and add make sense given that the operations are dependent?

They're only dependent for the pre-index addressing mode.  However, since the latency of the load and of the store is considerably larger than that of the add even in this case, methinks that it's a sensible approximation.
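
To make the approximation concrete, here is a minimal sketch of the comparison under discussion.  `SchedInfo`, `unfoldedLatency` and `shouldFoldByLatency` are hypothetical names for illustration only, not the code in the patch nor an LLVM API.

```cpp
#include <algorithm>

// Hypothetical per-instruction summary of what a scheduling model reports.
struct SchedInfo {
  unsigned Latency; // cycles until the result is available
  unsigned NumUops; // micro-ops issued (unused by this latency-only check)
};

// Approximate the unfolded pair (load/store + add) by the slower of the two,
// on the assumption that the memory operation dominates the add.
inline unsigned unfoldedLatency(const SchedInfo &MemOp, const SchedInfo &Add) {
  return std::max(MemOp.Latency, Add.Latency);
}

// Fold only when the pre/post-index form is no slower than the unfolded pair.
inline bool shouldFoldByLatency(const SchedInfo &Folded,
                                const SchedInfo &MemOp,
                                const SchedInfo &Add) {
  return Folded.Latency <= unfoldedLatency(MemOp, Add);
}
```

With a 4-cycle load and a 1-cycle add, a 4-cycle indexed load would be folded and a 5-cycle one would not.  For the dependent (pre-index) case, the sum of the two latencies would be the stricter estimate.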

> - does always favoring latency over number of uops (an approximation of throughput) make sense?  unless the operation is on the critical path I would think not.

In previous versions of this patch I tried to weigh both metrics, but found it difficult to come up with a satisfying heuristic.  Any ideas?
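
For the sake of discussion, one shape such a heuristic could take is to pick the metric based on whether the instruction is on the critical path, as suggested above.  This is only a sketch: `FoldCost`, `isFoldProfitable` and the `OnCriticalPath` flag are hypothetical, and how the flag would be computed at this point in the pipeline is left open.

```cpp
// Hypothetical cost summary for either the folded instruction or the
// unfolded load/store + add pair.
struct FoldCost {
  unsigned Latency; // cycles
  unsigned NumUops; // micro-ops, used as a rough proxy for throughput
};

// Compare latency on the critical path and uop count everywhere else.
// OnCriticalPath is an assumed input; nothing here computes it.
inline bool isFoldProfitable(const FoldCost &Folded, const FoldCost &Unfolded,
                             bool OnCriticalPath) {
  if (OnCriticalPath)
    return Folded.Latency <= Unfolded.Latency;
  return Folded.NumUops <= Unfolded.NumUops;
}
```

A folded form costing 5 cycles and 1 uop against an unfolded pair costing 4 cycles and 2 uops would be rejected on the critical path and accepted off it.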

> This combined with the assumptions about multiple uop instructions (which also is not true for Falkor), I would suggest perhaps a better approach would be to add a target-specific property that would allow you to avoid the specific opcodes that are a problem for your target.

Perhaps the cost function could be target specific.  Thoughts?

Repository:
  rL LLVM

https://reviews.llvm.org/D39976