[PATCH] D44654: [X86][SandyBridge] SBWriteResPair +5cy and +1uop Memory Folds

Mon Mar 26 06:34:03 PDT 2018

courbet added a comment.

In https://reviews.llvm.org/D44654#1045323, @craig.topper wrote:

> 2 cycle latency for MOV64rm seems low to me. There's an address calculation and a TLB lookup before it can even start accessing the cache.
>
> Table 2-20 of https://software.intel.com/sites/default/files/managed/9e/bc/64-ia-32-architectures-optimization-manual.pdf shows the load latencies according to Intel.

The generated code had store-to-load forwarding, so the numbers here are indeed missing the memory access part. When we change the generator to prevent the forwarding, we see latencies of 11/9/7 for ymm0/xmm0/rax on sandybridge. These are consistent with the access-less numbers I mentioned above (4/3/2) plus the access times from the doc you pointed to (7/6/5): 11=7+4 / 9=6+3 / 7=5+2.
It's not obvious to me which number LLVM should be using. Should we give it both numbers and teach it to recognize store-to-load forwarding opportunities and schedule accordingly?
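
To make the arithmetic above easy to check, here is a small sketch in Python. It only restates the cycle counts quoted in this thread; the dictionary and function names are made up for illustration and are not llvm-exegesis or LLVM identifiers.

```python
# Cycle counts as quoted in this thread for sandybridge.
# All names here are hypothetical, chosen only for this illustration.

# Measured with store-to-load forwarding (no real memory access).
FORWARDED = {"ymm0": 4, "xmm0": 3, "rax": 2}

# Access times from the Intel optimization manual, as cited above.
ACCESS = {"ymm0": 7, "xmm0": 6, "rax": 5}

# Measured after changing the generator to prevent forwarding.
MEASURED = {"ymm0": 11, "xmm0": 9, "rax": 7}


def expected_latency(reg):
    """Access-less latency plus the documented access time, in cycles."""
    return FORWARDED[reg] + ACCESS[reg]


for reg in ("ymm0", "xmm0", "rax"):
    status = "consistent" if expected_latency(reg) == MEASURED[reg] else "mismatch"
    print(f"{reg}: {ACCESS[reg]} + {FORWARDED[reg]} = {expected_latency(reg)} "
          f"(measured {MEASURED[reg]}, {status})")
```

If a scheduling model were to carry both numbers, the forwarded value would apply only when a store-to-load forwarding opportunity is recognized, and the summed value otherwise; whether that is worth modelling is the open question above.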

We'll try to find a principled way to integrate this into llvm-exegesis (created PR36905).

Repository:
  rL LLVM

https://reviews.llvm.org/D44654