[PATCH] D103955: [MCA] Use LSU for the in-order pipeline

Andrea Di Biagio via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Jun 9 07:06:16 PDT 2021


andreadb added a comment.

> This model is not very accurate though - Cortex-A55 hardware still
> shows quite different results in comparison with MCA.

Accurate static simulation of memory operations is very difficult to achieve in practice.
Predicting whether a load actually aliases another store, or even predicting whether a memory access would
hit a specific cache level, is hard to do ahead of time. Sometimes it is simply not possible due to the lack of
information (which can only be obtained at runtime). So, the inability to accurately predict the latency and aliasing of
memory accesses will always be a big source of inaccuracy in general.

In the case of llvm-mca, there are several limiting factors. Most of those fall under the following two categories:

1. The llvm scheduling model simply doesn't provide enough information for llvm-mca to simulate the memory subsystem.
2. MCInst is a (too) flat/simple representation, and it doesn't provide enough information about memory operations.

About 1.
There is no knowledge about which caches are available in hardware (i.e. memory cache hierarchy, store buffers, TLB caches, etc.).
Since there is no cache (at least, from the llvm-mca point of view), there is only one possible "latency" value for every write.
For loads, most models tend to encode an optimistic "load-to-use latency" in the write latency itself.
There is no way to use a different latency value if a load is believed to miss in the L1. Most of the time, the
"optimistic load-to-use latency" assumes a HIT in the L1.

We could introduce special annotations (like metadata, or llvm-mca comments) to describe the
"probability of hitting a different cache level". We could then use that knowledge in conjunction with a more accurate tablegen
description of the memory hierarchy.
This is just an idea: it might improve the simulation, at the cost of adding more complex abstractions. There may already be a PR for this.

More generally: llvm-mca doesn't know about memory types. It assumes that all memory is cacheable. The LSU rules work quite well for write-back
(and even write-through) memory. Non-cacheable memory would be subject to different latencies, and stores might be
subject to so-called "write combining". For simplicity, llvm-mca assumes that all stores are cacheable, so there is no
attempt at modelling the WC logic in hardware.

For in-order processors, not being able to model store buffers may still be fine.
After all, (at least in theory) there is no reason why stores should be delayed; I expect stores to be committed immediately.
It also means that we don't need to worry about modelling things like store-to-load forwarding (STLF).
The lack of STLF prediction is one of the bigger sources of inaccuracy when simulating memory-intensive kernels
on OoO processors.

About 2.
One big difference between MCInst and MachineInstr is that MCInst doesn't carry any information about memory accesses.
MCInst was designed as a simple intermediate representation for integrated assemblers and disassemblers. It was not meant to be used
to implement complex data-flow analyses. So its structure is pretty flat by design.

For MachineInstr, MachineMemOperand instances can be used to infer aliasing properties of loads/stores, etc.
We don't have those operands for MCInst, so - even if we wanted to - we cannot implement a greedy symbolic alias analysis to infer
which loads may alias which stores.

Depending on the value of flag --noalias, we always assume either "may-alias" or "no-alias".
The default (i.e. --noalias=true) is the optimistic assumption used by llvm-mca. It may also be the main reason why you see a lot
of errors in your measurements. Although, keep in mind that this is just one of the (many) sources of
inaccuracy (as already described before in my point 1.).
Let's say that --noalias is a good "default" for things like memcpy-like patterns.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103955/new/

https://reviews.llvm.org/D103955
