[llvm] [llvm-mca] Add command line option `-use-load-latency` (PR #94566)

Mon Jun 10 11:53:46 PDT 2024

================
@@ -230,6 +231,13 @@ static void computeMaxLatency(InstrDesc &ID, const MCInstrDesc &MCDesc,
   }
 
   int Latency = MCSchedModel::computeInstrLatency(STI, SCDesc);
----------------
adibiagio wrote:

> 2. Scheduler models will use it as a helper variable in computing instruction latencies. I am not 100% sure because I am not an expert on these models, but I think this is modeling the worse case scenario of checking the cache, having a miss, and having to do the load. Or it is possible that these models are misusing `LoadLatency` as a `BaseLatency` and adjusting the latency accordingly. For example:
> 
> ```
> llvm/lib/Target/X86/X86ScheduleZnver2.td:252:  let Latency = !add(Zn2WriteIMulH.Latency, Znver2Model.LoadLatency);
> ```

X86 scheduling models (optimistically) define `LoadLatency` as the L1D load-to-use latency.

Things are a bit more complicated for AMD processors, where the load-to-use latency varies depending on whether the loaded value is used by the FPU or INT.

According to the official AMD SoG for family 17h: "The L1 data cache has a 4- or 5-cycle integer load-to-use latency, and a 7- or 8-cycle FPU load-to-use latency". That is the reason why the Znver2Model optimistically defines it as 4. 

If you look at the scheduling model for ZnVer2, you can see two multiclasses:
- multiclass Zn2WriteResPair is used by integer instructions with a folded load operand
- multiclass Zn2WriteResFpuPair is used by floating point/vector instructions with a folded load operand

multiclass Zn2WriteResPair sets the default value for param `loadLat` to 4 (i.e. optimistic L1 data load-to-use INT latency according to the AMD docs).
On the other hand, multiclass Zn2WriteResFpuPair sets that param to 7 (cycles).
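
To make the effect of the two defaults concrete, here is a minimal C++ sketch. It is not LLVM code: `LoadUser`, `loadToUseLatency` and `foldedLoadLatency` are hypothetical names, and it assumes the folded-load latency is simply the register-form latency plus the load-to-use latency, as in the `!add(...)` line quoted above.

```cpp
// Hypothetical illustration only; none of these names exist in LLVM.
enum class LoadUser { Int, Fpu };

// Optimistic L1D load-to-use latencies used as the `loadLat` defaults
// in the two ZnVer2 multiclasses (4 for INT users, 7 for FPU users).
constexpr int loadToUseLatency(LoadUser User) {
  return User == LoadUser::Int ? 4 : 7;
}

// Assumed additive model: latency of the folded-load variant is the
// latency of the register form plus the load-to-use latency.
constexpr int foldedLoadLatency(int RegLatency, LoadUser User) {
  return RegLatency + loadToUseLatency(User);
}

// The register-form latency of 3 is an arbitrary example value.
static_assert(foldedLoadLatency(3, LoadUser::Int) == 7, "INT folded load");
static_assert(foldedLoadLatency(3, LoadUser::Fpu) == 10, "FPU folded load");
```

So a single `LoadLatency` value in the model cannot describe both cases; the per-multiclass `loadLat` default is what distinguishes them.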

As far as I am aware, Intel processors don't need to model multiple values for the load-to-use latency. On X86, it is only on AMD processors that the load latency changes depending on whether the user is INT or FPU.

https://github.com/llvm/llvm-project/pull/94566