[llvm] [llvm-mca] Add command line option `-use-load-latency` (PR #94566)

Michael Maitland via llvm-commits llvm-commits at lists.llvm.org
Mon Jun 10 09:53:58 PDT 2024


================
@@ -230,6 +231,13 @@ static void computeMaxLatency(InstrDesc &ID, const MCInstrDesc &MCDesc,
   }
 
   int Latency = MCSchedModel::computeInstrLatency(STI, SCDesc);
----------------
michaelmaitland wrote:

IIUC, there are two ways we use `LoadLatency` in LLVM today:

1. In `TargetSchedModel::computeInstrLatency`, under the condition `!hasInstrSchedModel && !SCDesc.isValid && MI.mayLoad()`, by making a call to `defaultDefLatency` (see the sketch after this list). It seems like `LoadLatency` is really used as a backup in case something is misconfigured.

2. Scheduler models use it as a helper variable when computing instruction latencies. I am not 100% sure because I am not an expert on these models, but I think this is modeling the worst-case scenario of checking the cache, having a miss, and having to do the load. Or it is possible that these models are misusing `LoadLatency` as a `BaseLatency` and adjusting the latency accordingly. For example:
```
llvm/lib/Target/X86/X86ScheduleZnver2.td:252:  let Latency = !add(Zn2WriteIMulH.Latency, Znver2Model.LoadLatency);
```
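
For reference, the shape of the fallback path in (1) is roughly the following. This is a paraphrase of `TargetInstrInfo::defaultDefLatency`, not the exact LLVM code, and it only shows the part relevant to loads:
```
#include "llvm/CodeGen/MachineInstr.h"
#include "llvm/MC/MCSchedule.h"

// Paraphrase of the (1) fallback: when no usable per-opcode latency exists,
// a load falls back to the model-wide LoadLatency. The real code lives in
// TargetInstrInfo and also handles high-latency defs.
static unsigned fallbackDefLatency(const llvm::MCSchedModel &SchedModel,
                                   const llvm::MachineInstr &DefMI) {
  if (DefMI.mayLoad())
    return SchedModel.LoadLatency; // backup guess for loads
  return 1;                        // generic default for everything else
}
```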

It seems like you are not trying to use it under interpretation (1) or (2) of this variable, and here is why we shouldn't use it in those circumstances:
* In the first case, if we are using llvm-mca with `!hasInstrSchedModel`, then something is probably wrong and a warning or error should have occurred sooner; the entire report would not be meaningful. If we are using llvm-mca where `!SCDesc.isValid`, then again there is a bigger issue, and `LoadLatency` probably isn't the solution. So if either of these conditions is true, falling back to `LoadLatency` in the llvm-mca reports does not seem like the right fix; it makes more sense to error or warn earlier.
* In the second case, `LoadLatency` should already be baked into `int Latency = MCSchedModel::computeInstrLatency(STI, SCDesc);`, which I think makes the assignment `Latency = std::max(int(SM.LoadLatency), Latency);` incorrect (or at best redundant), as illustrated below.
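
To make the second bullet concrete, here is a worked example with made-up numbers (none of these values come from a real model):
```
#include <algorithm>
#include <cstdio>

int main() {
  // Hypothetical numbers: the scheduling class latency already folds
  // LoadLatency in (say 4 cycles of LoadLatency + 3 cycles of work), so
  // computeInstrLatency() would already return 7 for this instruction.
  int LoadLatency = 4; // stand-in for SM.LoadLatency
  int Latency = 7;     // stand-in for MCSchedModel::computeInstrLatency(...)

  // The clamp added by the patch is then a no-op at best; and if a model
  // deliberately gave a load a latency below LoadLatency, the clamp would
  // silently overwrite that intentional value.
  Latency = std::max(LoadLatency, Latency);
  std::printf("Latency = %d\n", Latency); // prints 7
  return 0;
}
```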

Prior to this patch, it does not seem like we are using `LoadLatency` to model the case you describe: the number of cycles a load needs to access the cache, assuming it always hits. However, the way `LoadLatency` is specified does suggest that interpretation, so I think it is reasonable to have llvm-mca use it as such.

As @adibiagio points out, I don't think this is what the code added in this patch models; we would also need to account for the depth of the load pipeline. The current behavior of this patch is to use whichever latency is higher: the latency assuming a cache hit (but ignoring load pipeline depth) or the latency assigned to the instruction. In most reasonable models the second one should always be higher, so I'm not sure how useful this would be.

As @adibiagio pointed out, load-to-use latency changes depending on the opcode type (int, fp, vector). I wonder if `LoadLatency` is too simple/naive a value to be useful on processors that have multiple types of loads. One solution is to move the `LoadLatency` field into `ProcWriteResources` so that it can be set per scheduler class (see the sketch below). I would not be opposed if the solution proposed here came first and this improvement was added as a follow-up; that would allow models that do have a single `LoadLatency` to benefit from this kind of reporting.
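
To be clear about what I mean by a per-class value, here is a hypothetical sketch; neither the struct nor the field below exists in MC today, it only illustrates the shape of the follow-up:
```
// Hypothetical only: MCSchedClassDesc has no LoadLatency field today. The idea
// is that a model could attach a load-to-use latency per scheduling class
// (int/fp/vector loads), and llvm-mca would prefer it over the single
// model-wide MCSchedModel::LoadLatency whenever it is set.
struct HypotheticalSchedClassDesc {
  unsigned LoadLatency = 0; // 0 means "unset"; fall back to the model default
};

static unsigned pickLoadLatency(const HypotheticalSchedClassDesc &SC,
                                unsigned ModelLoadLatency) {
  return SC.LoadLatency ? SC.LoadLatency : ModelLoadLatency;
}
```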

As a side note, I wonder if the scheduler could or should make more use of this interpretation when scheduling.

https://github.com/llvm/llvm-project/pull/94566

