[llvm] MTM: improve operand latency when missing sched info (PR #101389)

Thu Sep 5 10:17:39 PDT 2024

================
@@ -761,6 +762,64 @@ static void updatePhysDepsDownwards(const MachineInstr *UseMI,
   }
 }
 
+/// Estimates the number of cycles elapsed between DefMI and UseMI if they're
+/// non-null and in the same BasicBlock. Returns std::nullopt when UseMI is in a
+/// different MBB than DefMI.
+static std::optional<unsigned>
+estimateDefUseCycles(const TargetSchedModel &Sched, const MachineInstr *DefMI,
+                     const MachineInstr *UseMI) {
+  if (!DefMI || !UseMI || DefMI == UseMI)
+    return 0;
+  const MachineBasicBlock *ParentBB = DefMI->getParent();
+  if (ParentBB != UseMI->getParent())
+    return std::nullopt;
+
+  const auto DefIt =
+      llvm::find_if(ParentBB->instrs(),
+                    [DefMI](const MachineInstr &MI) { return DefMI == &MI; });
+  const auto UseIt =
+      llvm::find_if(ParentBB->instrs(),
+                    [UseMI](const MachineInstr &MI) { return UseMI == &MI; });
+
+  unsigned NumMicroOps = 0;
+  for (auto It = DefIt; It != UseIt; ++It) {
+    // In cases where the UseMI is a PHI at the beginning of the MBB, compute
+    // MicroOps until the end of the MBB.
+    if (It.isEnd())
+      break;
+
+    NumMicroOps += Sched.getNumMicroOps(&*It);
+  }
+  return NumMicroOps / Sched.getIssueWidth();
+}
+
+/// Wraps Sched.computeOperandLatency, accounting for the case when
+/// InstrSchedModel and InstrItineraries are not available: in this case,
+/// Sched.computeOperandLatency returns DefaultDefLatency, which is a very rough
+/// approximate; to improve this approximate, offset it by the approximate
+/// cycles elapsed from DefMI to UseMI (since the MIs could be re-ordered by the
+/// scheduler, and we don't have this information, this cannot be known
+/// exactly). When scheduling information is available,
+/// Sched.computeOperandLatency returns a much better estimate (especially if
+/// UseMI is non-null), so we just return that.
+static unsigned computeOperandLatency(const TargetSchedModel &Sched,
+                                      const MachineInstr *DefMI,
+                                      unsigned DefOperIdx,
+                                      const MachineInstr *UseMI,
+                                      unsigned UseOperIdx) {
+  assert(DefMI && "Non-null DefMI expected");
+  if (!Sched.hasInstrSchedModel() && !Sched.hasInstrItineraries()) {
+    unsigned DefaultDefLatency = Sched.getInstrInfo()->defaultDefLatency(
----------------
michaelmaitland wrote:

Take this example:

```
defmi = ...
load = ...
usemi = ...
```

I think with your patch you are estimating 2 cycles between `defmi` and `usemi` with issue width = 1 (1 cycle for defmi and one cycle for load). My question is whether this is a good estimation since loads typically have different latency than the normal case. I gave it some more thought though and if `usemi` doesn't use `load` result then this does not matter, and what you have looks good.

If it does use `load` though, then we'd be calling `computeOperandLatency` between `load` and `usemi`. In that case though, should we be taking `LoadLatency` into account? 

https://github.com/llvm/llvm-project/pull/101389