[llvm] [AMDGPU] Reschedule loads in clauses to improve throughput (PR #102595)

Carl Ritson via llvm-commits llvm-commits at lists.llvm.org
Wed Feb 4 05:08:22 PST 2026


================
@@ -129,6 +134,141 @@ bool SIPostRABundler::canBundle(const MachineInstr &MI,
           !isDependentLoad(NextMI));
 }
 
+static Register getDef(MachineInstr &MI) {
+  assert(MI.getNumExplicitDefs() > 0);
+  return MI.defs().begin()->getReg();
+}
+
+void SIPostRABundler::reorderLoads(
+    MachineBasicBlock &MBB, MachineBasicBlock::instr_iterator &BundleStart,
+    MachineBasicBlock::instr_iterator Next) {
+  // Don't reorder ALU, store or scalar clauses.
+  if (!BundleStart->mayLoad() || BundleStart->mayStore() ||
+      SIInstrInfo::isSMRD(*BundleStart) || !BundleStart->getNumExplicitDefs())
+    return;
+
+  // Search to find the usage distance of each defined register in the clause.
+  const unsigned SearchDistance = std::max(Defs.size(), 100UL);
+  SmallDenseMap<Register, unsigned> UseDistance;
+  unsigned MaxDistance = 0;
+  for (MachineBasicBlock::iterator SearchI = Next;
+       SearchI != MBB.end() && MaxDistance < SearchDistance &&
+       UseDistance.size() < Defs.size();
+       ++SearchI, ++MaxDistance) {
+    for (Register Reg : Defs) {
+      if (UseDistance.contains(Reg))
+        continue;
+      if (SearchI->readsRegister(Reg, TRI))
+        UseDistance[Reg] = MaxDistance;
----------------
perlfu wrote:

`readsRegister` is expensive compared to a lookup in the `SmallDenseMap`. The `contains` check skips the `readsRegister` call for any definition whose use has already been located.
I profiled this, and it is faster this way.
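
To illustrate the point outside of LLVM, here is a minimal standalone sketch of the same pattern. It is not the code from the patch: registers are modelled as plain `int`s, each instruction as a list of the registers it reads, and `Scanner::reads` is a hypothetical stand-in for `readsRegister` that counts how often it is called. `std::unordered_map` stands in for `SmallDenseMap`.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for an expensive query such as readsRegister.
// It counts its own invocations so the saving can be observed.
struct Scanner {
  std::size_t expensiveCalls = 0;

  bool reads(const std::vector<int> &instrUses, int reg) {
    ++expensiveCalls;
    for (int use : instrUses)
      if (use == reg)
        return true;
    return false;
  }
};

// For each register in defs, record the index of the first instruction
// that reads it. Registers already present in the map are skipped with a
// cheap lookup, so the expensive query never runs for them again.
std::unordered_map<int, unsigned>
findUseDistances(const std::vector<int> &defs,
                 const std::vector<std::vector<int>> &instrs,
                 Scanner &scanner) {
  assert(!defs.empty() && "expects at least one tracked register");
  std::unordered_map<int, unsigned> useDistance;
  unsigned distance = 0;
  for (auto it = instrs.begin();
       it != instrs.end() && useDistance.size() < defs.size();
       ++it, ++distance) {
    for (int reg : defs) {
      if (useDistance.count(reg)) // cheap: hash lookup
        continue;
      if (scanner.reads(*it, reg)) // expensive: scans the operands
        useDistance[reg] = distance;
    }
  }
  return useDistance;
}
```

In this toy model, three tracked registers and five instructions need 8 expensive queries where a scan without the lookup and the early loop exit would need 15. The actual saving in the pass depends on how costly `readsRegister` is relative to the map lookup, which is what the profiling above measured.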

https://github.com/llvm/llvm-project/pull/102595

