[llvm] [AMDGPU] Reschedule loads in clauses to improve throughput (PR #102595)
Carl Ritson via llvm-commits
llvm-commits at lists.llvm.org
Wed Feb 4 05:08:22 PST 2026
================
@@ -129,6 +134,141 @@ bool SIPostRABundler::canBundle(const MachineInstr &MI,
!isDependentLoad(NextMI));
}
+static Register getDef(MachineInstr &MI) {
+ assert(MI.getNumExplicitDefs() > 0);
+ return MI.defs().begin()->getReg();
+}
+
+void SIPostRABundler::reorderLoads(
+ MachineBasicBlock &MBB, MachineBasicBlock::instr_iterator &BundleStart,
+ MachineBasicBlock::instr_iterator Next) {
+ // Don't reorder ALU, store or scalar clauses.
+ if (!BundleStart->mayLoad() || BundleStart->mayStore() ||
+ SIInstrInfo::isSMRD(*BundleStart) || !BundleStart->getNumExplicitDefs())
+ return;
+
+ // Search to find the usage distance of each defined register in the clause.
+ const unsigned SearchDistance = std::max(Defs.size(), 100UL);
+ SmallDenseMap<Register, unsigned> UseDistance;
+ unsigned MaxDistance = 0;
+ for (MachineBasicBlock::iterator SearchI = Next;
+ SearchI != MBB.end() && MaxDistance < SearchDistance &&
+ UseDistance.size() < Defs.size();
+ ++SearchI, ++MaxDistance) {
+ for (Register Reg : Defs) {
+ if (UseDistance.contains(Reg))
+ continue;
+ if (SearchI->readsRegister(Reg, TRI))
+ UseDistance[Reg] = MaxDistance;
----------------
perlfu wrote:
`readsRegister` is expensive compared to a `SmallDenseMap` lookup. The `contains` check skips the `readsRegister` call entirely for definitions whose first use has already been located.
I profiled this and it is faster this way.
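
To illustrate the ordering argument outside of LLVM, here is a minimal standalone sketch. It assumes hypothetical stand-ins that are not part of the patch: `Reg` and `Instr` are plain types, `std::unordered_map` replaces `SmallDenseMap`, and `readsRegister` is a toy function that counts its own invocations. It makes no claim about real LLVM costs; it only shows how many expensive queries the cheap membership test avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins: a register is an ID, and an instruction is the
// list of register IDs it reads. These are not LLVM types.
using Reg = unsigned;
using Instr = std::vector<Reg>;

struct ScanResult {
  std::unordered_map<Reg, unsigned> UseDistance; // first-use distance per def
  std::size_t ExpensiveQueries = 0;              // calls to readsRegister()
};

// Toy replacement for the costly operand walk; counts each invocation.
static bool readsRegister(const Instr &I, Reg R, std::size_t &Queries) {
  ++Queries;
  for (Reg Op : I)
    if (Op == R)
      return true;
  return false;
}

// For each register in Defs (assumed unique), records the index of the first
// instruction in Block that reads it. With SkipLocated set, registers already
// in the map are skipped before the expensive query, as in the patch.
// Without it, the query always runs and the map is only guarded afterwards,
// so both modes yield the same distances and differ only in query count.
ScanResult findUseDistances(const std::vector<Instr> &Block,
                            const std::vector<Reg> &Defs, bool SkipLocated) {
  ScanResult Result;
  unsigned Distance = 0;
  for (const Instr &I : Block) {
    // Stop once every definition has a recorded first use.
    if (Result.UseDistance.size() == Defs.size())
      break;
    for (Reg R : Defs) {
      const bool Located = Result.UseDistance.count(R) != 0;
      if (SkipLocated && Located)
        continue; // cheap map lookup avoids the expensive query
      if (readsRegister(I, R, Result.ExpensiveQueries) && !Located)
        Result.UseDistance[R] = Distance;
    }
    ++Distance;
  }
  assert(Result.UseDistance.size() <= Defs.size());
  return Result;
}
```

With `Block = {{1}, {1, 2}, {1}, {1, 3}, {9}}` and `Defs = {1, 2, 3}`, both modes record the same distances, but the skipping version makes 7 calls to the toy `readsRegister` against 12 without the check. Whether that translates into a measurable speedup in the real pass depends on the actual cost of `readsRegister`, which is what the profiling mentioned above addresses.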
https://github.com/llvm/llvm-project/pull/102595