[PATCH] D122501: [AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions
Stanislav Mekhanoshin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 29 14:45:57 PDT 2022
rampitec added inline comments.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:658
+ collectRematerializableInstructions();
if (RematerializableInsts.empty() ||
+ !sinkTriviallyRematInsts(ST, TII))
----------------
Fix the formatting as tidy suggests.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:731
if (!SRI->isVGPRClass(MRI.getRegClass(Reg)) || !MRI.hasOneDef(Reg) ||
!MRI.hasOneUse(Reg))
continue;
----------------
Unrelated to this patch. Can you separately change this to hasOneNonDBGUse()?
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:739
MachineInstr *UseI = &*MRI.use_instr_begin(Reg);
if (Def->getParent() == UseI->getParent())
----------------
This also needs a nodbg use (use_nodbg_instructions).
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:753
+ RematerializableInsts[I][Def] = UseI;
+ RematDefToLiveInRegions[Def].push_back(I);
+ }
----------------
It has one use, so you can break from the loop here.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:777
+ // as we rematerialize.
+ NewPressure = Pressure;
+ NewLiveIns = LiveIns;
----------------
This is expensive to make a full copy of everything, even though you only need a small number of regions.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:783
DenseMap<MachineInstr *, MachineInstr *> InsertedMIToOldDef;
- GCNDownwardRPTracker RPT(*LIS);
- auto *NonDbgMI = &*skipDebugInstructionsForward(Regions[HighRPIdx].first,
- Regions[HighRPIdx].second);
- unsigned ImproveOccupancy = 0;
- for (auto &It : RematerializableInsts) {
- MachineInstr *Def = It.first;
- MachineBasicBlock::iterator InsertPos =
- MachineBasicBlock::iterator(It.second);
- Register Reg = Def->getOperand(0).getReg();
- // Rematerialize MI to its use block. Since we are only rematerializing
- // instructions that do not have any virtual reg uses, we do not need to
- // call LiveRangeEdit::allUsesAvailableAt() and
- // LiveRangeEdit::canRematerializeAt().
- NewLiveIns[Reg] = LaneBitmask::getNone();
- TII->reMaterialize(*InsertPos->getParent(), InsertPos, Reg,
- Def->getOperand(0).getSubReg(), *Def, *TRI);
- MachineInstr *NewMI = &*(--InsertPos);
- LIS->InsertMachineInstrInMaps(*NewMI);
- LIS->removeInterval(Reg);
- LIS->createAndComputeVirtRegInterval(Reg);
- InsertedMIToOldDef[NewMI] = Def;
-
- // FIXME: Need better way to update RP without re-iterating over region
- RPT.reset(*NonDbgMI, &NewLiveIns);
- RPT.advance(Regions[HighRPIdx].second);
- GCNRegPressure RPAfterSinking = RPT.moveMaxPressure();
- ImproveOccupancy = RPAfterSinking.getOccupancy(ST);
- if (ImproveOccupancy > MinOccupancy)
+ bool Improved = true;
+ for (unsigned I = 0, E = Regions.size(); I != E; ++I) {
----------------
It is better to have an inverted logic, initialize it to false and only set to true if actually improved anything.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:852
+ LaneBitmask PrevMask = NewLiveIns[I][Reg];
+ // Update RP for all tracked high RP regions that this has this MI as a
+ // live-in.
----------------
Has this *Reg* as a live-in.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:870
+ }
+
+ SinkedDefs.push_back(Def);
----------------
Don't you need to remove Reg from NewLiveIns at this point for all regions?
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:903
LIS->createAndComputeVirtRegInterval(Reg);
}
+ return Improved;
----------------
It is probably also possible to call updateRegionBoundaries to restore the state.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:904
}
- return false;
+ return Improved;
}
----------------
return false;
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:933
SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
MFI.increaseOccupancy(MF, ++MinOccupancy);
----------------
Actually ++MinOccupancy suggests 1 step improvement. It can be higher, but you do not really know as you would need to check all other blocks too.
================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:935
- return true;
+ return Improved;
}
----------------
return true;
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D122501/new/
https://reviews.llvm.org/D122501
More information about the llvm-commits
mailing list