[PATCH] D124246: [AMDGPU] Adjust wave priority based on VMEM instructions to avoid duty-cycling.

Ivan Kosarev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Apr 22 09:12:47 PDT 2022


kosarev added inline comments.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:31
+static cl::opt<bool>
+    DisableSetWavePriority("amdgpu-disable-set-wave-priority",
+                           cl::desc("Disable adjusting wave priority"),
----------------
tsymalla wrote:
> foad wrote:
> > It's a bit more normal to put "Enable" options in AMDGPUTargetMachine, where you can use isPassEnabled.
> Probably invert that option so that setting it to true enables the pass
Changed to use `isPassEnabled()`, thanks.
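
For reference, the gating in `AMDGPUTargetMachine.cpp` would look roughly like the sketch below; the option name, default, factory function and exact pipeline hook are assumptions, not a quote from the patch.

```cpp
// Sketch only; option name, default and pipeline hook are assumed.
static cl::opt<bool> EnableSetWavePriority(
    "amdgpu-set-wave-priority", cl::desc("Adjust wave priority"),
    cl::init(false), cl::Hidden);

// Somewhere in GCNPassConfig, e.g. a pre-emit hook:
if (isPassEnabled(EnableSetWavePriority, CodeGenOpt::Less))
  addPass(createAMDGPUSetWavePriorityPass()); // factory name assumed
```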


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:42
+
+class MBBInfoSet {
+public:
----------------
foad wrote:
> This seems like overkill. Can you just use a `DenseMap<const MachineBasicBlock *, bool>` directly? Or maybe just a `DenseSet<const MachineBasicBlock *>`?
On the other hand, if we know we will need more fields there to count VALUs, then it's probably not worth rewriting it back and forth? (I don't mind either way.)
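
If it does shrink to a single bit per block, a minimal set-based form might look like the sketch below (names are assumed, not taken from the patch):

```cpp
#include "llvm/ADT/DenseSet.h"

// Minimal replacement for MBBInfoSet if only reachability is needed:
// the set of blocks from which a VMEM load is reachable.
DenseSet<const MachineBasicBlock *> MayReachVMEMLoad;

bool CanReachVMEMLoad(const MachineBasicBlock *MBB) {
  return MayReachVMEMLoad.contains(MBB);
}
```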


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:97
+
+MachineInstr *AMDGPUSetWavePriority::BuildSetprioMI(MachineFunction &MF,
+                                                    unsigned priority) const {
----------------
foad wrote:
> I think it would be neater to have this function take a MachineBasicBlock::iterator instead of MF, and have it automatically insert the created instruction (by calling one of the BuildMI overloads that does the insertion).
The inserting `BuildMI()` overloads seem to take both the MF/MBB and the iterator. Also, the aim was for this function to only create the instruction and leave the client to choose how it wants to insert it, so we don't need to rewrite the function if we need to `insertAfter()` or something similar (which was already the case with some of the early versions of the patch).
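
As a rough illustration of that split, a non-inserting helper could look like this; the opcode is the real `AMDGPU::S_SETPRIO`, but the signature and caller code are sketched, not quoted from the patch:

```cpp
// Build the S_SETPRIO instruction only; the caller decides where it goes.
MachineInstr *BuildSetprioMI(MachineFunction &MF, unsigned Priority,
                             const SIInstrInfo *TII) {
  return BuildMI(MF, DebugLoc(), TII->get(AMDGPU::S_SETPRIO)).addImm(Priority);
}

// Callers then pick the insertion style that fits, e.g.:
//   MBB.insert(MBB.begin(), BuildSetprioMI(MF, HighPriority, TII));
//   MBB.insertAfter(It, BuildSetprioMI(MF, LowPriority, TII));
```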


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:102-103
+
+bool AMDGPUSetWavePriority::CanLowerPriorityDirectlyInPredecessors(
+    const MachineBasicBlock &MBB, MBBInfoSet &MBBInfos) const {
+  for (const MachineBasicBlock *Pred : MBB.predecessors()) {
----------------
foad wrote:
> I think this deserves a comment along the lines of: Check that for every predecessor Pred that can reach a VMEM load, none of Pred's successors can reach a VMEM load.
Done.
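
For the record, the check the comment describes is essentially the following, with `MayReachVMEMLoad(BB)` standing in for however `MBBInfoSet` actually exposes the reachability bit:

```cpp
// For every predecessor that can reach a VMEM load, none of its successors
// may still be able to reach one; otherwise lowering the priority directly
// in that predecessor would be premature.
for (const MachineBasicBlock *Pred : MBB.predecessors()) {
  if (!MayReachVMEMLoad(Pred))
    continue;
  for (const MachineBasicBlock *Succ : Pred->successors())
    if (MayReachVMEMLoad(Succ))
      return false;
}
return true;
```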


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:107-108
+      continue;
+    if (Loops->getLoopFor(Pred))
+      return false;
+    for (const MachineBasicBlock *Succ : Pred->successors()) {
----------------
foad wrote:
> I don't understand what this is for. If Pred can reach a VMEM load and it is in a loop, then it must surely have at least one successor that can also reach a VMEM load (namely a successor that is on a path around the loop that leads back to Pred), so we will return false anyway on line 111 below.
Indeed. Removed. Thanks.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:122
+bool AMDGPUSetWavePriority::runOnMachineFunction(MachineFunction &MF) {
+  const unsigned HighPriority = 3;
+  const unsigned LowPriority = 0;
----------------
foad wrote:
> Should bail out early if skipFunction(MF.getFunction()), to help with bisecting optimization problems.
Done.
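
I.e. roughly the following, with the bail-out assumed to sit at the very top of the function:

```cpp
bool AMDGPUSetWavePriority::runOnMachineFunction(MachineFunction &MF) {
  // Honor -opt-bisect-limit and the optnone attribute before doing any work.
  if (skipFunction(MF.getFunction()))
    return false;
  // ... rest of the pass ...
}
```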


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:130
+
+  if (DisableSetWavePriority)
+    return false;
----------------
tsymalla wrote:
> Can be moved to the beginning of the function.
Moved out to `AMDGPUTargetMachine.cpp`.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:136
+  for (MachineBasicBlock &MBB : MF) {
+    for (const MachineInstr &MI : MBB) {
+      if (isVMEMLoad(MI)) {
----------------
arsenm wrote:
> foad wrote:
> > Could use:
> > ```
> >   if (any_of(MBB, [&](const MachineInstr &MI){ return isVMEMLoad(MI); }))
> >     Worklist.push_back(&MBB);
> > ```
> > instead of an explicit loop, here and elsewhere - although it is a matter of taste.
> Is it really worthwhile to do this for all blocks? Should there be some kind of memory instruction count threshold?
Sure, done.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:159
+  MachineBasicBlock::iterator I = Entry.begin(), E = Entry.end();
+  while (I != E && !SIInstrInfo::isVALU(*I) && !I->isTerminator())
+    ++I;
----------------
tsymalla wrote:
> Could you comment that section please?
Done.
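
For context, the loop walks the entry block to pick the insertion point for the priority-raising `S_SETPRIO`; the comment wording below is a paraphrase, not a quote of what was added:

```cpp
// Find the first VALU instruction (or terminator) in the entry block; the
// S_SETPRIO that raises the priority is inserted right before it, so the
// wave is already at high priority when the vector work begins.
MachineBasicBlock::iterator I = Entry.begin(), E = Entry.end();
while (I != E && !SIInstrInfo::isVALU(*I) && !I->isTerminator())
  ++I;
```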


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:163
+
+  // Lower the priorty on edges where control leaves blocks from which
+  // VMEM loads are reachable.
----------------
tsymalla wrote:
> Typo: priority
Fixed, thanks.


================
Comment at: llvm/lib/Target/AMDGPU/AMDGPUSetWavePriority.cpp:191
+  for (MachineBasicBlock *MBB : PriorityLoweringBlocks) {
+    MachineBasicBlock::iterator I = MBB->end(), B = MBB->begin();
+    while (I != B) {
----------------
arsenm wrote:
> tsymalla wrote:
> > Could be using the reverse iterator
> Probably want llvm::make_early_inc_range(llvm::reverse(*MBB))
Yeah, I tried that, but `MBB->rend()->getIterator()` seems unable to return a valid iterator and crashes. I'm also not quite sure that would simplify the logic.
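
For reference, the suggested shape would look roughly like this (not verified against the crash mentioned above):

```cpp
// Reverse walk that stays valid even if the current instruction is removed
// or a new instruction is inserted during the iteration.
for (MachineInstr &MI : llvm::make_early_inc_range(llvm::reverse(*MBB))) {
  // ... process MI ...
}
```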


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124246/new/

https://reviews.llvm.org/D124246


