[PATCH] D122501: [AMDGPU] Enable PreRARematerialize scheduling pass with multiple high RP regions

Tue Mar 29 14:45:57 PDT 2022

rampitec added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:658
+        collectRematerializableInstructions();
         if (RematerializableInsts.empty() ||
+            !sinkTriviallyRematInsts(ST, TII))
----------------
Fix the formatting as tidy suggests.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:731
     if (!SRI->isVGPRClass(MRI.getRegClass(Reg)) || !MRI.hasOneDef(Reg) ||
         !MRI.hasOneUse(Reg))
       continue;
----------------
Unrelated to this patch. Can you separately change this to hasOneNonDBGUse()?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:739

     MachineInstr *UseI = &*MRI.use_instr_begin(Reg);
     if (Def->getParent() == UseI->getParent())
----------------
This also needs a nodbg use (use_nodbg_instructions).

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:753
+        RematerializableInsts[I][Def] = UseI;
+        RematDefToLiveInRegions[Def].push_back(I);
+      }
----------------
It has one use, so you can break from the loop here.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:777
+  // as we rematerialize.
+  NewPressure = Pressure;
+  NewLiveIns = LiveIns;
----------------
This is expensive to make a full copy of everything, even though you only need a small number of regions.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:783
   DenseMap<MachineInstr *, MachineInstr *> InsertedMIToOldDef;
-  GCNDownwardRPTracker RPT(*LIS);
-  auto *NonDbgMI = &*skipDebugInstructionsForward(Regions[HighRPIdx].first,
-                                                  Regions[HighRPIdx].second);
-  unsigned ImproveOccupancy = 0;
-  for (auto &It : RematerializableInsts) {
-    MachineInstr *Def = It.first;
-    MachineBasicBlock::iterator InsertPos =
-        MachineBasicBlock::iterator(It.second);
-    Register Reg = Def->getOperand(0).getReg();
-    // Rematerialize MI to its use block. Since we are only rematerializing
-    // instructions that do not have any virtual reg uses, we do not need to
-    // call LiveRangeEdit::allUsesAvailableAt() and
-    // LiveRangeEdit::canRematerializeAt().
-    NewLiveIns[Reg] = LaneBitmask::getNone();
-    TII->reMaterialize(*InsertPos->getParent(), InsertPos, Reg,
-                       Def->getOperand(0).getSubReg(), *Def, *TRI);
-    MachineInstr *NewMI = &*(--InsertPos);
-    LIS->InsertMachineInstrInMaps(*NewMI);
-    LIS->removeInterval(Reg);
-    LIS->createAndComputeVirtRegInterval(Reg);
-    InsertedMIToOldDef[NewMI] = Def;
-
-    // FIXME: Need better way to update RP without re-iterating over region
-    RPT.reset(*NonDbgMI, &NewLiveIns);
-    RPT.advance(Regions[HighRPIdx].second);
-    GCNRegPressure RPAfterSinking = RPT.moveMaxPressure();
-    ImproveOccupancy = RPAfterSinking.getOccupancy(ST);
-    if (ImproveOccupancy > MinOccupancy)
+  bool Improved = true;
+  for (unsigned I = 0, E = Regions.size(); I != E; ++I) {
----------------
It is better to have an inverted logic, initialize it to false and only set to true if actually improved anything.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:852
+      LaneBitmask PrevMask = NewLiveIns[I][Reg];
+      // Update RP for all tracked high RP regions that this has this MI as a
+      // live-in.
----------------
Has this *Reg* as a live-in.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:870
+      }
+
+      SinkedDefs.push_back(Def);
----------------
Don't you need to remove Reg from NewLiveIns at this point for all regions?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:903
       LIS->createAndComputeVirtRegInterval(Reg);
     }
+    return Improved;
----------------
It is probably also possible to call updateRegionBoundaries to restore the state.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:904
     }
-    return false;
+    return Improved;
   }
----------------
return false;

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:933
   SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
   MFI.increaseOccupancy(MF, ++MinOccupancy);

----------------
Actually ++MinOccupancy suggests 1 step improvement. It can be higher, but you do not really know as you would need to check all other blocks too.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:935

-  return true;
+  return Improved;
 }
----------------
return true;

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122501/new/

https://reviews.llvm.org/D122501