[PATCH] D119475: [AMDGPU] Add scheduler pass to rematerialize trivial defs

Mon Feb 21 17:51:30 PST 2022

kerbowa added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:640
+      if (Stage == PreRARematerialize) {
+        if (RegionsWithMinOcc.count() != 1 || Regions.size() == 1)
+          break;
----------------
If we only try this with a single region can RegionsWithMinOcc be an Optional<int> or something? Seems like it would not be necessary to have a vector tracking all the regions that fit the criteria.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:646
+        // Check maximum occupancy
+        if (ST.computeOccupancy(MF.getFunction(), MFI.getLDSSize()) ==
+            MinOccupancy)
----------------
Is this needed? If we are limited by the function max occupancy including LDS and wave-per-eu constraints then multiple regions will be at the MinOccupancy.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:652
+        // Pressure for regions inbetween the defs and region we sinked the def
+        // to. Will need to be fixed if there is another pass after this pass.
+        static_assert(LastStage == PreRARematerialize,
----------------
Why is the PreRARematerialize pass different and doesn't need this info? Do we only reschedule the single region?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:736
+
+    if (HighRPLiveIns.find(Reg) == HighRPLiveIns.end())
+      continue;
----------------
Can you add a comment as Stas suggests?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:783
+  SmallVector<MachineInstr *, 4> InsertedMI;
+  DenseMap<MachineInstr *, MachineInstr *> InsertedMIToOldDef;
+
----------------
Can these be combined to avoid the extra lookup?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:795
+    // Rematerialize MI to its use block. Since we are only rematerializing
+    // instructions that does not have any virtual reg uses, we do not need
+    // to call LiveRangeEdit::allUsesAvailableAt() and
----------------
TYPO: does->do

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:817
+
+  if (ImproveOccupancy <= MinOccupancy) {
+    // Occupancy is not improved. Undo sinking for the region
----------------
Is there a test for this case?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:878
+    MachineBasicBlock::iterator InsertPos =
+        std::next(MachineBasicBlock::iterator(MI));
+    bool UseBlockFound = false;
----------------
I guess you could have a function to do this updating of RegionBegin/RegionEnd since it is repeated?

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:81
+    PreRARematerialize,
+    LastStage = PreRARematerialize
   };
----------------
I feel like we need to refactor the way we implement scheduler passes now that there are so many of them and the logic is getting a bit complex. It's a separate thing though.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:123

+  // List of trivially rematerializable instructions we can remat to reduce RP
+  SmallVector<std::pair<MachineInstr *, MachineInstr *>> RematerializableInsts;
----------------
Can you update the comment so that it is clear these are pairs of defs/uses.

================
Comment at: llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:138
+
+  // Attempt to reduce RP of VGPR by sinking trivially rematerializable
+  // instructions. Returns true if we were able to sink instruction(s).
----------------
Why only VGPRs? Could this be extended to handle SGPRs and AGPRs.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D119475/new/

https://reviews.llvm.org/D119475