[llvm] [AMDGPU][Scheduler] Refactor VGPR rematerialization during scheduling (PR #118722)

Thu Dec 5 07:31:20 PST 2024

================
@@ -419,29 +420,48 @@ class ClusteredLowOccStage : public GCNSchedStage {
       : GCNSchedStage(StageID, DAG) {}
 };
 
+/// Attempts to increase function occupancy with respect to VGPR usage by one by
+/// sinking trivially rematerializable instructions to their use. When the stage
+/// estimates increasing occupancy is possible, as few instructions as possible
+/// are rematerialized to reduce potential negative effects on function latency.
+///
+/// TODO: We should extend this to work on SGPRs and AGPRs as well.
 class PreRARematStage : public GCNSchedStage {
 private:
-  // Each region at MinOccupancy will have their own list of trivially
-  // rematerializable instructions we can remat to reduce RP. The list maps an
-  // instruction to the position we should remat before, usually the MI using
-  // the rematerializable instruction.
-  MapVector<unsigned, MapVector<MachineInstr *, MachineInstr *>>
-      RematerializableInsts;
-
-  // Map a trivially rematerializable def to a list of regions at MinOccupancy
-  // that has the defined reg as a live-in.
-  DenseMap<MachineInstr *, SmallVector<unsigned, 4>> RematDefToLiveInRegions;
-
-  // Collect all trivially rematerializable VGPR instructions with a single def
-  // and single use outside the defining block into RematerializableInsts.
-  void collectRematerializableInstructions();
-
+  /// A trivially rematerializable VGPR-defining instruction along with
+  /// pre-computed information to help update the scheduler's status when we
+  /// rematerialize it.
+  struct RematInstruction {
+    /// Trivially rematerializable instruction.
+    MachineInstr *RematMI;
+    /// Region containing the rematerializable instruction.
+    unsigned DefRegion;
+    /// Single use of the rematerializable instruction's defined register,
+    /// located in a different block.
+    MachineInstr *UseMI;
----------------
lucas-rami wrote:

AFAIU it is per region, and in the case of a rematerialization both the region where the instruction was rematerialized from and the one it is rematerialized to will be rescheduled after this scheduling stage takes place.

In the future we could improve rematerialization to work even if the def and use are in the same block but not the same region, which from what I understand can happen and make sense.

https://github.com/llvm/llvm-project/pull/118722