[llvm] [AMDGPU] Allow rematerialization of instructions with virtual register uses (PR #124327)

Fri Jan 31 16:21:07 PST 2025

================
@@ -1615,6 +1615,61 @@ void GCNSchedStage::revertScheduling() {
   DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
 }
 
+bool PreRARematStage::allUsesAvailableAt(const MachineInstr *InstToRemat,
+                                         SlotIndex OriginalIdx,
+                                         SlotIndex RematIdx) const {
+
+  LiveIntervals *LIS = DAG.LIS;
+  MachineRegisterInfo &MRI = DAG.MRI;
+  OriginalIdx = OriginalIdx.getRegSlot(true);
+  RematIdx = std::max(RematIdx, RematIdx.getRegSlot(true));
+  for (const MachineOperand &MO : InstToRemat->operands()) {
+    if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
+      continue;
+
+    // Do not attempt to reason about PhysRegs
+    if (!MO.getReg().isVirtual()) {
+      assert(DAG.MRI.isConstantPhysReg(MO.getReg()) ||
+             DAG.TII->isIgnorableUse(MO));
----------------
jrbyrnes wrote:

Actually, on second thought, I don't think https://godbolt.org/z/qWh47GdWG is valid control flow.

The case we care about is when (1) the register has a single def and (2) there is a use in a block with a more permissive $exec mask. If we remat the def for that use, we end up using bits that should have been masked out.
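To make the concern concrete, here is a toy lane model. It is not LLVM code: `maskedMov`, `lanesChangedByRemat`, the 4-lane width, and the `Stale` marker are all hypothetical names and assumptions made up for illustration. It only sketches what would go wrong if a def were re-executed under a wider exec mask than the one it originally ran under.

```cpp
#include <array>
#include <cstdint>

// Toy model only: a 4-lane vector register and a bitmask standing in for $exec.
constexpr int NumLanes = 4;
using VReg = std::array<int, NumLanes>;
constexpr int Stale = -1; // marks a lane the def never wrote

// Hypothetical helper: write Value into every lane enabled in Exec.
void maskedMov(VReg &Dst, int Value, std::uint32_t Exec) {
  for (int Lane = 0; Lane < NumLanes; ++Lane)
    if (Exec & (1u << Lane))
      Dst[Lane] = Value;
}

// Count the lanes, among those active at the use, whose contents differ
// between the original def (run under DefExec) and a rematerialized copy
// (run under UseExec).
int lanesChangedByRemat(std::uint32_t DefExec, std::uint32_t UseExec) {
  VReg Original;
  Original.fill(Stale);
  maskedMov(Original, 42, DefExec); // def in its original block

  VReg Remat;
  Remat.fill(Stale);
  maskedMov(Remat, 42, UseExec); // same def re-executed next to the use

  int Changed = 0;
  for (int Lane = 0; Lane < NumLanes; ++Lane)
    if ((UseExec & (1u << Lane)) && Original[Lane] != Remat[Lane])
      ++Changed;
  return Changed;
}
```

In this sketch, `lanesChangedByRemat(0x3, 0xF)` reports two lanes that the use would observe differently after remat, while a use mask that is equal to or a subset of the def mask reports none.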

However, I don't think such a structure is producible by control flow. The register will either have multiple defs for separate incoming blocks, or a phi node (in which case we won't be doing remat anyway). PSDB looks good so far, though I plan to test this more thoroughly by relaxing the conditions for rematerialization.

That said, we may need to disable rematerialization if the kernel has exec handling for WWM / WQM. I'm looking into this.

https://github.com/llvm/llvm-project/pull/124327