[llvm] [AMDGPU] Allow rematerialization of instructions with virtual register uses (PR #124327)
Jeffrey Byrnes via llvm-commits
llvm-commits at lists.llvm.org
Fri Jan 31 16:21:07 PST 2025
================
@@ -1615,6 +1615,61 @@ void GCNSchedStage::revertScheduling() {
DAG.Regions[RegionIdx] = std::pair(DAG.RegionBegin, DAG.RegionEnd);
}
+bool PreRARematStage::allUsesAvailableAt(const MachineInstr *InstToRemat,
+ SlotIndex OriginalIdx,
+ SlotIndex RematIdx) const {
+
+ LiveIntervals *LIS = DAG.LIS;
+ MachineRegisterInfo &MRI = DAG.MRI;
+ OriginalIdx = OriginalIdx.getRegSlot(true);
+ RematIdx = std::max(RematIdx, RematIdx.getRegSlot(true));
+ for (const MachineOperand &MO : InstToRemat->operands()) {
+ if (!MO.isReg() || !MO.getReg() || !MO.readsReg())
+ continue;
+
+ // Do not attempt to reason about PhysRegs
+ if (!MO.getReg().isVirtual()) {
+ assert(DAG.MRI.isConstantPhysReg(MO.getReg()) ||
+ DAG.TII->isIgnorableUse(MO));
----------------
jrbyrnes wrote:
Actually, on second thought, I don't think https://godbolt.org/z/qWh47GdWG is valid control flow.
The case we care about is when (1) we have a single def and (2) there is a use in a block with a more permissive $exec mask. If we remat the def for that use, we will end up using bits that should have been masked out.
However, I don't think such a structure is producible by control flow. The register will either have multiple defs for separate incoming blocks, or a phi node (in which case we won't be doing remat anyway). PSDB looks good so far, though I plan to test this more thoroughly by relaxing the conditions for rematerialization.
That said, we may need to disable rematerialization if the kernel has exec handling for WWM / WQM -- looking into this.
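
To make the $exec concern above concrete, here is a toy model of it. This is not LLVM code: the 4-lane wave and the names `LaneMask`, `VReg`, `writeUnderExec` and `rematWritesExtraLanes` are all hypothetical, chosen only for illustration. It sketches why re-executing a def under a more permissive mask would populate lanes that the original def left untouched.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model only: a 4-lane wave (real AMDGPU waves have 32 or 64 lanes).
constexpr unsigned NumLanes = 4;
using LaneMask = std::uint8_t;          // bit i set => lane i is active
using VReg = std::array<int, NumLanes>; // one value per lane

// Models a vector write under an exec mask: only active lanes are updated,
// inactive lanes keep whatever they held before.
VReg writeUnderExec(VReg Old, int Value, LaneMask Exec) {
  assert(Exec < (1u << NumLanes) && "mask wider than the toy wave");
  for (unsigned Lane = 0; Lane < NumLanes; ++Lane)
    if (Exec & (1u << Lane))
      Old[Lane] = Value;
  return Old;
}

// Returns true when re-executing a def at a point whose mask is UseExec
// would write lanes that the original def (under DefExec) did not write.
bool rematWritesExtraLanes(LaneMask DefExec, LaneMask UseExec) {
  return (UseExec & ~DefExec) != 0;
}
```

In this model, a def executed with mask `0b0011` and rematerialized at a use with mask `0b1111` writes lanes 2 and 3 as well, which is the situation the comment argues control flow cannot produce for a single non-phi def.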
https://github.com/llvm/llvm-project/pull/124327