[llvm-branch-commits] [llvm] [AMDGPU] Make AMDGPURewriteAGPRCopyMFMA aware of subreg reload (PR #174998)
Matt Arsenault via llvm-branch-commits
llvm-branch-commits at lists.llvm.org
Thu Jan 8 08:15:43 PST 2026
================
@@ -422,6 +433,33 @@ bool AMDGPURewriteAGPRCopyMFMAImpl::tryFoldCopiesFromAGPR(
return MadeChange;
}
+unsigned
+AMDGPURewriteAGPRCopyMFMAImpl::getSubRegFromReload(MachineInstr &MI,
+                                                   Register Reg) const {
+  unsigned NumRegs = TRI.getRegSizeInBits(*MRI.getRegClass(Reg)) / 32;
+  unsigned SubReg = 0;
+  // SubReg accesses for the tuple registers are of interest here.
+  // Note: We don't support 16-bit subreg reloads. If that assumption is
+  // changed in the future, this function should be revised.
+  if (NumRegs == 1)
+    return SubReg;
+
+  unsigned NumSpilledRegs = TRI.getNumSubRegsForSpillOp(MI);
+  // Skip if the entire tuple is reloaded.
+  if (NumRegs == NumSpilledRegs)
+    return SubReg;
+
+  // Construct the covering lanes for the reloaded portion.
+  unsigned SubRegIdx =
+      TII.getNamedOperand(MI, AMDGPU::OpName::offset)->getImm() / 4;
+  // Subreg lane masks are maintained in terms of regunits, and each 32-bit
+  // register consists of two regunits.
+  uint64_t Lanes = (1ULL << (NumSpilledRegs * 2)) - 1;
+  LaneBitmask CoveringLanes = LaneBitmask(Lanes << (SubRegIdx * 2));
----------------
arsenm wrote:
You shouldn't be making LaneBitmask layout assumptions like this. You are not supposed to use the value directly; only perform overlap checks.
https://github.com/llvm/llvm-project/pull/174998