[llvm-branch-commits] [llvm] [InlineSpiller][AMDGPU] Implement subreg reload during RA spill (PR #175002)

Christudasan Devadasan via llvm-branch-commits llvm-branch-commits at lists.llvm.org
Sat Jan 24 09:45:47 PST 2026


================
@@ -1248,18 +1249,62 @@ void InlineSpiller::spillAroundUses(Register Reg) {
 
     // Create a new virtual register for spill/fill.
     // FIXME: Infer regclass from instruction alone.
-    Register NewVReg = Edit->createFrom(Reg);
+
+    unsigned SubReg = 0;
+    LaneBitmask CoveringLanes = LaneBitmask::getNone();
+    // If the subreg liveness is enabled, identify the subreg use(s) to try
+    // subreg reload. Skip if the instruction also defines the register.
+    // For copy bundles, get the covering lane masks.
+    if (MRI.subRegLivenessEnabled() && !RI.Writes) {
+      for (auto [MI, OpIdx] : Ops) {
+        const MachineOperand &MO = MI->getOperand(OpIdx);
+        assert(MO.isReg() && MO.getReg() == Reg);
+        if (MO.isUse()) {
+          SubReg = MO.getSubReg();
+          if (SubReg)
+            CoveringLanes |= TRI.getSubRegIndexLaneMask(SubReg);
+        }
+      }
+    }
+
+    if (MI.isBundled() && CoveringLanes.any()) {
+      CoveringLanes = LaneBitmask(bit_ceil(CoveringLanes.getAsInteger()) - 1);
----------------
cdevadas wrote:

Admittedly, some of the lane-mask manipulation logic I used in this patch is somewhat hacky. This code is needed to correctly handle copy bundles where the individual copies may target non-contiguous subregisters of a tuple. For instance, consider a bundle containing two copies: one covering sub0_sub1 and another covering sub3 of a 256-bit tuple. With the bit_ceil-based compaction, the resulting lane mask becomes the contiguous sub0_sub1_sub2_sub3 by filling in sub2, which is a valid subreg index for this tuple, and the `getSubRegIdxFromLaneMask` helper I added returns the correct SubRegIdx.
Originally, I tried to use `getCoveringSubRegIndexes`. However, I found it isn't fully reliable for my use case: when the covering mask already represents a contiguous lane range (say sub0_sub1_sub2), the function fails to produce the correct index. The root cause is the check at this line: https://github.com/llvm/llvm-project/blob/main/llvm/lib/CodeGen/TargetRegisterInfo.cpp#L558
Given that behavior, the compaction + lane-mask-to-subregIdx approach consistently returns the correct index for these irregular bundles. I also understand how the lane mask is interpreted here. The representation is subtle, and I should find a better alternative to correctly gather the required info.

https://github.com/llvm/llvm-project/pull/175002


More information about the llvm-branch-commits mailing list