[llvm] [AMDGPU] siloadstoreopt generate REG_SEQUENCE with aligned operands (PR #162088)

Janek van Oirschot via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 10 06:59:12 PDT 2025


================
@@ -1411,10 +1411,33 @@ SILoadStoreOptimizer::copyFromSrcRegs(CombineInfo &CI, CombineInfo &Paired,
   const auto *Src0 = TII->getNamedOperand(*CI.I, OpName);
   const auto *Src1 = TII->getNamedOperand(*Paired.I, OpName);
 
+  // Make sure the generated REG_SEQUENCE has sensibly aligned registers.
+  const TargetRegisterClass *CompatRC0 =
+      TRI->getSubRegisterClass(SuperRC, SubRegIdx0);
+  const TargetRegisterClass *CompatRC1 =
+      TRI->getSubRegisterClass(SuperRC, SubRegIdx1);
+  assert(CompatRC0 && CompatRC1 &&
+         "Cannot find compatible TargetRegisterClass");
+
+  Register Src0Reg = CompatRC0 == CI.DataRC
+                         ? Src0->getReg()
+                         : MRI->createVirtualRegister(CompatRC0);
+  Register Src1Reg = CompatRC1 == Paired.DataRC
+                         ? Src1->getReg()
+                         : MRI->createVirtualRegister(CompatRC1);
+
+  if (CompatRC0 != CI.DataRC)
+    BuildMI(*MBB, InsertBefore, DL, TII->get(TargetOpcode::COPY), Src0Reg)
+        .add(*Src0);
----------------
JanekvO wrote:

> Actually for this I think you should always be able to constrain, no copies required

Should I do something direct like `setRegClass` for this? Trying to constrain an align2 class to its non-align2 equivalent (e.g., `constrainRegClass(%1:vreg_64_align2, VReg_64)`) will always fall back on the align2 variant, which is exactly what I don't want when generating the REG_SEQUENCE operands here.
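
To illustrate the fallback, here is a minimal standalone sketch. It is a toy model, not LLVM code: `ToyRegClass`, `toyConstrain`, `toySetRegClass` and the two factory helpers are hypothetical names, and a register class is reduced to the set of starting register indices it allows. The only assumption carried over from LLVM is that constraining picks the common subclass of the current and the requested class.

```cpp
#include <optional>
#include <set>
#include <string>

// Toy stand-in for a register class: the starting register indices it allows.
// (Hypothetical model; real AMDGPU classes are defined over VGPR tuples.)
struct ToyRegClass {
  std::string Name;
  std::set<int> Regs;
};

// Unaligned 64-bit class: a tuple may start at any index.
ToyRegClass makeToyVReg64() {
  return {"VReg_64", {0, 1, 2, 3, 4, 5, 6, 7}};
}

// Aligned 64-bit class: a tuple may only start at an even index.
ToyRegClass makeToyVReg64Align2() {
  return {"VReg_64_Align2", {0, 2, 4, 6}};
}

// Models "constrain": the result may only contain registers allowed by BOTH
// classes, so the class can only ever get narrower, never wider.
std::optional<ToyRegClass> toyConstrain(const ToyRegClass &Cur,
                                        const ToyRegClass &Wanted) {
  std::set<int> Common;
  for (int R : Cur.Regs)
    if (Wanted.Regs.count(R))
      Common.insert(R);

  if (Common.empty())
    return std::nullopt; // No common subclass exists.
  if (Common == Cur.Regs)
    return Cur; // Already at least as narrow as requested: unchanged.
  if (Common == Wanted.Regs)
    return Wanted;
  return ToyRegClass{"common", Common};
}

// Models a direct override: the requested class replaces the current one,
// even when that widens the set of allowed registers.
ToyRegClass toySetRegClass(const ToyRegClass & /*Cur*/,
                           const ToyRegClass &Wanted) {
  return Wanted;
}
```

In this model, `toyConstrain(makeToyVReg64Align2(), makeToyVReg64())` hands back the align2 class unchanged, because the aligned set is already a subset of the unaligned one. Only the direct override produces the unaligned class.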

https://github.com/llvm/llvm-project/pull/162088

