[llvm] [AMDGPU] siloadstoreopt generate REG_SEQUENCE with aligned operands (PR #162088)
Janek van Oirschot via llvm-commits
llvm-commits at lists.llvm.org
Thu Oct 9 08:56:42 PDT 2025
================
@@ -1411,10 +1411,33 @@ SILoadStoreOptimizer::copyFromSrcRegs(CombineInfo &CI, CombineInfo &Paired,
   const auto *Src0 = TII->getNamedOperand(*CI.I, OpName);
   const auto *Src1 = TII->getNamedOperand(*Paired.I, OpName);
+  // Make sure the generated REG_SEQUENCE has sensibly aligned registers.
+  const TargetRegisterClass *CompatRC0 =
+      TRI->getSubRegisterClass(SuperRC, SubRegIdx0);
+  const TargetRegisterClass *CompatRC1 =
+      TRI->getSubRegisterClass(SuperRC, SubRegIdx1);
+  assert(CompatRC0 && CompatRC1 &&
+         "Cannot find compatible TargetRegisterClass");
+
+  Register Src0Reg = CompatRC0 == CI.DataRC
+                         ? Src0->getReg()
+                         : MRI->createVirtualRegister(CompatRC0);
+  Register Src1Reg = CompatRC1 == Paired.DataRC
+                         ? Src1->getReg()
+                         : MRI->createVirtualRegister(CompatRC1);
+
+  if (CompatRC0 != CI.DataRC)
+    BuildMI(*MBB, InsertBefore, DL, TII->get(TargetOpcode::COPY), Src0Reg)
+        .add(*Src0);
----------------
JanekvO wrote:
> this is checking against the wrong register class?
Yeah, needs a better compatibility check
> DataRC should already be a subregister class of SuperRC
I assumed so, but that might not be the case if the operand register has a subreg index.
> try constrainRegClass before falling back to copy
From trying it out, it seems to always keep the aligned variant (i.e., `constrainRegClass(%1:vreg_64_align2, VReg_64)` will continue using `vreg_64_align2` instead of forcing `VReg_64`), so a COPY may still be required. I might just construct the REG_SEQUENCE from vgpr_32 parts only, which is probably also easier for further optimizations afterwards. (Do let me know if I'm missing low-hanging fruit here, though.)
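
To illustrate why constraining cannot widen a register class, here is a minimal toy model, not LLVM code. It assumes that constraining narrows a virtual register to the registers allowed by both its current class and the requested one. `ToyRegClass`, `makeToyClass`, `toyConstrain`, and `needsCopyToReach` are hypothetical names invented for this sketch, and the register numbering is made up.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-in for a register class: a name plus the set of register
// indices a 64-bit tuple may start at. This is NOT TargetRegisterClass.
struct ToyRegClass {
  std::string Name;
  std::set<unsigned> StartRegs;
};

// Builds a class whose tuples may start at 0, Step, 2*Step, ... below NumRegs.
ToyRegClass makeToyClass(const std::string &Name, unsigned NumRegs,
                         unsigned Step) {
  assert(Step > 0 && "Step must be nonzero");
  ToyRegClass RC{Name, {}};
  for (unsigned R = 0; R < NumRegs; R += Step)
    RC.StartRegs.insert(R);
  return RC;
}

// Assumed model of constraining: keep only registers allowed by both classes.
// Returns a class with an empty register set if the two are incompatible.
ToyRegClass toyConstrain(const ToyRegClass &Current,
                         const ToyRegClass &Wanted) {
  std::set<unsigned> Common;
  for (unsigned R : Current.StartRegs)
    if (Wanted.StartRegs.count(R))
      Common.insert(R);
  if (Common.empty())
    return ToyRegClass{"<none>", {}};
  if (Common == Current.StartRegs)
    return Current; // Current is already the narrower class; nothing changes.
  if (Common == Wanted.StartRegs)
    return Wanted;
  return ToyRegClass{"common", Common};
}

// True when constraining alone does not yield exactly the wanted class,
// i.e. a fresh register (plus a COPY) would be needed to get it.
bool needsCopyToReach(const ToyRegClass &Current, const ToyRegClass &Wanted) {
  return toyConstrain(Current, Wanted).StartRegs != Wanted.StartRegs;
}
```

In this model, constraining an aligned class (every second start register) against the unaligned class returns the aligned class unchanged, because the intersection is the aligned set itself. That matches the behaviour described above for `vreg_64_align2` and `VReg_64`.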
https://github.com/llvm/llvm-project/pull/162088