[llvm] [AArch64][SME] Make getRegAllocationHints stricter for multi-vector loads (PR #123081)

Sander de Smalen via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 29 06:46:28 PST 2025


================
@@ -1108,25 +1114,82 @@ bool AArch64RegisterInfo::getRegAllocationHints(
   // instructions over reducing the number of clobbered callee-save registers,
   // so we add the strided registers as a hint.
   unsigned RegID = MRI.getRegClass(VirtReg)->getID();
-  // Look through uses of the register for FORM_TRANSPOSED_REG_TUPLE.
-  if ((RegID == AArch64::ZPR2StridedOrContiguousRegClassID ||
-       RegID == AArch64::ZPR4StridedOrContiguousRegClassID) &&
-      any_of(MRI.use_nodbg_instructions(VirtReg), [](const MachineInstr &Use) {
-        return Use.getOpcode() ==
-                   AArch64::FORM_TRANSPOSED_REG_TUPLE_X2_PSEUDO ||
-               Use.getOpcode() == AArch64::FORM_TRANSPOSED_REG_TUPLE_X4_PSEUDO;
-      })) {
-    const TargetRegisterClass *StridedRC =
-        RegID == AArch64::ZPR2StridedOrContiguousRegClassID
-            ? &AArch64::ZPR2StridedRegClass
-            : &AArch64::ZPR4StridedRegClass;
-
-    for (MCPhysReg Reg : Order)
-      if (StridedRC->contains(Reg))
-        Hints.push_back(Reg);
+  if (RegID == AArch64::ZPR2StridedOrContiguousRegClassID ||
+      RegID == AArch64::ZPR4StridedOrContiguousRegClassID) {
+
+    // Look through uses of the register for FORM_TRANSPOSED_REG_TUPLE.
+    for (const MachineInstr &Use : MRI.use_nodbg_instructions(VirtReg)) {
+      if (Use.getOpcode() != AArch64::FORM_TRANSPOSED_REG_TUPLE_X2_PSEUDO &&
+          Use.getOpcode() != AArch64::FORM_TRANSPOSED_REG_TUPLE_X4_PSEUDO)
+        continue;
+
+      unsigned LdOps = Use.getNumOperands() - 1;
----------------
sdesmalen-arm wrote:

I just realised that `LdOps` is a misnomer: it is assigned the number of operands of the use (form_reg_tuple), not of the load. If the use here has 4 operands, the loads feeding it may still each define only 2 registers, e.g.
```
ld1 { z0, z8 }, p0/z, [...]
ld1 { z1, z9 }, p0/z, [...]
ld1 { z2, z10 }, p0/z, [...]
ld1 { z3, z11 }, p0/z, [...]
{z0, z1, z2, z3} = form_reg_tuple {z0, z8}:0, {z1, z9}:0, {z2, z10}:0, {z3, z11}:0
```

The uses below assume this is about the number of operands of the Use, so it seems like it's just the name that's wrong.

The same is not true for `StridedRC`, which uses the wrong register class (it should decide the strided RC based on `RegID` instead).
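
To illustrate the distinction, here is a minimal standalone sketch. It uses stand-in enums rather than the real LLVM types, and `pickStridedClass`, `numUseOperands`, `RegClassID` and `StridedClass` are hypothetical names, not anything in the patch. It also assumes the one operand subtracted from the use is its def.

```cpp
// Stand-ins for the AArch64 register class IDs; these names are hypothetical.
enum class RegClassID { ZPR2StridedOrContiguous, ZPR4StridedOrContiguous, Other };
enum class StridedClass { ZPR2Strided, ZPR4Strided, None };

// Choose the strided class from the virtual register's own class (RegID),
// not from the operand count of the FORM_TRANSPOSED_REG_TUPLE use.
constexpr StridedClass pickStridedClass(RegClassID RegID) {
  return RegID == RegClassID::ZPR2StridedOrContiguous
             ? StridedClass::ZPR2Strided
         : RegID == RegClassID::ZPR4StridedOrContiguous
             ? StridedClass::ZPR4Strided
             : StridedClass::None;
}

// Number of tuple inputs of the use: total operands minus one
// (assumed here to be the single def of the pseudo).
constexpr unsigned numUseOperands(unsigned NumOperands) {
  return NumOperands - 1;
}

// The case from the example: an x4 tuple whose inputs come from 2-vector loads.
static_assert(numUseOperands(5) == 4, "x4 use has four inputs");
static_assert(pickStridedClass(RegClassID::ZPR2StridedOrContiguous) ==
                  StridedClass::ZPR2Strided,
              "a 2-vector load still wants the ZPR2 strided class");
```

The two quantities are independent: the operand count describes the use, while the strided class follows from the register class of the load result.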

https://github.com/llvm/llvm-project/pull/123081

