[llvm] [AArch64][SVE] Avoid movprfx by reusing register for _UNDEF pseudos. (PR #166926)
Ricardo Jesus via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 10 10:38:27 PST 2025
================
@@ -1123,24 +1123,83 @@ unsigned AArch64RegisterInfo::getRegPressureLimit(const TargetRegisterClass *RC,
}
}
-// FORM_TRANSPOSED_REG_TUPLE nodes are created to improve register allocation
-// where a consecutive multi-vector tuple is constructed from the same indices
-// of multiple strided loads. This may still result in unnecessary copies
-// between the loads and the tuple. Here we try to return a hint to assign the
-// contiguous ZPRMulReg starting at the same register as the first operand of
-// the pseudo, which should be a subregister of the first strided load.
+// We add regalloc hints for different cases:
+// * Choosing a better destination register for predicated SVE instructions
+// where the inactive lanes are undef, preferring a register that is already
+// used by one of the other operands of the instruction.
//
-// For example, if the first strided load has been assigned $z16_z20_z24_z28
-// and the operands of the pseudo are each accessing subregister zsub2, we
-// should look through through Order to find a contiguous register which
-// begins with $z24 (i.e. $z24_z25_z26_z27).
+// * Improving register allocation for SME multi-vector instructions where we
+// can benefit from strided and contiguous multi-vector register tuples.
//
+// Here FORM_TRANSPOSED_REG_TUPLE nodes are created to improve register
+// allocation where a consecutive multi-vector tuple is constructed from the
+// same indices of multiple strided loads. This may still result in
+// unnecessary copies between the loads and the tuple. Here we try to return a
+// hint to assign the contiguous ZPRMulReg starting at the same register as
+// the first operand of the pseudo, which should be a subregister of the first
+// strided load.
+//
+// For example, if the first strided load has been assigned $z16_z20_z24_z28
+// and the operands of the pseudo are each accessing subregister zsub2, we
+// should look through Order to find a contiguous register which
+// begins with $z24 (i.e. $z24_z25_z26_z27).
bool AArch64RegisterInfo::getRegAllocationHints(
Register VirtReg, ArrayRef<MCPhysReg> Order,
SmallVectorImpl<MCPhysReg> &Hints, const MachineFunction &MF,
const VirtRegMap *VRM, const LiveRegMatrix *Matrix) const {
-
auto &ST = MF.getSubtarget<AArch64Subtarget>();
+ const AArch64InstrInfo *TII =
+ MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
+ const MachineRegisterInfo &MRI = MF.getRegInfo();
+
+ // For predicated SVE instructions where the inactive lanes are undef,
+ // pick a destination register that is not unique to avoid introducing
+ // a movprfx.
+ const TargetRegisterClass *RegRC = MRI.getRegClass(VirtReg);
+ if (AArch64::ZPRRegClass.hasSubClassEq(RegRC)) {
+ for (const MachineOperand &DefOp : MRI.def_operands(VirtReg)) {
+ const MachineInstr &Def = *DefOp.getParent();
+ if (DefOp.isImplicit() ||
+ (TII->get(Def.getOpcode()).TSFlags & AArch64::FalseLanesMask) !=
+ AArch64::FalseLanesUndef)
+ continue;
+
+ for (MCPhysReg R : Order) {
+ auto AddHintIfSuitable = [&](MCPhysReg R, const MachineOperand &MO) {
+ // R is a suitable hint for this operand if the operand has not yet
+ // been assigned a physical register, or if its assigned register
+ // is R itself.
+ if (!VRM->hasPhys(MO.getReg()) || VRM->getPhys(MO.getReg()) == R)
+ Hints.push_back(R);
+ };
+
+ unsigned Opcode = AArch64::getSVEPseudoMap(Def.getOpcode());
+ switch (TII->get(Opcode).TSFlags & AArch64::DestructiveInstTypeMask) {
+ default:
+ break;
+ case AArch64::DestructiveTernaryCommWithRev:
+ AddHintIfSuitable(R, Def.getOperand(2));
+ AddHintIfSuitable(R, Def.getOperand(3));
+ AddHintIfSuitable(R, Def.getOperand(4));
+ break;
+ case AArch64::DestructiveBinaryComm:
+ case AArch64::DestructiveBinaryCommWithRev:
+ AddHintIfSuitable(R, Def.getOperand(2));
+ AddHintIfSuitable(R, Def.getOperand(3));
+ break;
+ case AArch64::DestructiveBinary:
+ case AArch64::DestructiveBinaryImm:
+ AddHintIfSuitable(R, Def.getOperand(2));
+ break;
+ }
+ }
+ }
+
+ if (Hints.size())
+ return TargetRegisterInfo::getRegAllocationHints(VirtReg, Order, Hints,
+ MF, VRM);
----------------
rj-jesus wrote:
I believe you're right, which is why I expected copy hints to come first. A missed copy hint is likely to lead to a MOV down the line, whereas a missed MOVPRFX hint should only lead to the MOVPRFX itself (which should be cheaper). That would happen in the example below if MachineCP weren't able to rewrite `$z0` with `$z4`.
For what it's worth, the patch does seem to increase the list of hints of affected pseudos considerably, including adding repeated ones ([example](https://godbolt.org/z/3vPEPjK6o)):
```
selectOrSplit ZPR:%4 [80r,96r:0) 0 at 80r weight:INF
hints: $z0 $z0 $z0 $z1 $z1 $z1 $z2 $z2 $z2 $z3 $z3 $z3 $z4 $z4 $z4 $z5 $z5 $z5 $z6 $z6 $z6 $z7 $z7 $z7 $z16 $z16 $z16 $z17 $z17 $z17 $z18 $z18 $z18 $z19 $z19 $z19 $z20 $z20 $z20 $z21 $z21 $z21 $z22 $z22 $z22 $z23 $z23 $z23 $z24 $z24 $z24 $z25 $z25 $z25 $z26 $z26 $z26 $z27 $z27 $z27 $z28 $z28 $z28 $z29 $z29 $z29 $z30 $z30 $z30 $z31 $z31 $z31 $z8 $z8 $z8 $z9 $z9 $z9 $z10 $z10 $z10 $z11 $z11 $z11 $z12 $z12 $z12 $z13 $z13 $z13 $z14 $z14 $z14 $z15 $z15 $z15 $z4
assigning %4 to $z0: B0 [80r,96r:0) 0 at 80r B0_HI [80r,96r:0) 0 at 80r H0_HI [80r,96r:0) 0 at 80r S0_HI [80r,96r:0) 0 at 80r D0_HI [80r,96r:0) 0 at 80r Q0_HI [80r,96r:0) 0 at 80r
```
Before the patch:
```
selectOrSplit ZPR:%4 [80r,96r:0) 0 at 80r weight:INF
hints: $z4
assigning %4 to $z4: B4 [80r,96r:0) 0 at 80r B4_HI [80r,96r:0) 0 at 80r H4_HI [80r,96r:0) 0 at 80r S4_HI [80r,96r:0) 0 at 80r D4_HI [80r,96r:0) 0 at 80r Q4_HI [80r,96r:0) 0 at 80r
```
I'm not sure how this affects the register allocator (or compile time), but since it has already been merged, I suppose we can keep an eye out for any issues. :)
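As an aside, the repeated entries in the hint list above could be filtered cheaply before the hints are returned. Below is a minimal standalone sketch of first-seen-order deduplication, using plain `std::vector` and a `uint16_t` alias in place of LLVM's `SmallVectorImpl<MCPhysReg>`; the function name is hypothetical and not part of the patch:

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using MCPhysReg = uint16_t; // stand-in for llvm::MCPhysReg

// Remove duplicate hints while preserving first-seen order, so the
// allocator still tries hints in priority order but only once each.
std::vector<MCPhysReg> dedupeHints(const std::vector<MCPhysReg> &Hints) {
  std::vector<MCPhysReg> Out;
  std::unordered_set<MCPhysReg> Seen;
  for (MCPhysReg R : Hints)
    if (Seen.insert(R).second) // true only on the first insertion of R
      Out.push_back(R);
  return Out;
}
```

Note that plain deduplication keeps only the *first* occurrence of each register, so a copy hint appended after a duplicate movprfx hint (like the trailing `$z4` above) would be dropped rather than promoted, which is exactly the ordering subtlety discussed here.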
https://github.com/llvm/llvm-project/pull/166926