[PATCH] D156358: [AArch64] Do not unnecessarily spill LR because of @llvm.returnaddress
Anatoly Trosinenko via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri Jul 28 12:43:16 PDT 2023
atrosinenko added a comment.
After debugging a bit more, I suspect that the actual reason for emitting `XPACI LR` is not the instruction selection phase but register allocation.
Here is a simplified `returnaddress.ll` test case:
define ptr @ra0() nounwind readnone {
entry:
%0 = tail call ptr @llvm.returnaddress(i32 0)
ret ptr %0
}
declare ptr @llvm.returnaddress(i32) nounwind readnone
On a rather recent mainline LLVM commit, I run the following:
./bin/llc < returnaddress.ll -mtriple=arm64-eabi -mattr=v8.3a -print-after-all -debug-only=regalloc
Transition from LLVM IR to MIR looks good:
*** IR Dump After Module Verifier (verify) ***
; Function Attrs: nounwind memory(none)
define ptr @ra0() #0 {
entry:
%0 = tail call ptr @llvm.returnaddress(i32 0)
ret ptr %0
}
# *** IR Dump After AArch64 Instruction Selection (aarch64-isel) ***:
# Machine code for function ra0: IsSSA, TracksLiveness
Function Live Ins: $lr in %0
bb.0.entry:
liveins: $lr
%0:gpr64 = COPY $lr
%1:gpr64 = XPACI %0:gpr64(tied-def 0)
$x0 = COPY %1:gpr64
RET_ReallyLR implicit $x0
# End machine code for function ra0.
The operand of XPACI is a virtual register. Then, at register allocation time, the following is printed:
# *** IR Dump After Live Register Matrix (liveregmatrix) ***:
# Machine code for function ra0: NoPHIs, TracksLiveness, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $lr in %0
0B bb.0.entry:
liveins: $lr
16B %1:gpr64 = COPY $lr
48B %1:gpr64 = XPACI %1:gpr64(tied-def 0)
64B $x0 = COPY %1:gpr64
80B RET_ReallyLR implicit killed $x0
# End machine code for function ra0.
********** GREEDY REGISTER ALLOCATION **********
********** Function: ra0
********** INTERVALS **********
W30 [0B,16r:0) 0 at 0B-phi
%1 [16r,48r:0)[48r,64r:1) 0 at 16r 1 at 48r weight:INF
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function ra0: NoPHIs, TracksLiveness, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $lr in %0
0B bb.0.entry:
liveins: $lr
16B %1:gpr64 = COPY $lr
48B %1:gpr64 = XPACI %1:gpr64(tied-def 0)
64B $x0 = COPY %1:gpr64
80B RET_ReallyLR implicit killed $x0
# End machine code for function ra0.
Enqueuing %1
AllocationOrder(GPR64) = [ $x8 $x9 $x10 $x11 $x12 $x13 $x14 $x15 $x16 $x17 $x18 $x0 $x1 $x2 $x3 $x4 $x5 $x6 $x7 $x19 $x20 $x21 $x22 $x23 $x24 $x25 $x26 $x27 $x28 $fp $lr ]
selectOrSplit GPR64:%1 [16r,48r:0)[48r,64r:1) 0 at 16r 1 at 48r weight:INF w=INF
hints: $lr $x0
assigning %1 to $lr: W30 [16r,48r:0)[48r,64r:1) 0 at 16r 1 at 48r
# *** IR Dump After Greedy Register Allocator (greedy) ***:
# Machine code for function ra0: NoPHIs, TracksLiveness, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $lr in %0
0B bb.0.entry:
liveins: $lr
16B %1:gpr64 = COPY $lr
48B %1:gpr64 = XPACI %1:gpr64(tied-def 0)
64B $x0 = COPY %1:gpr64
80B RET_ReallyLR implicit killed $x0
# End machine code for function ra0.
********** REWRITE VIRTUAL REGISTERS **********
********** Function: ra0
********** REGISTER MAP **********
[%1 -> $lr] GPR64
0B bb.0.entry:
liveins: $lr
16B %1:gpr64 = COPY $lr
48B %1:gpr64 = XPACI killed %1:gpr64(tied-def 0)
64B $x0 = COPY killed %1:gpr64
80B RET_ReallyLR implicit killed $x0
> renamable $lr = COPY $lr
Identity copy: renamable $lr = COPY $lr
deleted.
> renamable $lr = XPACI killed renamable $lr(tied-def 0)
> $x0 = COPY killed renamable $lr
> RET_ReallyLR implicit killed $x0
# *** IR Dump After Virtual Register Rewriter (virtregrewriter) ***:
# Machine code for function ra0: NoPHIs, TracksLiveness, NoVRegs, TiedOpsRewritten, TracksDebugUserValues
Function Live Ins: $lr
0B bb.0.entry:
liveins: $lr
48B renamable $lr = XPACI killed renamable $lr(tied-def 0)
64B $x0 = COPY killed renamable $lr
80B RET_ReallyLR implicit killed $x0
# End machine code for function ra0.
Note the `hints: $lr $x0` line. I believe that allocating `$lr` as `%1` is not expected because it is clobbered by `XPACI` but then needed to return from the function. As far as I understand, the reason is that `RET_ReallyLR` instruction hides its usage of `$lr`, so it is considered `killed` in the above line.
I tried overriding `AArch64RegisterInfo::getRegAllocationHints` method like this
bool AArch64RegisterInfo::getRegAllocationHints(
Register VirtReg, ArrayRef<MCPhysReg> Order,
SmallVectorImpl<MCPhysReg> &Hints, const MachineFunction &MF,
const VirtRegMap *VRM, const LiveRegMatrix *Matrix) const {
bool Result = TargetRegisterInfo::getRegAllocationHints(VirtReg, Order, Hints,
MF, VRM, Matrix);
erase_if(Hints, [](MCPhysReg Reg) { return Reg == AArch64::LR; });
return Result && !Hints.empty();
}
And `check-llvm` passes (with the updated test from this patch) - that is, both with global-isel disabled and enabled. On the other hand, the `RET_ReallyLR` instruction was introduced on purpose and I am not yet sure if such fix is appropriate.
================
Comment at: llvm/lib/Target/AArch64/AArch64ISelLowering.cpp:9729
+ SDValue ReturnAddressCopy = DAG.getCopyFromReg(Chain, DL, TmpReg, VT);
+ St = DAG.getMachineNode(AArch64::XPACI, DL, VT, ReturnAddressCopy);
} else {
----------------
efriedma wrote:
> We normally shouldn't be creating virtual registers here; isel lowering will create copies when necessary. Maybe this is getting weird because of the use of getMachineNode()? Normally, we shouldn't be using getMachineNode() during ISelLowering; we should use something like `DAG.getNode(AArch64ISD::XPACI, [...]`, and lower that to the actual instruction during ISelDAGToDAG.
Thank you for the suggestion, this reload to another register class looked awkward to me, too.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D156358/new/
https://reviews.llvm.org/D156358
More information about the llvm-commits
mailing list