[PATCH] D158084: [AArch64] Stack probing for function prologues
Momchil Velikov via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Sep 13 07:32:27 PDT 2023
chill added inline comments.
================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1855
+ bool NeedsStackProbe = TLI.hasInlineStackProbe(MF) &&
+ (NumBytes > TLI.getStackProbeMaxUnprobedStack(MF) ||
+ MFI.hasVarSizedObjects());
----------------
zatrazz wrote:
> I am not sure if using getStackProbeMaxUnprobedStack as the threshold for emiting static probes is the more optimized value. GCC uses getStackProbeSize - 1024, since with defined invariants (the stack has been probed at [sp, #N] with 0 <= N <= 1024) rest of the guard page can be used without further probing. Afaiu the 1024 was chosen as value to minimize as possible the required probes based on common function stack requirements.
tl;dr We are concerned here with the unprobed stack we leave for the callees.
GCC uses a different frame layout where the unprobed stack space at the moment of the call is adjacent to the space for local variables and the frame record is on top of the stack.
In LLVM we can have:
* leaf function **and** local frame size <= 240 bytes. We don't issue probes in this case (`NeedsStackProbe` above is set to `false`) and our `SP` cannot skip over the guard page (1024 unprobed + 240 locals < 4k).
* non-leaf function (including one calling `noreturn` functions) **or** local frame size >= 256 bytes. In this case LLVM saves at least `x29`, which serves as an implicit probe at `[sp]`. Allocation of space for SVE callee-saved registers and local SVE objects similarly end with probes at `[sp]`. Then we come here to allocate space for fixed-size locals. We cannot allocate more than 1024 bytes without a probe, because that area is adjacent to callee stack frames and we may perform a call without any access to the area, thus violating the invariant.
================
Comment at: llvm/lib/Target/AArch64/AArch64InstrInfo.cpp:8532
+ .addReg(ScratchReg)
+ .addImm(0)
+ .setMIFlags(MachineInstr::FrameSetup);
----------------
zatrazz wrote:
> I don't think this does what the comments above specifies, since on static_4096 testcase the produced code does 'str xzr, [sp]' where it should be 'str xzr, [sp, #1024]'. I think ou should 'TLI->getStackProbeMaxUnprobedStack(MF) /8'.
>
>
> I don't think this does what the comments above specifies, since on static_4096 testcase the produced code does 'str xzr, [sp]'
The comment is incorrect here, I'll fix it.
> where it should be 'str xzr, [sp, #1024]'
It *could* be, in the particular case of `static_4096`, but it's **not** necessary.
Issuing probes at `[sp, #1024]` is problematic, because we don't always know how many bytes we've just allocated and we may be writing to some other area. Also our own internal invariant is that while constructing the frame the stack is probed at `[sp]`, because we start with that invariant anyway (by saving `x29`).
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D158084/new/
https://reviews.llvm.org/D158084
More information about the llvm-commits
mailing list