[PATCH] D96004: [AArch64] Stack probing for function prologues

Wed Aug 16 07:39:53 PDT 2023

chill added a comment.

I've started a new review of this patch here: https://reviews.llvm.org/D158084
I believe I've addressed most of the comments in this review (except one?).

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1351

-  StackOffset AllocateBefore = SVEStackSize, AllocateAfter = {};
+  StackOffset AllocateBefore = {}, AllocateAfter = SVEStackSize;
   MachineBasicBlock::iterator CalleeSavesBegin = MBBI, CalleeSavesEnd = MBBI;
----------------
kristof.beyls wrote:
> kristof.beyls wrote:
> > I know this is not your original code, but I'd be tempted to go for more descriptive names than AllocateBefore and AllocateAfter. It's been a little while since I looked at the FrameLowering code, and from the names alone, I could not guess what these stack offsets were meant to represent.
> > Maybe I'll have a suggestion for a better name later on after I've read more of the patch.
> > 
> > I think it also makes it easier to read this code if there was a comment above this that describes that this part of the code handles SVE CSR and SVE locals (i.e. the SVE area of the stack frame).
> I'm a bit confused by this code change (probably partly due to the not really understanding what these variables are meant to represent).
> Does the change on this line indicate that there was a bug here before? If so, would it be better to commit that separately?
> If not, does the code change result in the meaning of AllocateBefore and AllocateAfter to change?
I've added more descriptive names in the new patch.

As far as I can tell, there's no bug, just the code is a little  convoluted. I've rearranged it is the new patch (could just as well be a separate NFC/refactoring patch).

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1382
+  if (TLI.hasInlineStackProbe(MF) && AllocateAfter) {
+    Register ScratchReg = findScratchNonCalleeSaveRegister(&MBB);
+    assert(ScratchReg != AArch64::NoRegister);
----------------
cuviper wrote:
> nagisa wrote:
> > kristof.beyls wrote:
> > > This line of code reminds me that frame setup can also happen in non-entry basic blocks, when shrink wrapping has happened.
> > > Assuming I checked correctly, there are no test cases in the tests in this patch verifying correct behavior in case shrink wrapping happens. I'm not sure if it's worthwhile to have such tests, but I thought I'd check if you thought about this.
> > I believe this would fail or produce incorrect results for functions that both have the `no_caller_saved_registers` attribute and request the inline stack probes, right?
> > This line of code reminds me that frame setup can also happen in non-entry basic blocks, when shrink wrapping has happened.
> 
> FWIW, `TargetFrameLowering::canUseAsPrologue` can be used to block that if needed.
> (I just used that to fix an X86 bug where stack probes clobbered EFLAGS, D134494.)
> 
> 
In the new patch I've added a check for an available scratch register in `canUseAsPrologue`.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.cpp:1398-1400
+    bool NeedsStackProbe = TLI.hasInlineStackProbe(MF) &&
+                           (NumBytes >= TLI.getStackProbeMaxUnprobedStack(MF) ||
+                            MFI.hasVarSizedObjects());
----------------
kristof.beyls wrote:
> This seems to be a key heuristic/logic on when to generate stack probes and when not.
> I think it would be useful to have some light design documentation/rationale on why this is the right heuristic.
> Depending on the length of that documentation, it could go either here, in the large comment at the top of this file or somewhere else (potentially as a target-independent design doc in the docs directory)? I don't think the design doc has to be particularly long, but a small doc probably will bring a lot of value for future developers needing to touch this code.
TBH, the main reason for this logic is compatibility with GCC (let's call it an informal ABI).
 GCC *expects* at most 1024 unprobed bytes above the stack, hence if we allocate more than that we need to issue probe.

The choice of the number 1024 comes from an analysis of SPEC frame size distributions and was chosen such that a great number of stack frames (of size < 3k or so, for 4k pages)  won't need to perform a probe at all (for GCC).

As far as I can understand, these considerations are not really relevant for LLVM, since it stores `x29` for frames, greater than 240 bytes anyway.

================
Comment at: llvm/lib/Target/AArch64/AArch64FrameLowering.h:153
+  /// Replace a StackProbe stub (if any) with the actual probe code inline
+  virtual void inlineStackProbe(MachineFunction &MF,
+                                MachineBasicBlock &PrologueMBB) const override;
----------------
kristof.beyls wrote:
> ostannard wrote:
> > serge-sans-paille wrote:
> > > Why did you set that one virtual?
> > It's an override of a virtual function called by the target-independent part of the prologue/epilogue pass.
> I think the keyword virtual is not needed when override is specified.
> There are other examples of override methods in this class; they do not explicitly write virtual. So I guess it's more consistent to not use virtual here?
Keyword `virtual` removed.

================
Comment at: llvm/test/CodeGen/AArch64/stack-probing-64k.ll:31
+; CHECK-NEXT:    str x29, [sp, #256] // 8-byte Folded Spill
+; CHECK-NEXT:    .cfi_def_cfa_offset 272
+; CHECK-NEXT:    .cfi_offset w29, -16
----------------
nagisa wrote:
> This //should// come directly after the `sub` I believe. Otherwise the stack offsets would be incorrect while the `str x29, [sp, #256]` is being executed.
> 
> (Not sure if preexisting, though)
This patch predates the support for asynchronous unwinding. In the new patch it's handled correctly, I believe.

================
Comment at: llvm/test/CodeGen/AArch64/stack-probing-64k.ll:32
+; CHECK-NEXT:    .cfi_def_cfa_offset 272
+; CHECK-NEXT:    .cfi_offset w29, -16
+; CHECK-NEXT:    mov x8, sp
----------------
nagisa wrote:
> Is this right? We spilled 8 bytes, but specify the offset for the 32-bit view of the register.
Yes, it's correct, the DWARF encoding of register names does not encode separately `xN` and `wN`.

================
Comment at: llvm/test/CodeGen/AArch64/stack-probing-64k.ll:71
+; CHECK-NEXT:    str xzr, [sp]
+; CHECK-NEXT:    .cfi_def_cfa_offset 1040
+; CHECK-NEXT:    .cfi_offset w29, -16
----------------
nagisa wrote:
> Similarly, this should come immediately after the `sub` instruction, otherwise the CFI won't describe the stack accurately during the `str` above.
Also fixed.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96004/new/

https://reviews.llvm.org/D96004