[PATCH] D96004: [AArch64] Stack probing for function prologues

Tue Mar 16 04:29:01 PDT 2021

tnfchris added a comment.

Hi @ostannard,

I don't know enough about LLVM to comment on the actual code, so I will only comment on the output I see generated from the testcases.

From the testcases (like `static_1024`), I can see that you probe when there is more than `1k` of incoming stack arguments.
For GCC this threshold is `guard-size - 1k`. The reasoning is that whenever the outgoing arguments take up more than `1k` we probe them, which maintains the invariant. Since at most that `1k` can be left unprobed, a function has a whole `guard-size - 1k` that it can allocate without probing. These sizes were chosen as they cover about 99% of all programs (for a subset of all :)).

So the idea is to minimize the number of probes required.

As such, this is what GCC generates for these cases:

  int probe (int x)
  {
    char arr[64 * 1028];
    return arr[x];
  }

  probe:
          sub     sp, sp, #65536
          str     xzr, [sp, 1024]
          sub     sp, sp, #256
          ldrb    w0, [sp, w0, sxtw]
          add     sp, sp, 256
          add     sp, sp, 65536
          ret

  int no_probe (int x)
  {
    char arr[1028];
    return arr[x];
  }

  no_probe:
          sub     sp, sp, #1040
          add     x1, sp, 8
          ldrb    w0, [x1, w0, sxtw]
          add     sp, sp, 1040
          ret

This is with a 64k probe (guard) size.

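To make the arithmetic concrete, here is a rough C model of the policy described above. It is my own simplification, not GCC's code. It assumes the caller leaves at most 1k below its last probe unprobed (`CALLER_GUARD`, a made-up name), that whole guard-sized chunks get one probe each, and that a residual smaller than `guard-size - 1k` is left unprobed. `probes_needed` is likewise hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: bytes below the caller's last probe that the
   caller may leave unprobed (the "1k" above). */
#define CALLER_GUARD UINT64_C(1024)

/* Simplified model (not GCC's implementation) of how many probes a
   prologue allocating 'frame' bytes needs with a 'guard'-byte guard. */
static uint64_t probes_needed(uint64_t frame, uint64_t guard) {
    assert(guard > CALLER_GUARD);
    uint64_t budget = guard - CALLER_GUARD; /* usable without probing */

    /* Small frames fit in the unprobed budget: no probes at all. */
    if (frame < budget)
        return 0;

    /* One probe per whole guard-sized chunk; a residual below the
       budget is allocated without a probe. */
    uint64_t chunks = frame / guard;
    uint64_t residual = frame % guard;
    return chunks + (residual >= budget ? 1 : 0);
}
```

With a 64k guard this model gives one probe for `probe` (one 65536-byte chunk plus an unprobed 256-byte residual, 64 * 1028 bytes in total) and none for `no_probe` (1040 bytes), which matches the GCC output above.
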
The other difference is where we probe. You seem to be probing at `SP`, whereas we probe at `SP + 1k`.

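This is because if you were already at the 1k boundary and then allocated one guard size worth of incoming args, probing at `SP` could skip a page: the distance from the previous probe would be a guard size plus 1k. So we probe 1k up from `SP` to ensure the pages are touched as you go.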

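A small C model of that argument, assuming the stack grows down and the caller may leave up to 1k below its last probe untouched. `max_probe_gap` and `CALLER_GUARD` are hypothetical names. The function returns the largest distance between consecutive touched addresses; any distance above the guard size means a guard page could be jumped over.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: bytes below its last probe the caller may leave untouched. */
#define CALLER_GUARD UINT64_C(1024)

/* Model a callee that allocates 'chunks' guard-sized blocks, probing each
   one at sp + 'offset'. Returns the largest distance between consecutive
   touched addresses; anything above 'guard' could jump a guard page. */
static uint64_t max_probe_gap(uint64_t guard, unsigned chunks,
                              uint64_t offset) {
    assert(offset < guard);                  /* probe stays inside the chunk */
    uint64_t last_probe = UINT64_C(1) << 40; /* arbitrary caller probe */
    uint64_t sp = last_probe - CALLER_GUARD; /* worst case: at the 1k boundary */
    uint64_t max_gap = 0;

    for (unsigned i = 0; i < chunks; i++) {
        sp -= guard;                         /* sub sp, sp, #guard */
        uint64_t probe = sp + offset;        /* str xzr, [sp, #offset] */
        uint64_t gap = last_probe - probe;   /* stack grows downwards */
        if (gap > max_gap)
            max_gap = gap;
        last_probe = probe;
    }
    return max_gap;
}
```

With a 64k guard, probing at `SP` (`offset` 0) gives a first gap of 64k + 1k, while probing at `SP + 1k` keeps every gap at exactly 64k.
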
For the alloca cases, I noticed you don't have a testcase for `alloca(n)` where `n` is a variable. Also, how does it handle `alloca(0)`?

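Something along these lines could serve as extra test input, e.g. compiled for AArch64 with `-fstack-clash-protection`. The function names are hypothetical. `alloca` is not ISO C; glibc, for example, declares it in `<alloca.h>`.

```c
#include <alloca.h>  /* not ISO C; glibc declares alloca() here */
#include <string.h>

/* Size only known at run time: exercises the dynamic probing path.
   Calling it with n == 0 covers a run-time zero-sized alloca. */
int alloca_var(unsigned n, unsigned x) {
    char *p = alloca(n);
    memset(p, 7, n);               /* touch the whole block */
    return x < n ? p[x] : -1;
}

/* Constant zero-sized allocation: worth checking what, if anything,
   gets emitted for it. */
int alloca_zero(void) {
    char *volatile keep = alloca(0); /* volatile so the call is not dropped */
    (void)keep;
    return 0;
}
```
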
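I also noticed that you only set the CFA after the loop has finished, so while SP is moving inside the loop the CFA rule does not describe the frame correctly.

In GCC we temporarily switch the CFA to a different register (`x12` below), which already holds the value SP will have once the loop has finished, with the offset adjusted so that the CFA itself is unchanged. After the loop we switch it back to SP, so we say which value it's going to be beforehand.

e.g.: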

  .LFB0:
          .cfi_startproc
          sub     x12, sp, #1310720
          .cfi_def_cfa 12, 1310720
  .LPSRL0:
          sub     sp, sp, 65536
          str     xzr, [sp, 1024]
          cmp     sp, x12
          b.ne    .LPSRL0
          .cfi_def_cfa_register 31
          sub     sp, sp, #512
          .cfi_def_cfa_offset 1311232
          ldrb    w0, [sp, w0, sxtw]
          add     sp, sp, 512
          .cfi_def_cfa_offset 1310720
          add     sp, sp, 1310720
          .cfi_def_cfa_offset 0
          ret
          .cfi_endproc
  .LFE0:

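A toy C model of why the temporary register helps. It assumes each CFA rule is just "register + offset" (as with `.cfi_def_cfa`) and that, without the switch, the rule in force during the loop is still SP-relative with its pre-loop offset (0 here). `struct cfa_rule`, `eval_cfa` and `cfa_stable_in_loop` are made-up names. The check is whether an unwinder stopping inside the loop recovers the SP value from function entry.

```c
#include <assert.h>
#include <stdint.h>

enum cfa_reg { CFA_SP, CFA_X12 };

/* A CFA rule in the style of .cfi_def_cfa: CFA = reg + offset. */
struct cfa_rule {
    enum cfa_reg reg;
    uint64_t offset;
};

static uint64_t eval_cfa(struct cfa_rule r, uint64_t sp, uint64_t x12) {
    return (r.reg == CFA_SP ? sp : x12) + r.offset;
}

/* Simulate the probe loop (sub x12, sp, #total; then sub sp, sp, #step
   until sp == x12) and check that 'rule' yields the entry SP after every
   iteration. Returns 1 if it does, 0 otherwise. */
static int cfa_stable_in_loop(struct cfa_rule rule, uint64_t entry_sp,
                              uint64_t total, uint64_t step) {
    assert(step != 0 && total % step == 0);
    uint64_t x12 = entry_sp - total;
    uint64_t sp = entry_sp;
    while (sp != x12) {
        sp -= step;
        if (eval_cfa(rule, sp, x12) != entry_sp)
            return 0;
    }
    return 1;
}
```

With the values from the listing above (a 1310720-byte loop in 65536-byte steps), the `x12`-based rule holds throughout, while an SP-relative rule left unchanged until after the loop is wrong from the first iteration.
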
Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D96004