[PATCH] D61437: [AArch64] Static (de)allocation of SVE stack objects.

Fri May 3 08:52:19 PDT 2019

sdesmalen added a comment.

We've actually experimented with various layouts and eventually chose this layout for our HPC compiler.

Let me give some more clarification on the spill/fill addressing modes as background for this choice.

When loading regular (non-scalable) data from the stack in the presence of SVE stack objects, the base address can be materialized using ADDVL, which adds a multiple of the runtime vector length (VL) to a register. For example, a GPR spilled at offset `SP + 16 bytes + 2 * sizeof(SVE vector)` can be reloaded using the sequence:

  addvl x8, sp, #2
  ldr x0, [x8, #16]

Conversely, the SVE spill/fill addressing modes expect a (runtime) VL scaled offset. For example:

  ldr z0, [sp, #2, mul vl]  // loads z0 from SP + 2 * sizeof(SVE vector)

If we want to load SVE vector `z0` from an offset `SP + 16 bytes + 2 * sizeof(SVE vector)`, this requires first materializing the base offset by adding 16 bytes, and then using the scaled addressing mode to load z0:

  add x8, sp, #16
  ldr z0, [x8, #2, mul vl]
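
To make the arithmetic behind the two sequences concrete, here is a small Python model of the address calculations. It is only a sketch: the function names are made up for illustration, VL is passed in bytes, and the immediate ranges are the ones I understand the A64 encodings to have (signed 6-bit multiplier for ADDVL, signed 9-bit multiplier for the SVE LDR, unsigned 8-byte-scaled offset for the GPR LDR).

```python
def addvl(base, imm, vl_bytes):
    """Model `addvl xd, xn, #imm`: xn + imm * VL (VL in bytes)."""
    if not -32 <= imm <= 31:
        raise ValueError("ADDVL takes a signed 6-bit multiplier")
    return base + imm * vl_bytes


def ldr_x_address(base, byte_offset):
    """Address read by `ldr xt, [xn, #byte_offset]` (unsigned, scaled by 8)."""
    if byte_offset % 8 or not 0 <= byte_offset <= 32760:
        raise ValueError("offset must be a multiple of 8 in [0, 32760]")
    return base + byte_offset


def ldr_z_address(base, imm, vl_bytes):
    """Address read by `ldr zt, [xn, #imm, mul vl]`."""
    if not -256 <= imm <= 255:
        raise ValueError("SVE LDR takes a signed 9-bit VL multiplier")
    return base + imm * vl_bytes


def gpr_fill_address(sp, vl_bytes):
    x8 = addvl(sp, 2, vl_bytes)            # addvl x8, sp, #2
    return ldr_x_address(x8, 16)           # ldr   x0, [x8, #16]


def sve_fill_address(sp, vl_bytes):
    x8 = sp + 16                           # add x8, sp, #16
    return ldr_z_address(x8, 2, vl_bytes)  # ldr z0, [x8, #2, mul vl]
```

Both helpers end up at `SP + 16 + 2 * VL` for any vector length. In each case one extra instruction is needed because the offset mixes a fixed-size and a scalable component.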

Because the additional `add <fixed-size offset>` (or alternatively `addvl <scalable offset>`) is expensive, we place fixed-size objects and scalable (SVE) objects in separate regions. Allocating the SVE region before all other stack objects (CSRs, spills, locals) has the benefit that the existing frame layout doesn't need to change. More importantly, it means that accesses to almost all fixed-size stack objects (with the exception of stack arguments) are as efficient as they would be without SVE stack objects and don't require an extra frame register. In the presence of a frame pointer, we can also access our SVE objects directly from the FP.
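
As a rough illustration of why the fixed-size accesses are unaffected, the following Python sketch computes where each region starts relative to the final SP. It assumes a downward-growing stack and a simplified frame that contains only the SVE region (allocated first) and one fixed-size region below it; `region_offsets_from_sp` and its parameters are hypothetical names, not something from the patch.

```python
def region_offsets_from_sp(fixed_bytes, num_sve_vectors, vl_bytes):
    """Byte offsets from the final SP to the start of each region.

    Simplified model: the SVE region is allocated first (highest
    addresses), the fixed-size region (CSRs, spills, locals) below it.
    """
    sve_bytes = num_sve_vectors * vl_bytes
    return {
        "fixed": 0,                             # fixed-size objects sit right at SP
        "sve": fixed_bytes,                     # SVE region starts above them
        "stack_args": fixed_bytes + sve_bytes,  # SP value on function entry
    }


small_vl = region_offsets_from_sp(fixed_bytes=16, num_sve_vectors=2, vl_bytes=16)
large_vl = region_offsets_from_sp(fixed_bytes=16, num_sve_vectors=2, vl_bytes=256)
```

In this model only the `stack_args` entry changes with the vector length. Everything in the fixed-size region stays at a compile-time constant offset from SP.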

> 1. How do you compute the address of a stack argument?

We can compute the address of a stack argument using 'addvl' and regular 'add/sub' instructions.
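
A minimal Python sketch of that computation, assuming a frame with only an SVE region and a fixed-size region below it, and assuming stack arguments start at the SP value on entry (as in the AAPCS64). The helper name and parameters are made up for illustration.

```python
def stack_argument_address(sp, arg_offset, fixed_bytes, num_sve_vectors, vl_bytes):
    """Address of the stack argument at `arg_offset` bytes above the entry SP."""
    x8 = sp + num_sve_vectors * vl_bytes  # addvl x8, sp, #num_sve_vectors
    return x8 + fixed_bytes + arg_offset  # add   x8, x8, #(fixed_bytes + arg_offset)


# Frame with 2 SVE vectors and 16 bytes of fixed-size stack, VL = 32 bytes.
entry_sp = 0x8000
vl = 32
sp = entry_sp - 2 * vl - 16
assert stack_argument_address(sp, 8, 16, 2, vl) == entry_sp + 8
```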

> 2. Under the iOS and Windows calling conventions, vararg functions must allocate some fixed slots directly after the stack arguments.

I don't think this decision prevents creating fixed slots directly after the stack arguments (with some work to support it for these calling conventions, of course), although I admit we have not had to concern ourselves with this case for our HPC compiler, which implements the AAPCS (with SVE extensions). I don't know enough about the iOS and Windows calling conventions: do they make any explicit assumptions about the frame layout other than this?

> 3. How do you restore SP in the epilogue?

We restore the SP in the epilogue by adding the scalable stack size to the SP as the last step. For example (from test/CodeGen/AArch64/framelayout-sve.mir):

  # CHECK-NEXT: $sp = frame-setup ADDVL_XXI $sp, -2    // allocate scalable-sized stack
  # CHECK-NEXT: $sp = frame-setup SUBXri $sp, 16, 0    // allocate fixed-size stack

  # CHECK:      $sp = frame-destroy ADDXri $sp, 16, 0  // deallocate fixed-size stack
  # CHECK-NEXT: $sp = frame-destroy ADDVL_XXI $sp, 2   // deallocate scalable-sized stack
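
The same sequence as a small Python model, to show that the epilogue undoes the prologue for every vector length. This is a sketch under the assumption that the frame has only these two allocations; `prologue` and `epilogue` are illustrative names and VL is given in bytes.

```python
def prologue(sp, num_sve_vectors, fixed_bytes, vl_bytes):
    sp += -num_sve_vectors * vl_bytes  # ADDVL_XXI $sp, -num_sve_vectors
    sp -= fixed_bytes                  # SUBXri $sp, fixed_bytes, 0
    return sp


def epilogue(sp, num_sve_vectors, fixed_bytes, vl_bytes):
    sp += fixed_bytes                  # ADDXri $sp, fixed_bytes, 0
    sp += num_sve_vectors * vl_bytes   # ADDVL_XXI $sp, num_sve_vectors
    return sp


def restores_sp(entry_sp, vl_bytes):
    """True if the epilogue returns SP to its value on entry."""
    sp = prologue(entry_sp, 2, 16, vl_bytes)
    return epilogue(sp, 2, 16, vl_bytes) == entry_sp
```

Because the scalable part is added back with ADDVL, the compiler never needs to know the actual vector length: `restores_sp` holds for every legal SVE vector length (16 to 256 bytes in steps of 16).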

https://reviews.llvm.org/D61437