[PATCH] D114457: [z/OS] Implement prologue and epilogue generation for z/OS target.

Mon Dec 6 06:33:23 PST 2021

uweigand added a comment.

Pulling out the `getFrameIndexReference` discussion into the main comments.

First of all, let me review the overall stack layout with the z/OS XPLINK calling convention.   To my knowledge, the stack looks basically like this (from high addresses to low addresses):

               ========================================================
               Caller's local variables + spill area
               --------------------------------------------------------
               Caller's argument area (passed to callee)
  SP1 + 2176   --------------------------------------------------------
               Caller's 128-byte GPR save area
  SP1 + 2048   ========================================================
               Callee's local variables + spill area
               --------------------------------------------------------
               (unless leaf) Callee's argument area 
               --------------------------------------------------------
               (unless leaf) Callee's 128-byte GPR save area
  SP2 + 2048   ========================================================

  SP1:  Stack pointer at function entry to callee (beginning of prologue)
  SP2:  Stack pointer during callee (end of prologue)

  SP1 - SP2 == Callee's frame size

Is this correct?   If not, please correct.   (In any case, it would be good to have a diagram along those lines as comments in this file as well.)

The first question is, what is the //base// to be used for frame references?  There is some flexibility, but it seems most useful to use the incoming SP, modulo the stack pointer bias, for that (i.e. the value "SP1 + 2048" in the above diagram). This would mean that you access the incoming parameters using positive offsets relative to that base, and local variables using negative offsets relative to that base.

The `getFrameIndexReference` routine needs to compute that base value (and then add the offset) starting from current register values, i.e. SP2 in the above diagram.   The default computation (done by the default implementation of the routine) does:

  return StackOffset::getFixed(MFI.getObjectOffset(FI) + MFI.getStackSize() -
                               getOffsetOfLocalArea() +
                               MFI.getOffsetAdjustment());

Your proposed platform-specific implementation does instead:

  return StackOffset::getFixed(MFI.getObjectOffset(FI) + getAllocatedStackSize(MF) +
                               getOffsetOfLocalArea() +
                               MFI.getOffsetAdjustment());

There's two differences:
1.) you use `getAllocatedStackSize` instead of `MFI.getStackSize`
2.) you add instead of subtract the `getOffsetOfLocalArea` value

Both of these look suspicious to me.

As to the first difference, the difference between `getAllocatedStackSize` and `MFI.getStackSize` is that the former adds the 128 bytes that are used for the (callee's) GPR save area, which wasn't part of the stack frame size before.  I think it would be preferable to actually **add** that size to the stack size, so it will be included in `MFI.getStackSize` to begin with.  This would also help other users of that routine.

To do so, you should simply call `MFFrame.setStackSize` in the `emitPrologue` routine, similar to what is already done in the ELF ABI case.

As to the second difference, I do not really understand how the local area offset is currently being used by the XPLINK ABI.  The constructor currently sets this up as:

  SystemZXPLINKFrameLowering::SystemZXPLINKFrameLowering()
      : SystemZFrameLowering(TargetFrameLowering::StackGrowsUp, Align(32), 128,
                             Align(32), /* StackRealignable */ false),

So we have an **upwards** growing stack with a local area offset of 128 bytes.  This seems wrong.  First of all, the stack is actually **downwards** growing - note how `emitPrologue` **subtracts** from the stack pointer (for an upwards growing stack it would add to the stack pointer).

So one change might be to use StackGrowsDown instead, and then use -128 instead of 128 for the local area offset.  (With StackGrowsDown, the offset must be negative.)   That would already fix the sign problem your code works around.

However, I think even that is really incorrect.  I don't believe in the above diagram we have anything that would match the "local area offset" as used by common code here -- that offset would refer to some fixed members placed between the base of the frame (i.e. SP1 + 2048 as discussed above) and the local variable/spill area -- but there are no fixed members in that place.   So I think this really should be 0 anyway.

Maybe the intent was to use the top of the caller's GPR save area as the frame base instead?  Then the 128 byte would account for that incoming space.  But I'm not sure this has any particular benefit.  (@kai, any comments on that?)

Note that the parameter set up code in `LowerFormalParameters` et. al.  currently also assumes that the frame offsets for incoming parameters are relative to the top of the caller's GPR save area, and not the incoming SP.   So depending on the decision what point to use as frame base, we may need to update how those offsets are being computed as well.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D114457/new/

https://reviews.llvm.org/D114457