[PATCH] D70821: [SystemZ] Implement the packed stack layout

Tue Dec 3 04:50:35 PST 2019

uweigand added a comment.

> processFunctionBeforeFrameFinalized() should now be correct but it is not giving the exact same results on SPEC.

I believe the original processFunctionBeforeFrameFinalized computation was correct:

  // Get the size of our stack frame to be allocated ...
  uint64_t StackSize = (MFFrame.estimateStackSize(MF) +
                        SystemZMC::CallFrameSize);

At this point, estimateStackSize is supposed to return the size of the objects allocated in the local area, **excluding** both incoming and outgoing register save areas.   We add back the size of the outgoing register save area (which we are certain to need in all nontrivial cases).

> MFFrame.estimateStackSize(MF) adds 56 to the stack size based on the -56 offset for %R13 <https://reviews.llvm.org/source/pstl/>. This shouldn't be done, since CallFrameSize is added later...

This should not have happened in the old code because the 56 was within the "LocalAreaOffset" and therefore did not count.  It is true that with the new logic this will have to change now ...

  // ... and the maximum offset we may need to reach into the
   // caller's frame to access the save area or stack arguments.
   int64_t MaxArgOffset = SystemZMC::CallFrameSize;
   for (int I = MFFrame.getObjectIndexBegin(); I != 0; ++I)
     if (MFFrame.getObjectOffset(I) >= 0) {
       int64_t ArgOffset = SystemZMC::CallFrameSize +
                           MFFrame.getObjectOffset(I) +
                           MFFrame.getObjectSize(I);
       MaxArgOffset = std::max(MaxArgOffset, ArgOffset);
     }

Given the above, we know current SP + StackSize suffices to reach the bottom of the incoming register save area.  MaxArgOffset is now supposed to cover everything we may need to reach **beyond** that.  This includes everything within the incoming register save are (as we might need to restore from there), and all the incoming arguments.

Now, the difference with the new logic is that estimateStackSize will include the **incoming** register save area already (all of it in the default case, the parts that are used in the -mpacked-stack case).  Therefore, we need to change the computation of MaxArgOffset, exactly as your patch already does.  Everything else should be the same.

Therefore, I'm not sure why you're seeing any difference in results -- it should be exactly the same.

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:51
+                          0, Align(8), false /* StackRealignable */),
+      RegSpillOffsets(0) {
   // Due to the SystemZ ABI, the DWARF CFA (Canonical Frame Address) is not
----------------
Why is the RegSpillOffsets change above needed?

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:63
     RegSpillOffsets[SpillOffsetTable[I].Reg] =
       SystemZMC::CallFrameSize + SpillOffsetTable[I].Offset;
 }
----------------
Since this is now the only user of the SpillOffsetTable, we can simplify it again.

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:77
+  unsigned HighGPR = SystemZ::R15D;
+  int StartAdjust = ZFI->usePackedStack(MF) ? SystemZMC::FPArgRegSaveAreaSize : 0;
+  int StartSPOffset = SystemZMC::CallFrameSize + StartAdjust;
----------------
This seems strange.   The packed-stack case should ignore all predefined offsets and just place the GPRs that need to be saved at the top of the area.

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:81
+    unsigned Reg = CS.getReg();
+    int Offset = RegSpillOffsets[Reg];
+    if (Offset) {
----------------
The RegSpillOffsets value is only ever valid for the non-packed case, so I think everything would be clearer if you only looked at it in that case.

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:134

+  SystemZMachineFunctionInfo *ZFI = MF.getInfo<SystemZMachineFunctionInfo>();
   MachineFrameInfo &MFFrame = MF.getFrameInfo();
----------------
There's already MFI below?

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:349
   // Get the size of our stack frame to be allocated ...
-  uint64_t StackSize = (MFFrame.estimateStackSize(MF) +
-                        SystemZMC::CallFrameSize);
+  uint64_t StackSize = MFFrame.estimateStackSize(MF);
   // ... and the maximum offset we may need to reach into the
----------------
Moving the + SystemZMC::CallFrameSize down seems a no-op and doesn't really make anything clearer IMO ...

================
Comment at: llvm/lib/Target/SystemZ/SystemZFrameLowering.cpp:459
   uint64_t StackSize = MFFrame.getStackSize();
+  bool OtherStackThanRegSaveArea = StackSize > ZFI->getRegSaveAreaUsage();
   // We need to allocate the ABI-defined 160-byte base area whenever
----------------
It would be nice if we could avoid the duplicate bookkeeping in RegSaveAreaUsage here.

What we really want to check is whether there is any non-dead, non-fixed object on the stack or not.   Maybe something along those lines:

```
  bool haveStackObject = false;
  for (unsigned i = 0, e = MFI.getObjectIndexEnd(); i != e; ++i) {
    if (!MFI.isDeadObjectIndex(i)) {
      HaveStackObject = true;
      break;
    }

```

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D70821/new/

https://reviews.llvm.org/D70821