[llvm] r296771 - [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size

Tim Shen via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 8 11:29:19 PST 2017


On Wed, Mar 8, 2017, 03:18 Hal Finkel <hfinkel at anl.gov> wrote:


On 03/07/2017 08:43 PM, Tim Shen via llvm-commits wrote:

On Tue, Mar 7, 2017 at 6:21 PM Tim Shen <timshen at google.com> wrote:

I found the problem. In the patch, there is a check for isVarArg:
  bool HasParameterArea = !isELFv2ABI || isVarArg || CallConv ==
CallingConv::Fast;

In our internal case, the callee is indeed a var arg function, but the call
is through a function pointer. I think this optimization should be off for
indirect call as well.


Oops, we had a C cast from a var arg function pointer into a non-var-arg
function pointer in the source. We deserve the UB :/.


Do you think that we should make Clang warn on this case? A user might not
realize that on some targets this is an ABI-incompatible change.


Yes, we might be able to do that.

A normal usage of casting a function pointer is type erasure. In that case,
we can suggest the user to cast to void* if possible (*). For
function-pointer to function-pointer cast, issue a warning.

CC Richard.

* Casting a function pointer from and to data pointer is implementation
defined: http://en.cppreference.com/w/cpp/language/reinterpret_cast


 -Hal



For this failure it's not the patch's fault. I'll revert the revert and
test it again.

Sorry for the disturbance!



On Tue, Mar 7, 2017 at 2:37 PM Tim Shen <timshen at google.com> wrote:

Update:

The failure mode is that, malloc(10) returns a bogus address,
like 0x3fff9cc60428. Normal addresses are like 0x1003a00b980. Here malloc
uses TCMalloc.

Still looking.

On Tue, Mar 7, 2017 at 9:40 AM Tim Shen <timshen at google.com> wrote:

In an internal function, I found the a very early instruction modifies the
first parameter (r3):
  ld r3,-31576(r2)

which immediately clobbers r3. Succeeding operations on r3 causes a crash.

My guess would be this function assumes the parameters are on the stack,
while the caller doesn't assume so.

Still investigating.

On Tue, Mar 7, 2017 at 1:59 AM Nemanja Ivanovic via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

Thanks Daniel and Tim,

It's certainly a curious thing that this occurs. I am just curious, does
this happen at noopt or with optimization? The only thing I can think of is
that maybe this stack frame reduction isn't valid if mem2reg isn't run. But
this is more or less a total shot in the dark.

Nemanja.

On Tue, Mar 7, 2017 at 9:20 AM, Daniel Jasper <djasper at google.com> wrote:

Just FYI, Tim has reverted this patch and is looking at reducing a test
case. He should have one tomorrow morning.

So far, he found that the first parameter is live but clobbered immediately
in some case.
Assembly:
ld      r3,-31576(r2)

(I don't know what that means, but just wanted to provide as much
information as I can, in case that makes it easy for you to spot the bug)

Cheers,
Danie

On Tue, Mar 7, 2017 at 7:48 AM, Daniel Jasper <djasper at google.com> wrote:

This seems to be causing some miscompiles. Working on a test case.

On Thu, Mar 2, 2017 at 6:39 PM, Nemanja Ivanovic via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

Author: nemanjai
Date: Thu Mar  2 11:38:59 2017
New Revision: 296771

URL: http://llvm.org/viewvc/llvm-project?rev=296771&view=rev
Log:
[PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame
size

This patch reduces the stack frame size by not allocating the parameter
area if
it is not required. In the current implementation
LowerFormalArguments_64SVR4
already handles the parameter area, but LowerCall_64SVR4 does not
(when calculating the stack frame size). What this patch does is make
LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.

Committing on behalf of Hiroshi Inoue.

Differential Revision: https://reviews.llvm.org/D29881

Added:
    llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
Modified:
    llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp

Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
URL:
http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=296771&r1=296770&r2=296771&view=diff
==============================================================================
--- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (original)
+++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Mar  2 11:38:59
2017
@@ -5147,10 +5147,30 @@ SDValue PPCTargetLowering::LowerCall_64S
   };

   const unsigned NumGPRs = array_lengthof(GPR);
-  const unsigned NumFPRs = 13;
+  const unsigned NumFPRs = useSoftFloat() ? 0 : 13;
   const unsigned NumVRs  = array_lengthof(VR);
   const unsigned NumQFPRs = NumFPRs;

+  // On ELFv2, we can avoid allocating the parameter area if all the
arguments
+  // can be passed to the callee in registers.
+  // For the fast calling convention, there is another check below.
+  // Note: keep consistent with LowerFormalArguments_64SVR4()
+  bool HasParameterArea = !isELFv2ABI || isVarArg || CallConv ==
CallingConv::Fast;
+  if (!HasParameterArea) {
+    unsigned ParamAreaSize = NumGPRs * PtrByteSize;
+    unsigned AvailableFPRs = NumFPRs;
+    unsigned AvailableVRs = NumVRs;
+    unsigned NumBytesTmp = NumBytes;
+    for (unsigned i = 0; i != NumOps; ++i) {
+      if (Outs[i].Flags.isNest()) continue;
+      if (CalculateStackSlotUsed(Outs[i].VT, Outs[i].ArgVT, Outs[i].Flags,
+                                PtrByteSize, LinkageSize, ParamAreaSize,
+                                NumBytesTmp, AvailableFPRs, AvailableVRs,
+                                Subtarget.hasQPX()))
+        HasParameterArea = true;
+    }
+  }
+
   // When using the fast calling convention, we don't provide backing for
   // arguments that will be in registers.
   unsigned NumGPRsUsed = 0, NumFPRsUsed = 0, NumVRsUsed = 0;
@@ -5218,13 +5238,18 @@ SDValue PPCTargetLowering::LowerCall_64S

   unsigned NumBytesActuallyUsed = NumBytes;

-  // The prolog code of the callee may store up to 8 GPR argument
registers to
+  // In the old ELFv1 ABI,
+  // the prolog code of the callee may store up to 8 GPR argument
registers to
   // the stack, allowing va_start to index over them in memory if its
varargs.
   // Because we cannot tell if this is needed on the caller side, we have
to
   // conservatively assume that it is needed.  As such, make sure we have
at
   // least enough stack space for the caller to store the 8 GPRs.
-  // FIXME: On ELFv2, it may be unnecessary to allocate the parameter area.
-  NumBytes = std::max(NumBytes, LinkageSize + 8 * PtrByteSize);
+  // In the ELFv2 ABI, we allocate the parameter area iff a callee
+  // really requires memory operands, e.g. a vararg function.
+  if (HasParameterArea)
+    NumBytes = std::max(NumBytes, LinkageSize + 8 * PtrByteSize);
+  else
+    NumBytes = LinkageSize;

   // Tail call needs the stack to be aligned.
   if (getTargetMachine().Options.GuaranteedTailCallOpt &&
@@ -5443,6 +5468,8 @@ SDValue PPCTargetLowering::LowerCall_64S
         if (CallConv == CallingConv::Fast)
           ComputePtrOff();

+        assert(HasParameterArea &&
+               "Parameter area must exist to pass an argument in memory.");
         LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
                          true, isTailCall, false, MemOpChains,
                          TailCallArguments, dl);
@@ -5528,6 +5555,8 @@ SDValue PPCTargetLowering::LowerCall_64S
           PtrOff = DAG.getNode(ISD::ADD, dl, PtrVT, PtrOff, ConstFour);
         }

+        assert(HasParameterArea &&
+               "Parameter area must exist to pass an argument in memory.");
         LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
                          true, isTailCall, false, MemOpChains,
                          TailCallArguments, dl);
@@ -5562,6 +5591,8 @@ SDValue PPCTargetLowering::LowerCall_64S
       // GPRs when within range.  For now, we always put the value in both
       // locations (or even all three).
       if (isVarArg) {
+        assert(HasParameterArea &&
+               "Parameter area must exist if we have a varargs call.");
         // We could elide this store in the case where the object fits
         // entirely in R registers.  Maybe later.
         SDValue Store =
@@ -5594,6 +5625,8 @@ SDValue PPCTargetLowering::LowerCall_64S
         if (CallConv == CallingConv::Fast)
           ComputePtrOff();

+        assert(HasParameterArea &&
+               "Parameter area must exist to pass an argument in memory.");
         LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
                          true, isTailCall, true, MemOpChains,
                          TailCallArguments, dl);
@@ -5614,6 +5647,8 @@ SDValue PPCTargetLowering::LowerCall_64S
     case MVT::v4i1: {
       bool IsF32 = Arg.getValueType().getSimpleVT().SimpleTy == MVT::v4f32;
       if (isVarArg) {
+        assert(HasParameterArea &&
+               "Parameter area must exist if we have a varargs call.");
         // We could elide this store in the case where the object fits
         // entirely in R registers.  Maybe later.
         SDValue Store =
@@ -5646,6 +5681,8 @@ SDValue PPCTargetLowering::LowerCall_64S
         if (CallConv == CallingConv::Fast)
           ComputePtrOff();

+        assert(HasParameterArea &&
+               "Parameter area must exist to pass an argument in memory.");
         LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
                          true, isTailCall, true, MemOpChains,
                          TailCallArguments, dl);
@@ -5660,7 +5697,8 @@ SDValue PPCTargetLowering::LowerCall_64S
     }
   }

-  assert(NumBytesActuallyUsed == ArgOffset);
+  assert((!HasParameterArea || NumBytesActuallyUsed == ArgOffset) &&
+         "mismatch in size of parameter area");
   (void)NumBytesActuallyUsed;

   if (!MemOpChains.empty())

Added: llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
URL:
http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/stacksize.ll?rev=296771&view=auto
==============================================================================
--- llvm/trunk/test/CodeGen/PowerPC/stacksize.ll (added)
+++ llvm/trunk/test/CodeGen/PowerPC/stacksize.ll Thu Mar  2 11:38:59 2017
@@ -0,0 +1,86 @@
+; For ELFv2 ABI, we can avoid allocating the parameter area in the stack
frame of the caller function
+; if all the arguments can be passed to the callee in registers.
+; For ELFv1 ABI, we always need to allocate the parameter area.
+
+; Tests for ELFv2 ABI
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu
-target-abi elfv2 < %s | FileCheck %s -check-prefix=PPC64-ELFV2
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu
-target-abi elfv2 < %s | FileCheck %s -check-prefix=PPC64-ELFV2
+
+; Tests for ELFv1 ABI
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu
-target-abi elfv1 < %s | FileCheck %s -check-prefix=PPC64-ELFV1
+; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu
-target-abi elfv1 < %s | FileCheck %s -check-prefix=PPC64-ELFV1
+
+; If the callee has at most eight integer args, parameter area can be
ommited for ELFv2 ABI.
+
+; PPC64-ELFV2-LABEL: WithoutParamArea1:
+; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV2: stdu 1, -32(1)
+; PPC64-ELFV2: addi 1, 1, 32
+; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1-LABEL: WithoutParamArea1:
+; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1: stdu 1, -112(1)
+; PPC64-ELFV1: addi 1, 1, 112
+; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+define signext i32 @WithoutParamArea1(i32 signext %a) local_unnamed_addr
#0 {
+entry:
+  %call = tail call signext i32 @onearg(i32 signext %a) #2
+  ret i32 %call
+}
+
+; PPC64-ELFV2-LABEL: WithoutParamArea2:
+; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV2: stdu 1, -32(1)
+; PPC64-ELFV2: addi 1, 1, 32
+; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1-LABEL: WithoutParamArea2:
+; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1: stdu 1, -112(1)
+; PPC64-ELFV1: addi 1, 1, 112
+; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+define signext i32 @WithoutParamArea2(i32 signext %a) local_unnamed_addr
#0 {
+entry:
+  %call = tail call signext i32 @eightargs(i32 signext %a, i32 signext %a,
i32 signext %a, i32 signext %a, i32 signext %a, i32 signext %a, i32 signext
%a, i32 signext %a) #2
+  ret i32 %call
+}
+
+; If the callee has more than eight integer args or variable number of
args,
+; parameter area cannot be ommited even for ELFv2 ABI
+
+; PPC64-ELFV2-LABEL: WithParamArea1:
+; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV2: stdu 1, -96(1)
+; PPC64-ELFV2: addi 1, 1, 96
+; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1-LABEL: WithParamArea1:
+; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1: stdu 1, -112(1)
+; PPC64-ELFV1: addi 1, 1, 112
+; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+define signext i32 @WithParamArea1(i32 signext %a) local_unnamed_addr #0 {
+entry:
+  %call = tail call signext i32 (i32, ...) @varargs(i32 signext %a, i32
signext %a) #2
+  ret i32 %call
+}
+
+; PPC64-ELFV2-LABEL: WithParamArea2:
+; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV2: stdu 1, -112(1)
+; PPC64-ELFV2: addi 1, 1, 112
+; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1-LABEL: WithParamArea2:
+; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
+; PPC64-ELFV1: stdu 1, -128(1)
+; PPC64-ELFV1: addi 1, 1, 128
+; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
+define signext i32 @WithParamArea2(i32 signext %a) local_unnamed_addr #0 {
+entry:
+  %call = tail call signext i32 @nineargs(i32 signext %a, i32 signext %a,
i32 signext %a, i32 signext %a, i32 signext %a, i32 signext %a, i32 signext
%a, i32 signext %a, i32 signext %a) #2
+  ret i32 %call
+}
+
+declare signext i32 @onearg(i32 signext) local_unnamed_addr #1
+declare signext i32 @eightargs(i32 signext, i32 signext, i32 signext, i32
signext, i32 signext, i32 signext, i32 signext, i32 signext)
local_unnamed_addr #1
+declare signext i32 @nineargs(i32 signext, i32 signext, i32 signext, i32
signext, i32 signext, i32 signext, i32 signext, i32 signext, i32 signext)
local_unnamed_addr #1
+declare signext i32 @varargs(i32 signext, ...) local_unnamed_addr #1
+


_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits




_______________________________________________
llvm-commits mailing list
llvm-commits at lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits



_______________________________________________
llvm-commits mailing
listllvm-commits at lists.llvm.orghttp://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits


-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170308/1474fa31/attachment.html>


More information about the llvm-commits mailing list