[llvm] r296771 - [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size

Hal Finkel via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 8 03:15:14 PST 2017


On 03/07/2017 08:43 PM, Tim Shen via llvm-commits wrote:
> On Tue, Mar 7, 2017 at 6:21 PM Tim Shen <timshen at google.com 
> <mailto:timshen at google.com>> wrote:
>
>     I found the problem. In the patch, there is a check for isVarArg:
>       bool HasParameterArea = !isELFv2ABI || isVarArg || CallConv ==
>     CallingConv::Fast;
>
>     In our internal case, the callee is indeed a var arg function, but
>     the call is through a function pointer. I think this optimization
>     should be off for indirect call as well.
>
>
> Oops, we had a C cast from a var arg function pointer into a 
> non-var-arg function pointer in the source. We deserve the UB :/.

Do you think that we should make Clang warn on this case? A user might 
not realize that on some targets this is an ABI-incompatible change.

  -Hal

>
> For this failure it's not the patch's fault. I'll revert the revert 
> and test it again.
>
> Sorry for the disturbance!
>
>
>     On Tue, Mar 7, 2017 at 2:37 PM Tim Shen <timshen at google.com
>     <mailto:timshen at google.com>> wrote:
>
>         Update:
>
>         The failure mode is that, malloc(10) returns a bogus address,
>         like 0x3fff9cc60428. Normal addresses are like 0x1003a00b980.
>         Here malloc uses TCMalloc.
>
>         Still looking.
>
>         On Tue, Mar 7, 2017 at 9:40 AM Tim Shen <timshen at google.com
>         <mailto:timshen at google.com>> wrote:
>
>             In an internal function, I found the a very early
>             instruction modifies the first parameter (r3):
>               ld r3,-31576(r2)
>
>             which immediately clobbers r3. Succeeding operations on r3
>             causes a crash.
>
>             My guess would be this function assumes the parameters are
>             on the stack, while the caller doesn't assume so.
>
>             Still investigating.
>
>             On Tue, Mar 7, 2017 at 1:59 AM Nemanja Ivanovic via
>             llvm-commits <llvm-commits at lists.llvm.org
>             <mailto:llvm-commits at lists.llvm.org>> wrote:
>
>                 Thanks Daniel and Tim,
>
>                 It's certainly a curious thing that this occurs. I am
>                 just curious, does this happen at noopt or with
>                 optimization? The only thing I can think of is that
>                 maybe this stack frame reduction isn't valid if
>                 mem2reg isn't run. But this is more or less a total
>                 shot in the dark.
>
>                 Nemanja.
>
>                 On Tue, Mar 7, 2017 at 9:20 AM, Daniel Jasper
>                 <djasper at google.com <mailto:djasper at google.com>> wrote:
>
>                     Just FYI, Tim has reverted this patch and is
>                     looking at reducing a test case. He should have
>                     one tomorrow morning.
>
>                     So far, he found that the first parameter is live
>                     but clobbered immediately in some case.
>                     Assembly:
>                     ld    r3,-31576(r2)
>
>                     (I don't know what that means, but just wanted to
>                     provide as much information as I can, in case that
>                     makes it easy for you to spot the bug)
>
>                     Cheers,
>                     Danie
>
>                     On Tue, Mar 7, 2017 at 7:48 AM, Daniel Jasper
>                     <djasper at google.com <mailto:djasper at google.com>>
>                     wrote:
>
>                         This seems to be causing some miscompiles.
>                         Working on a test case.
>
>                         On Thu, Mar 2, 2017 at 6:39 PM, Nemanja
>                         Ivanovic via llvm-commits
>                         <llvm-commits at lists.llvm.org
>                         <mailto:llvm-commits at lists.llvm.org>> wrote:
>
>                             Author: nemanjai
>                             Date: Thu Mar  2 11:38:59 2017
>                             New Revision: 296771
>
>                             URL:
>                             http://llvm.org/viewvc/llvm-project?rev=296771&view=rev
>                             Log:
>                             [PowerPC][ELFv2ABI] Allocate parameter
>                             area on-demand to reduce stack frame size
>
>                             This patch reduces the stack frame size by
>                             not allocating the parameter area if
>                             it is not required. In the current
>                             implementation LowerFormalArguments_64SVR4
>                             already handles the parameter area, but
>                             LowerCall_64SVR4 does not
>                             (when calculating the stack frame size).
>                             What this patch does is make
>                             LowerCall_64SVR4 consistent with
>                             LowerFormalArguments_64SVR4.
>
>                             Committing on behalf of Hiroshi Inoue.
>
>                             Differential Revision:
>                             https://reviews.llvm.org/D29881
>
>                             Added:
>                             llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>                             Modified:
>                             llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>
>                             Modified:
>                             llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>                             URL:
>                             http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=296771&r1=296770&r2=296771&view=diff
>                             ==============================================================================
>                             ---
>                             llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>                             (original)
>                             +++
>                             llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>                             Thu Mar  2 11:38:59 2017
>                             @@ -5147,10 +5147,30 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                };
>
>                                const unsigned NumGPRs =
>                             array_lengthof(GPR);
>                             -  const unsigned NumFPRs = 13;
>                             +  const unsigned NumFPRs = useSoftFloat()
>                             ? 0 : 13;
>                                const unsigned NumVRs  =
>                             array_lengthof(VR);
>                                const unsigned NumQFPRs = NumFPRs;
>
>                             +  // On ELFv2, we can avoid allocating
>                             the parameter area if all the arguments
>                             +  // can be passed to the callee in
>                             registers.
>                             +  // For the fast calling convention,
>                             there is another check below.
>                             +  // Note: keep consistent with
>                             LowerFormalArguments_64SVR4()
>                             +  bool HasParameterArea = !isELFv2ABI ||
>                             isVarArg || CallConv == CallingConv::Fast;
>                             +  if (!HasParameterArea) {
>                             +    unsigned ParamAreaSize = NumGPRs *
>                             PtrByteSize;
>                             +    unsigned AvailableFPRs = NumFPRs;
>                             +    unsigned AvailableVRs = NumVRs;
>                             +    unsigned NumBytesTmp = NumBytes;
>                             +    for (unsigned i = 0; i != NumOps; ++i) {
>                             +      if (Outs[i].Flags.isNest()) continue;
>                             +      if
>                             (CalculateStackSlotUsed(Outs[i].VT,
>                             Outs[i].ArgVT, Outs[i].Flags,
>                             + PtrByteSize, LinkageSize, ParamAreaSize,
>                             + NumBytesTmp, AvailableFPRs, AvailableVRs,
>                             + Subtarget.hasQPX()))
>                             + HasParameterArea = true;
>                             +    }
>                             +  }
>                             +
>                                // When using the fast calling
>                             convention, we don't provide backing for
>                                // arguments that will be in registers.
>                                unsigned NumGPRsUsed = 0, NumFPRsUsed =
>                             0, NumVRsUsed = 0;
>                             @@ -5218,13 +5238,18 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>
>                                unsigned NumBytesActuallyUsed = NumBytes;
>
>                             -  // The prolog code of the callee may
>                             store up to 8 GPR argument registers to
>                             +  // In the old ELFv1 ABI,
>                             +  // the prolog code of the callee may
>                             store up to 8 GPR argument registers to
>                                // the stack, allowing va_start to
>                             index over them in memory if its varargs.
>                                // Because we cannot tell if this is
>                             needed on the caller side, we have to
>                                // conservatively assume that it is
>                             needed.  As such, make sure we have at
>                                // least enough stack space for the
>                             caller to store the 8 GPRs.
>                             -  // FIXME: On ELFv2, it may be
>                             unnecessary to allocate the parameter area.
>                             -  NumBytes = std::max(NumBytes,
>                             LinkageSize + 8 * PtrByteSize);
>                             +  // In the ELFv2 ABI, we allocate the
>                             parameter area iff a callee
>                             +  // really requires memory operands,
>                             e.g. a vararg function.
>                             +  if (HasParameterArea)
>                             +    NumBytes = std::max(NumBytes,
>                             LinkageSize + 8 * PtrByteSize);
>                             +  else
>                             +    NumBytes = LinkageSize;
>
>                                // Tail call needs the stack to be aligned.
>                                if
>                             (getTargetMachine().Options.GuaranteedTailCallOpt
>                             &&
>                             @@ -5443,6 +5468,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                      if (CallConv == CallingConv::Fast)
>                              ComputePtrOff();
>
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist to pass an
>                             argument in memory.");
>                              LowerMemOpCallTo(DAG, MF, Chain, Arg,
>                             PtrOff, SPDiff, ArgOffset,
>                                   true, isTailCall, false, MemOpChains,
>                             TailCallArguments, dl);
>                             @@ -5528,6 +5555,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                        PtrOff = DAG.getNode(ISD::ADD,
>                             dl, PtrVT, PtrOff, ConstFour);
>                                      }
>
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist to pass an
>                             argument in memory.");
>                              LowerMemOpCallTo(DAG, MF, Chain, Arg,
>                             PtrOff, SPDiff, ArgOffset,
>                                   true, isTailCall, false, MemOpChains,
>                             TailCallArguments, dl);
>                             @@ -5562,6 +5591,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                    // GPRs when within range.  For
>                             now, we always put the value in both
>                                    // locations (or even all three).
>                                    if (isVarArg) {
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist if we have a
>                             varargs call.");
>                                      // We could elide this store in
>                             the case where the object fits
>                                      // entirely in R registers. Maybe
>                             later.
>                                      SDValue Store =
>                             @@ -5594,6 +5625,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                      if (CallConv == CallingConv::Fast)
>                              ComputePtrOff();
>
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist to pass an
>                             argument in memory.");
>                              LowerMemOpCallTo(DAG, MF, Chain, Arg,
>                             PtrOff, SPDiff, ArgOffset,
>                                   true, isTailCall, true, MemOpChains,
>                             TailCallArguments, dl);
>                             @@ -5614,6 +5647,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                  case MVT::v4i1: {
>                                    bool IsF32 =
>                             Arg.getValueType().getSimpleVT().SimpleTy
>                             == MVT::v4f32;
>                                    if (isVarArg) {
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist if we have a
>                             varargs call.");
>                                      // We could elide this store in
>                             the case where the object fits
>                                      // entirely in R registers. Maybe
>                             later.
>                                      SDValue Store =
>                             @@ -5646,6 +5681,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                      if (CallConv == CallingConv::Fast)
>                              ComputePtrOff();
>
>                             + assert(HasParameterArea &&
>                             +  "Parameter area must exist to pass an
>                             argument in memory.");
>                              LowerMemOpCallTo(DAG, MF, Chain, Arg,
>                             PtrOff, SPDiff, ArgOffset,
>                                   true, isTailCall, true, MemOpChains,
>                             TailCallArguments, dl);
>                             @@ -5660,7 +5697,8 @@ SDValue
>                             PPCTargetLowering::LowerCall_64S
>                                  }
>                                }
>
>                             - assert(NumBytesActuallyUsed == ArgOffset);
>                             + assert((!HasParameterArea ||
>                             NumBytesActuallyUsed == ArgOffset) &&
>                             +         "mismatch in size of parameter
>                             area");
>                              (void)NumBytesActuallyUsed;
>
>                                if (!MemOpChains.empty())
>
>                             Added:
>                             llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>                             URL:
>                             http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/stacksize.ll?rev=296771&view=auto
>                             ==============================================================================
>                             ---
>                             llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>                             (added)
>                             +++
>                             llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>                             Thu Mar  2 11:38:59 2017
>                             @@ -0,0 +1,86 @@
>                             +; For ELFv2 ABI, we can avoid allocating
>                             the parameter area in the stack frame of
>                             the caller function
>                             +; if all the arguments can be passed to
>                             the callee in registers.
>                             +; For ELFv1 ABI, we always need to
>                             allocate the parameter area.
>                             +
>                             +; Tests for ELFv2 ABI
>                             +; RUN: llc -verify-machineinstrs
>                             -mtriple=powerpc64le-unknown-linux-gnu
>                             -target-abi elfv2 < %s | FileCheck %s
>                             -check-prefix=PPC64-ELFV2
>                             +; RUN: llc -verify-machineinstrs
>                             -mtriple=powerpc64-unknown-linux-gnu
>                             -target-abi elfv2 < %s | FileCheck %s
>                             -check-prefix=PPC64-ELFV2
>                             +
>                             +; Tests for ELFv1 ABI
>                             +; RUN: llc -verify-machineinstrs
>                             -mtriple=powerpc64le-unknown-linux-gnu
>                             -target-abi elfv1 < %s | FileCheck %s
>                             -check-prefix=PPC64-ELFV1
>                             +; RUN: llc -verify-machineinstrs
>                             -mtriple=powerpc64-unknown-linux-gnu
>                             -target-abi elfv1 < %s | FileCheck %s
>                             -check-prefix=PPC64-ELFV1
>                             +
>                             +; If the callee has at most eight integer
>                             args, parameter area can be ommited for
>                             ELFv2 ABI.
>                             +
>                             +; PPC64-ELFV2-LABEL: WithoutParamArea1:
>                             +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV2: stdu 1, -32(1)
>                             +; PPC64-ELFV2: addi 1, 1, 32
>                             +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1-LABEL: WithoutParamArea1:
>                             +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1: stdu 1, -112(1)
>                             +; PPC64-ELFV1: addi 1, 1, 112
>                             +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +define signext i32 @WithoutParamArea1(i32
>                             signext %a) local_unnamed_addr #0 {
>                             +entry:
>                             +  %call = tail call signext i32
>                             @onearg(i32 signext %a) #2
>                             +  ret i32 %call
>                             +}
>                             +
>                             +; PPC64-ELFV2-LABEL: WithoutParamArea2:
>                             +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV2: stdu 1, -32(1)
>                             +; PPC64-ELFV2: addi 1, 1, 32
>                             +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1-LABEL: WithoutParamArea2:
>                             +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1: stdu 1, -112(1)
>                             +; PPC64-ELFV1: addi 1, 1, 112
>                             +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +define signext i32 @WithoutParamArea2(i32
>                             signext %a) local_unnamed_addr #0 {
>                             +entry:
>                             +  %call = tail call signext i32
>                             @eightargs(i32 signext %a, i32 signext %a,
>                             i32 signext %a, i32 signext %a, i32
>                             signext %a, i32 signext %a, i32 signext
>                             %a, i32 signext %a) #2
>                             +  ret i32 %call
>                             +}
>                             +
>                             +; If the callee has more than eight
>                             integer args or variable number of args,
>                             +; parameter area cannot be ommited even
>                             for ELFv2 ABI
>                             +
>                             +; PPC64-ELFV2-LABEL: WithParamArea1:
>                             +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV2: stdu 1, -96(1)
>                             +; PPC64-ELFV2: addi 1, 1, 96
>                             +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1-LABEL: WithParamArea1:
>                             +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1: stdu 1, -112(1)
>                             +; PPC64-ELFV1: addi 1, 1, 112
>                             +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +define signext i32 @WithParamArea1(i32
>                             signext %a) local_unnamed_addr #0 {
>                             +entry:
>                             +  %call = tail call signext i32 (i32,
>                             ...) @varargs(i32 signext %a, i32 signext
>                             %a) #2
>                             +  ret i32 %call
>                             +}
>                             +
>                             +; PPC64-ELFV2-LABEL: WithParamArea2:
>                             +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV2: stdu 1, -112(1)
>                             +; PPC64-ELFV2: addi 1, 1, 112
>                             +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1-LABEL: WithParamArea2:
>                             +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +; PPC64-ELFV1: stdu 1, -128(1)
>                             +; PPC64-ELFV1: addi 1, 1, 128
>                             +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>                             -{{[0-9]+}}(1)
>                             +define signext i32 @WithParamArea2(i32
>                             signext %a) local_unnamed_addr #0 {
>                             +entry:
>                             +  %call = tail call signext i32
>                             @nineargs(i32 signext %a, i32 signext %a,
>                             i32 signext %a, i32 signext %a, i32
>                             signext %a, i32 signext %a, i32 signext
>                             %a, i32 signext %a, i32 signext %a) #2
>                             +  ret i32 %call
>                             +}
>                             +
>                             +declare signext i32 @onearg(i32 signext)
>                             local_unnamed_addr #1
>                             +declare signext i32 @eightargs(i32
>                             signext, i32 signext, i32 signext, i32
>                             signext, i32 signext, i32 signext, i32
>                             signext, i32 signext) local_unnamed_addr #1
>                             +declare signext i32 @nineargs(i32
>                             signext, i32 signext, i32 signext, i32
>                             signext, i32 signext, i32 signext, i32
>                             signext, i32 signext, i32 signext)
>                             local_unnamed_addr #1
>                             +declare signext i32 @varargs(i32 signext,
>                             ...) local_unnamed_addr #1
>                             +
>
>
>                             _______________________________________________
>                             llvm-commits mailing list
>                             llvm-commits at lists.llvm.org
>                             <mailto:llvm-commits at lists.llvm.org>
>                             http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
>
>
>                 _______________________________________________
>                 llvm-commits mailing list
>                 llvm-commits at lists.llvm.org
>                 <mailto:llvm-commits at lists.llvm.org>
>                 http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits

-- 
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170308/4c33e7eb/attachment.html>


More information about the llvm-commits mailing list