[llvm] r296771 - [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 8 17:14:12 PST 2017
On 03/08/2017 06:59 PM, Hal Finkel via llvm-commits wrote:
>
>
> On 03/08/2017 01:29 PM, Tim Shen wrote:
>> On Wed, Mar 8, 2017, 03:18 Hal Finkel <hfinkel at anl.gov
>> <mailto:hfinkel at anl.gov>> wrote:
>>
>>
>> On 03/07/2017 08:43 PM, Tim Shen via llvm-commits wrote:
>>> On Tue, Mar 7, 2017 at 6:21 PM Tim Shen <timshen at google.com
>>> <mailto:timshen at google.com>> wrote:
>>>
>>> I found the problem. In the patch, there is a check
>>> for isVarArg:
>>> bool HasParameterArea = !isELFv2ABI || isVarArg ||
>>> CallConv == CallingConv::Fast;
>>>
>>> In our internal case, the callee is indeed a var arg
>>> function, but the call is through a function pointer. I
>>> think this optimization should be off for indirect call as well.
>>>
>>>
>>> Oops, we had a C cast from a var arg function pointer into a
>>> non-var-arg function pointer in the source. We deserve the UB :/.
>>
>> Do you think that we should make Clang warn on this case? A user
>> might not realize that on some targets this is an
>> ABI-incompatible change.
>>
>>
>> Yes, we might be able to do that.
>>
>> A normal usage of casting a function pointer is type erasure. In that
>> case, we can suggest the user to cast to void* if possible (*). For
>> function-pointer to function-pointer cast, issue a warning.
>
> I was specifically thinking about warning on a cast between a var-args
> function and a non-var-args function. Would that have produced a
> warning in the code that you have? We could do a similar thing for
> other kinds of basic parameter incompatibilities (i.e. a function that
> takes an int to one that takes a float). Some of this we might already do.
I filed https://bugs.llvm.org/show_bug.cgi?id=32188 to track this.
-Hal
>
> -Hal
>
>>
>> CC Richard.
>>
>> * Casting a function pointer from and to data pointer is
>> implementation defined:
>> http://en.cppreference.com/w/cpp/language/reinterpret_cast
>>
>>
>> -Hal
>>
>>
>>>
>>> For this failure it's not the patch's fault. I'll revert the
>>> revert and test it again.
>>>
>>> Sorry for the disturbance!
>>>
>>>
>>> On Tue, Mar 7, 2017 at 2:37 PM Tim Shen <timshen at google.com
>>> <mailto:timshen at google.com>> wrote:
>>>
>>> Update:
>>>
>>> The failure mode is that, malloc(10) returns a bogus
>>> address, like 0x3fff9cc60428. Normal addresses are
>>> like 0x1003a00b980. Here malloc uses TCMalloc.
>>>
>>> Still looking.
>>>
>>> On Tue, Mar 7, 2017 at 9:40 AM Tim Shen
>>> <timshen at google.com <mailto:timshen at google.com>> wrote:
>>>
>>> In an internal function, I found the a very early
>>> instruction modifies the first parameter (r3):
>>> ld r3,-31576(r2)
>>>
>>> which immediately clobbers r3. Succeeding operations
>>> on r3 causes a crash.
>>>
>>> My guess would be this function assumes the
>>> parameters are on the stack, while the caller
>>> doesn't assume so.
>>>
>>> Still investigating.
>>>
>>> On Tue, Mar 7, 2017 at 1:59 AM Nemanja Ivanovic via
>>> llvm-commits <llvm-commits at lists.llvm.org
>>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>
>>> Thanks Daniel and Tim,
>>>
>>> It's certainly a curious thing that this occurs.
>>> I am just curious, does this happen at noopt or
>>> with optimization? The only thing I can think of
>>> is that maybe this stack frame reduction isn't
>>> valid if mem2reg isn't run. But this is more or
>>> less a total shot in the dark.
>>>
>>> Nemanja.
>>>
>>> On Tue, Mar 7, 2017 at 9:20 AM, Daniel Jasper
>>> <djasper at google.com <mailto:djasper at google.com>>
>>> wrote:
>>>
>>> Just FYI, Tim has reverted this patch and is
>>> looking at reducing a test case. He should
>>> have one tomorrow morning.
>>>
>>> So far, he found that the first parameter is
>>> live but clobbered immediately in some case.
>>> Assembly:
>>> ld r3,-31576(r2)
>>>
>>> (I don't know what that means, but just
>>> wanted to provide as much information as I
>>> can, in case that makes it easy for you to
>>> spot the bug)
>>>
>>> Cheers,
>>> Danie
>>>
>>> On Tue, Mar 7, 2017 at 7:48 AM, Daniel
>>> Jasper <djasper at google.com
>>> <mailto:djasper at google.com>> wrote:
>>>
>>> This seems to be causing some
>>> miscompiles. Working on a test case.
>>>
>>> On Thu, Mar 2, 2017 at 6:39 PM, Nemanja
>>> Ivanovic via llvm-commits
>>> <llvm-commits at lists.llvm.org
>>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>>
>>> Author: nemanjai
>>> Date: Thu Mar 2 11:38:59 2017
>>> New Revision: 296771
>>>
>>> URL:
>>> http://llvm.org/viewvc/llvm-project?rev=296771&view=rev
>>> Log:
>>> [PowerPC][ELFv2ABI] Allocate
>>> parameter area on-demand to reduce
>>> stack frame size
>>>
>>> This patch reduces the stack frame
>>> size by not allocating the parameter
>>> area if
>>> it is not required. In the current
>>> implementation
>>> LowerFormalArguments_64SVR4
>>> already handles the parameter area,
>>> but LowerCall_64SVR4 does not
>>> (when calculating the stack frame
>>> size). What this patch does is make
>>> LowerCall_64SVR4 consistent with
>>> LowerFormalArguments_64SVR4.
>>>
>>> Committing on behalf of Hiroshi Inoue.
>>>
>>> Differential Revision:
>>> https://reviews.llvm.org/D29881
>>>
>>> Added:
>>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> Modified:
>>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>>
>>> Modified:
>>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=296771&r1=296770&r2=296771&view=diff
>>> ==============================================================================
>>> ---
>>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>> (original)
>>> +++
>>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>> Thu Mar 2 11:38:59 2017
>>> @@ -5147,10 +5147,30 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> };
>>>
>>> const unsigned NumGPRs =
>>> array_lengthof(GPR);
>>> - const unsigned NumFPRs = 13;
>>> + const unsigned NumFPRs =
>>> useSoftFloat() ? 0 : 13;
>>> const unsigned NumVRs =
>>> array_lengthof(VR);
>>> const unsigned NumQFPRs = NumFPRs;
>>>
>>> + // On ELFv2, we can avoid
>>> allocating the parameter area if all
>>> the arguments
>>> + // can be passed to the callee in
>>> registers.
>>> + // For the fast calling
>>> convention, there is another check
>>> below.
>>> + // Note: keep consistent with
>>> LowerFormalArguments_64SVR4()
>>> + bool HasParameterArea =
>>> !isELFv2ABI || isVarArg || CallConv
>>> == CallingConv::Fast;
>>> + if (!HasParameterArea) {
>>> + unsigned ParamAreaSize =
>>> NumGPRs * PtrByteSize;
>>> + unsigned AvailableFPRs = NumFPRs;
>>> + unsigned AvailableVRs = NumVRs;
>>> + unsigned NumBytesTmp = NumBytes;
>>> + for (unsigned i = 0; i !=
>>> NumOps; ++i) {
>>> + if (Outs[i].Flags.isNest())
>>> continue;
>>> + if
>>> (CalculateStackSlotUsed(Outs[i].VT,
>>> Outs[i].ArgVT, Outs[i].Flags,
>>> + PtrByteSize, LinkageSize,
>>> ParamAreaSize,
>>> + NumBytesTmp, AvailableFPRs,
>>> AvailableVRs,
>>> + Subtarget.hasQPX()))
>>> + HasParameterArea = true;
>>> + }
>>> + }
>>> +
>>> // When using the fast calling
>>> convention, we don't provide backing for
>>> // arguments that will be in
>>> registers.
>>> unsigned NumGPRsUsed = 0,
>>> NumFPRsUsed = 0, NumVRsUsed = 0;
>>> @@ -5218,13 +5238,18 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>>
>>> unsigned NumBytesActuallyUsed =
>>> NumBytes;
>>>
>>> - // The prolog code of the callee
>>> may store up to 8 GPR argument
>>> registers to
>>> + // In the old ELFv1 ABI,
>>> + // the prolog code of the callee
>>> may store up to 8 GPR argument
>>> registers to
>>> // the stack, allowing va_start
>>> to index over them in memory if its
>>> varargs.
>>> // Because we cannot tell if this
>>> is needed on the caller side, we have to
>>> // conservatively assume that it
>>> is needed. As such, make sure we
>>> have at
>>> // least enough stack space for
>>> the caller to store the 8 GPRs.
>>> - // FIXME: On ELFv2, it may be
>>> unnecessary to allocate the
>>> parameter area.
>>> - NumBytes = std::max(NumBytes,
>>> LinkageSize + 8 * PtrByteSize);
>>> + // In the ELFv2 ABI, we allocate
>>> the parameter area iff a callee
>>> + // really requires memory
>>> operands, e.g. a vararg function.
>>> + if (HasParameterArea)
>>> + NumBytes = std::max(NumBytes,
>>> LinkageSize + 8 * PtrByteSize);
>>> + else
>>> + NumBytes = LinkageSize;
>>>
>>> // Tail call needs the stack to
>>> be aligned.
>>> if
>>> (getTargetMachine().Options.GuaranteedTailCallOpt
>>> &&
>>> @@ -5443,6 +5468,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> if (CallConv ==
>>> CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to
>>> pass an argument in memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain,
>>> Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, false, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5528,6 +5555,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> PtrOff = DAG.getNode(ISD::ADD, dl,
>>> PtrVT, PtrOff, ConstFour);
>>> }
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to
>>> pass an argument in memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain,
>>> Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, false, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5562,6 +5591,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> // GPRs when within range.
>>> For now, we always put the value in both
>>> // locations (or even all three).
>>> if (isVarArg) {
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist if we
>>> have a varargs call.");
>>> // We could elide this
>>> store in the case where the object fits
>>> // entirely in R registers.
>>> Maybe later.
>>> SDValue Store =
>>> @@ -5594,6 +5625,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> if (CallConv ==
>>> CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to
>>> pass an argument in memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain,
>>> Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, true, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5614,6 +5647,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> case MVT::v4i1: {
>>> bool IsF32 =
>>> Arg.getValueType().getSimpleVT().SimpleTy
>>> == MVT::v4f32;
>>> if (isVarArg) {
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist if we
>>> have a varargs call.");
>>> // We could elide this
>>> store in the case where the object fits
>>> // entirely in R registers.
>>> Maybe later.
>>> SDValue Store =
>>> @@ -5646,6 +5681,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> if (CallConv ==
>>> CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to
>>> pass an argument in memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain,
>>> Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, true, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5660,7 +5697,8 @@ SDValue
>>> PPCTargetLowering::LowerCall_64S
>>> }
>>> }
>>>
>>> - assert(NumBytesActuallyUsed ==
>>> ArgOffset);
>>> + assert((!HasParameterArea ||
>>> NumBytesActuallyUsed == ArgOffset) &&
>>> + "mismatch in size of parameter
>>> area");
>>> (void)NumBytesActuallyUsed;
>>>
>>> if (!MemOpChains.empty())
>>>
>>> Added:
>>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> URL:
>>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/stacksize.ll?rev=296771&view=auto
>>> ==============================================================================
>>> ---
>>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> (added)
>>> +++
>>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> Thu Mar 2 11:38:59 2017
>>> @@ -0,0 +1,86 @@
>>> +; For ELFv2 ABI, we can avoid
>>> allocating the parameter area in the
>>> stack frame of the caller function
>>> +; if all the arguments can be
>>> passed to the callee in registers.
>>> +; For ELFv1 ABI, we always need to
>>> allocate the parameter area.
>>> +
>>> +; Tests for ELFv2 ABI
>>> +; RUN: llc -verify-machineinstrs
>>> -mtriple=powerpc64le-unknown-linux-gnu
>>> -target-abi elfv2 < %s | FileCheck
>>> %s -check-prefix=PPC64-ELFV2
>>> +; RUN: llc -verify-machineinstrs
>>> -mtriple=powerpc64-unknown-linux-gnu
>>> -target-abi elfv2 < %s | FileCheck
>>> %s -check-prefix=PPC64-ELFV2
>>> +
>>> +; Tests for ELFv1 ABI
>>> +; RUN: llc -verify-machineinstrs
>>> -mtriple=powerpc64le-unknown-linux-gnu
>>> -target-abi elfv1 < %s | FileCheck
>>> %s -check-prefix=PPC64-ELFV1
>>> +; RUN: llc -verify-machineinstrs
>>> -mtriple=powerpc64-unknown-linux-gnu
>>> -target-abi elfv1 < %s | FileCheck
>>> %s -check-prefix=PPC64-ELFV1
>>> +
>>> +; If the callee has at most eight
>>> integer args, parameter area can be
>>> ommited for ELFv2 ABI.
>>> +
>>> +; PPC64-ELFV2-LABEL: WithoutParamArea1:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -32(1)
>>> +; PPC64-ELFV2: addi 1, 1, 32
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithoutParamArea1:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +define signext i32
>>> @WithoutParamArea1(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32
>>> @onearg(i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; PPC64-ELFV2-LABEL: WithoutParamArea2:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -32(1)
>>> +; PPC64-ELFV2: addi 1, 1, 32
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithoutParamArea2:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +define signext i32
>>> @WithoutParamArea2(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32
>>> @eightargs(i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; If the callee has more than eight
>>> integer args or variable number of args,
>>> +; parameter area cannot be ommited
>>> even for ELFv2 ABI
>>> +
>>> +; PPC64-ELFV2-LABEL: WithParamArea1:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -96(1)
>>> +; PPC64-ELFV2: addi 1, 1, 96
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithParamArea1:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +define signext i32
>>> @WithParamArea1(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32
>>> (i32, ...) @varargs(i32 signext %a,
>>> i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; PPC64-ELFV2-LABEL: WithParamArea2:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -112(1)
>>> +; PPC64-ELFV2: addi 1, 1, 112
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithParamArea2:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -128(1)
>>> +; PPC64-ELFV1: addi 1, 1, 128
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>>> -{{[0-9]+}}(1)
>>> +define signext i32
>>> @WithParamArea2(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32
>>> @nineargs(i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +declare signext i32 @onearg(i32
>>> signext) local_unnamed_addr #1
>>> +declare signext i32 @eightargs(i32
>>> signext, i32 signext, i32 signext,
>>> i32 signext, i32 signext, i32
>>> signext, i32 signext, i32 signext)
>>> local_unnamed_addr #1
>>> +declare signext i32 @nineargs(i32
>>> signext, i32 signext, i32 signext,
>>> i32 signext, i32 signext, i32
>>> signext, i32 signext, i32 signext,
>>> i32 signext) local_unnamed_addr #1
>>> +declare signext i32 @varargs(i32
>>> signext, ...) local_unnamed_addr #1
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> <mailto:llvm-commits at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> <mailto:llvm-commits at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>> --
>> Hal Finkel
>> Lead, Compiler Technology and Programming Languages
>> Leadership Computing Facility
>> Argonne National Laboratory
>>
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170308/f2f1e08e/attachment-0001.html>
More information about the llvm-commits
mailing list