[llvm] r296771 - [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
Hal Finkel via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 8 16:59:57 PST 2017
On 03/08/2017 01:29 PM, Tim Shen wrote:
> On Wed, Mar 8, 2017, 03:18 Hal Finkel <hfinkel at anl.gov
> <mailto:hfinkel at anl.gov>> wrote:
>
>
> On 03/07/2017 08:43 PM, Tim Shen via llvm-commits wrote:
>> On Tue, Mar 7, 2017 at 6:21 PM Tim Shen <timshen at google.com
>> <mailto:timshen at google.com>> wrote:
>>
>> I found the problem. In the patch, there is a check for isVarArg:
>> bool HasParameterArea = !isELFv2ABI || isVarArg || CallConv
>> == CallingConv::Fast;
>>
>> In our internal case, the callee is indeed a var arg
>> function, but the call is through a function pointer. I think
>> this optimization should be off for indirect call as well.
>>
>>
>> Oops, we had a C cast from a var arg function pointer into a
>> non-var-arg function pointer in the source. We deserve the UB :/.
>
> Do you think that we should make Clang warn on this case? A user
> might not realize that on some targets this is an ABI-incompatible
> change.
>
>
> Yes, we might be able to do that.
>
> A normal usage of casting a function pointer is type erasure. In that
> case, we can suggest the user to cast to void* if possible (*). For
> function-pointer to function-pointer cast, issue a warning.
I was specifically thinking about warning on a cast between a var-args
function and a non-var-args function. Would that have produced a warning
in the code that you have? We could do a similar thing for other kinds
of basic parameter incompatibilities (i.e. a function that takes an int
to one that takes a float). Some of this we might already do.
-Hal
>
> CC Richard.
>
> * Casting a function pointer from and to data pointer is
> implementation defined:
> http://en.cppreference.com/w/cpp/language/reinterpret_cast
>
>
> -Hal
>
>
>>
>> For this failure it's not the patch's fault. I'll revert the
>> revert and test it again.
>>
>> Sorry for the disturbance!
>>
>>
>> On Tue, Mar 7, 2017 at 2:37 PM Tim Shen <timshen at google.com
>> <mailto:timshen at google.com>> wrote:
>>
>> Update:
>>
>> The failure mode is that, malloc(10) returns a bogus
>> address, like 0x3fff9cc60428. Normal addresses are
>> like 0x1003a00b980. Here malloc uses TCMalloc.
>>
>> Still looking.
>>
>> On Tue, Mar 7, 2017 at 9:40 AM Tim Shen
>> <timshen at google.com <mailto:timshen at google.com>> wrote:
>>
>> In an internal function, I found the a very early
>> instruction modifies the first parameter (r3):
>> ld r3,-31576(r2)
>>
>> which immediately clobbers r3. Succeeding operations
>> on r3 causes a crash.
>>
>> My guess would be this function assumes the
>> parameters are on the stack, while the caller doesn't
>> assume so.
>>
>> Still investigating.
>>
>> On Tue, Mar 7, 2017 at 1:59 AM Nemanja Ivanovic via
>> llvm-commits <llvm-commits at lists.llvm.org
>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>
>> Thanks Daniel and Tim,
>>
>> It's certainly a curious thing that this occurs.
>> I am just curious, does this happen at noopt or
>> with optimization? The only thing I can think of
>> is that maybe this stack frame reduction isn't
>> valid if mem2reg isn't run. But this is more or
>> less a total shot in the dark.
>>
>> Nemanja.
>>
>> On Tue, Mar 7, 2017 at 9:20 AM, Daniel Jasper
>> <djasper at google.com <mailto:djasper at google.com>>
>> wrote:
>>
>> Just FYI, Tim has reverted this patch and is
>> looking at reducing a test case. He should
>> have one tomorrow morning.
>>
>> So far, he found that the first parameter is
>> live but clobbered immediately in some case.
>> Assembly:
>> ld r3,-31576(r2)
>>
>> (I don't know what that means, but just
>> wanted to provide as much information as I
>> can, in case that makes it easy for you to
>> spot the bug)
>>
>> Cheers,
>> Danie
>>
>> On Tue, Mar 7, 2017 at 7:48 AM, Daniel Jasper
>> <djasper at google.com
>> <mailto:djasper at google.com>> wrote:
>>
>> This seems to be causing some
>> miscompiles. Working on a test case.
>>
>> On Thu, Mar 2, 2017 at 6:39 PM, Nemanja
>> Ivanovic via llvm-commits
>> <llvm-commits at lists.llvm.org
>> <mailto:llvm-commits at lists.llvm.org>> wrote:
>>
>> Author: nemanjai
>> Date: Thu Mar 2 11:38:59 2017
>> New Revision: 296771
>>
>> URL:
>> http://llvm.org/viewvc/llvm-project?rev=296771&view=rev
>> Log:
>> [PowerPC][ELFv2ABI] Allocate
>> parameter area on-demand to reduce
>> stack frame size
>>
>> This patch reduces the stack frame
>> size by not allocating the parameter
>> area if
>> it is not required. In the current
>> implementation
>> LowerFormalArguments_64SVR4
>> already handles the parameter area,
>> but LowerCall_64SVR4 does not
>> (when calculating the stack frame
>> size). What this patch does is make
>> LowerCall_64SVR4 consistent with
>> LowerFormalArguments_64SVR4.
>>
>> Committing on behalf of Hiroshi Inoue.
>>
>> Differential Revision:
>> https://reviews.llvm.org/D29881
>>
>> Added:
>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>> Modified:
>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>
>> Modified:
>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp?rev=296771&r1=296770&r2=296771&view=diff
>> ==============================================================================
>> ---
>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>> (original)
>> +++
>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>> Thu Mar 2 11:38:59 2017
>> @@ -5147,10 +5147,30 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> };
>>
>> const unsigned NumGPRs =
>> array_lengthof(GPR);
>> - const unsigned NumFPRs = 13;
>> + const unsigned NumFPRs =
>> useSoftFloat() ? 0 : 13;
>> const unsigned NumVRs =
>> array_lengthof(VR);
>> const unsigned NumQFPRs = NumFPRs;
>>
>> + // On ELFv2, we can avoid
>> allocating the parameter area if all
>> the arguments
>> + // can be passed to the callee in
>> registers.
>> + // For the fast calling
>> convention, there is another check below.
>> + // Note: keep consistent with
>> LowerFormalArguments_64SVR4()
>> + bool HasParameterArea =
>> !isELFv2ABI || isVarArg || CallConv
>> == CallingConv::Fast;
>> + if (!HasParameterArea) {
>> + unsigned ParamAreaSize = NumGPRs
>> * PtrByteSize;
>> + unsigned AvailableFPRs = NumFPRs;
>> + unsigned AvailableVRs = NumVRs;
>> + unsigned NumBytesTmp = NumBytes;
>> + for (unsigned i = 0; i !=
>> NumOps; ++i) {
>> + if (Outs[i].Flags.isNest())
>> continue;
>> + if
>> (CalculateStackSlotUsed(Outs[i].VT,
>> Outs[i].ArgVT, Outs[i].Flags,
>> + PtrByteSize, LinkageSize,
>> ParamAreaSize,
>> + NumBytesTmp, AvailableFPRs,
>> AvailableVRs,
>> + Subtarget.hasQPX()))
>> + HasParameterArea = true;
>> + }
>> + }
>> +
>> // When using the fast calling
>> convention, we don't provide backing for
>> // arguments that will be in
>> registers.
>> unsigned NumGPRsUsed = 0,
>> NumFPRsUsed = 0, NumVRsUsed = 0;
>> @@ -5218,13 +5238,18 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>>
>> unsigned NumBytesActuallyUsed =
>> NumBytes;
>>
>> - // The prolog code of the callee
>> may store up to 8 GPR argument
>> registers to
>> + // In the old ELFv1 ABI,
>> + // the prolog code of the callee
>> may store up to 8 GPR argument
>> registers to
>> // the stack, allowing va_start to
>> index over them in memory if its varargs.
>> // Because we cannot tell if this
>> is needed on the caller side, we have to
>> // conservatively assume that it
>> is needed. As such, make sure we have at
>> // least enough stack space for
>> the caller to store the 8 GPRs.
>> - // FIXME: On ELFv2, it may be
>> unnecessary to allocate the parameter
>> area.
>> - NumBytes = std::max(NumBytes,
>> LinkageSize + 8 * PtrByteSize);
>> + // In the ELFv2 ABI, we allocate
>> the parameter area iff a callee
>> + // really requires memory
>> operands, e.g. a vararg function.
>> + if (HasParameterArea)
>> + NumBytes = std::max(NumBytes,
>> LinkageSize + 8 * PtrByteSize);
>> + else
>> + NumBytes = LinkageSize;
>>
>> // Tail call needs the stack to be
>> aligned.
>> if
>> (getTargetMachine().Options.GuaranteedTailCallOpt
>> &&
>> @@ -5443,6 +5468,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> if (CallConv ==
>> CallingConv::Fast)
>> ComputePtrOff();
>>
>> + assert(HasParameterArea &&
>> + "Parameter area must exist to pass
>> an argument in memory.");
>> LowerMemOpCallTo(DAG, MF, Chain,
>> Arg, PtrOff, SPDiff, ArgOffset,
>> true, isTailCall, false, MemOpChains,
>> TailCallArguments, dl);
>> @@ -5528,6 +5555,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> PtrOff = DAG.getNode(ISD::ADD, dl,
>> PtrVT, PtrOff, ConstFour);
>> }
>>
>> + assert(HasParameterArea &&
>> + "Parameter area must exist to pass
>> an argument in memory.");
>> LowerMemOpCallTo(DAG, MF, Chain,
>> Arg, PtrOff, SPDiff, ArgOffset,
>> true, isTailCall, false, MemOpChains,
>> TailCallArguments, dl);
>> @@ -5562,6 +5591,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> // GPRs when within range.
>> For now, we always put the value in both
>> // locations (or even all three).
>> if (isVarArg) {
>> + assert(HasParameterArea &&
>> + "Parameter area must exist if we
>> have a varargs call.");
>> // We could elide this store
>> in the case where the object fits
>> // entirely in R registers.
>> Maybe later.
>> SDValue Store =
>> @@ -5594,6 +5625,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> if (CallConv ==
>> CallingConv::Fast)
>> ComputePtrOff();
>>
>> + assert(HasParameterArea &&
>> + "Parameter area must exist to pass
>> an argument in memory.");
>> LowerMemOpCallTo(DAG, MF, Chain,
>> Arg, PtrOff, SPDiff, ArgOffset,
>> true, isTailCall, true, MemOpChains,
>> TailCallArguments, dl);
>> @@ -5614,6 +5647,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> case MVT::v4i1: {
>> bool IsF32 =
>> Arg.getValueType().getSimpleVT().SimpleTy
>> == MVT::v4f32;
>> if (isVarArg) {
>> + assert(HasParameterArea &&
>> + "Parameter area must exist if we
>> have a varargs call.");
>> // We could elide this store
>> in the case where the object fits
>> // entirely in R registers.
>> Maybe later.
>> SDValue Store =
>> @@ -5646,6 +5681,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> if (CallConv ==
>> CallingConv::Fast)
>> ComputePtrOff();
>>
>> + assert(HasParameterArea &&
>> + "Parameter area must exist to pass
>> an argument in memory.");
>> LowerMemOpCallTo(DAG, MF, Chain,
>> Arg, PtrOff, SPDiff, ArgOffset,
>> true, isTailCall, true, MemOpChains,
>> TailCallArguments, dl);
>> @@ -5660,7 +5697,8 @@ SDValue
>> PPCTargetLowering::LowerCall_64S
>> }
>> }
>>
>> - assert(NumBytesActuallyUsed ==
>> ArgOffset);
>> + assert((!HasParameterArea ||
>> NumBytesActuallyUsed == ArgOffset) &&
>> + "mismatch in size of parameter area");
>> (void)NumBytesActuallyUsed;
>>
>> if (!MemOpChains.empty())
>>
>> Added:
>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/PowerPC/stacksize.ll?rev=296771&view=auto
>> ==============================================================================
>> ---
>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>> (added)
>> +++
>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>> Thu Mar 2 11:38:59 2017
>> @@ -0,0 +1,86 @@
>> +; For ELFv2 ABI, we can avoid
>> allocating the parameter area in the
>> stack frame of the caller function
>> +; if all the arguments can be passed
>> to the callee in registers.
>> +; For ELFv1 ABI, we always need to
>> allocate the parameter area.
>> +
>> +; Tests for ELFv2 ABI
>> +; RUN: llc -verify-machineinstrs
>> -mtriple=powerpc64le-unknown-linux-gnu
>> -target-abi elfv2 < %s | FileCheck %s
>> -check-prefix=PPC64-ELFV2
>> +; RUN: llc -verify-machineinstrs
>> -mtriple=powerpc64-unknown-linux-gnu
>> -target-abi elfv2 < %s | FileCheck %s
>> -check-prefix=PPC64-ELFV2
>> +
>> +; Tests for ELFv1 ABI
>> +; RUN: llc -verify-machineinstrs
>> -mtriple=powerpc64le-unknown-linux-gnu
>> -target-abi elfv1 < %s | FileCheck %s
>> -check-prefix=PPC64-ELFV1
>> +; RUN: llc -verify-machineinstrs
>> -mtriple=powerpc64-unknown-linux-gnu
>> -target-abi elfv1 < %s | FileCheck %s
>> -check-prefix=PPC64-ELFV1
>> +
>> +; If the callee has at most eight
>> integer args, parameter area can be
>> ommited for ELFv2 ABI.
>> +
>> +; PPC64-ELFV2-LABEL: WithoutParamArea1:
>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV2: stdu 1, -32(1)
>> +; PPC64-ELFV2: addi 1, 1, 32
>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1-LABEL: WithoutParamArea1:
>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1: stdu 1, -112(1)
>> +; PPC64-ELFV1: addi 1, 1, 112
>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +define signext i32
>> @WithoutParamArea1(i32 signext %a)
>> local_unnamed_addr #0 {
>> +entry:
>> + %call = tail call signext i32
>> @onearg(i32 signext %a) #2
>> + ret i32 %call
>> +}
>> +
>> +; PPC64-ELFV2-LABEL: WithoutParamArea2:
>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV2: stdu 1, -32(1)
>> +; PPC64-ELFV2: addi 1, 1, 32
>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1-LABEL: WithoutParamArea2:
>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1: stdu 1, -112(1)
>> +; PPC64-ELFV1: addi 1, 1, 112
>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +define signext i32
>> @WithoutParamArea2(i32 signext %a)
>> local_unnamed_addr #0 {
>> +entry:
>> + %call = tail call signext i32
>> @eightargs(i32 signext %a, i32
>> signext %a, i32 signext %a, i32
>> signext %a, i32 signext %a, i32
>> signext %a, i32 signext %a, i32
>> signext %a) #2
>> + ret i32 %call
>> +}
>> +
>> +; If the callee has more than eight
>> integer args or variable number of args,
>> +; parameter area cannot be ommited
>> even for ELFv2 ABI
>> +
>> +; PPC64-ELFV2-LABEL: WithParamArea1:
>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV2: stdu 1, -96(1)
>> +; PPC64-ELFV2: addi 1, 1, 96
>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1-LABEL: WithParamArea1:
>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1: stdu 1, -112(1)
>> +; PPC64-ELFV1: addi 1, 1, 112
>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +define signext i32
>> @WithParamArea1(i32 signext %a)
>> local_unnamed_addr #0 {
>> +entry:
>> + %call = tail call signext i32
>> (i32, ...) @varargs(i32 signext %a,
>> i32 signext %a) #2
>> + ret i32 %call
>> +}
>> +
>> +; PPC64-ELFV2-LABEL: WithParamArea2:
>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV2: stdu 1, -112(1)
>> +; PPC64-ELFV2: addi 1, 1, 112
>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1-LABEL: WithParamArea2:
>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +; PPC64-ELFV1: stdu 1, -128(1)
>> +; PPC64-ELFV1: addi 1, 1, 128
>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}},
>> -{{[0-9]+}}(1)
>> +define signext i32
>> @WithParamArea2(i32 signext %a)
>> local_unnamed_addr #0 {
>> +entry:
>> + %call = tail call signext i32
>> @nineargs(i32 signext %a, i32 signext
>> %a, i32 signext %a, i32 signext %a,
>> i32 signext %a, i32 signext %a, i32
>> signext %a, i32 signext %a, i32
>> signext %a) #2
>> + ret i32 %call
>> +}
>> +
>> +declare signext i32 @onearg(i32
>> signext) local_unnamed_addr #1
>> +declare signext i32 @eightargs(i32
>> signext, i32 signext, i32 signext,
>> i32 signext, i32 signext, i32
>> signext, i32 signext, i32 signext)
>> local_unnamed_addr #1
>> +declare signext i32 @nineargs(i32
>> signext, i32 signext, i32 signext,
>> i32 signext, i32 signext, i32
>> signext, i32 signext, i32 signext,
>> i32 signext) local_unnamed_addr #1
>> +declare signext i32 @varargs(i32
>> signext, ...) local_unnamed_addr #1
>> +
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org
>> <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at lists.llvm.org <mailto:llvm-commits at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>
> --
> Hal Finkel
> Lead, Compiler Technology and Programming Languages
> Leadership Computing Facility
> Argonne National Laboratory
>
--
Hal Finkel
Lead, Compiler Technology and Programming Languages
Leadership Computing Facility
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170308/0d69e684/attachment.html>
More information about the llvm-commits
mailing list