[llvm] r296771 - [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack frame size
Nemanja Ivanovic via llvm-commits
llvm-commits at lists.llvm.org
Tue Mar 7 01:59:31 PST 2017
Thanks Daniel and Tim,
It's certainly a curious thing that this occurs. I am just curious, does
this happen at noopt or with optimization? The only thing I can think of is
that maybe this stack frame reduction isn't valid if mem2reg isn't run. But
this is more or less a total shot in the dark.
Nemanja.
On Tue, Mar 7, 2017 at 9:20 AM, Daniel Jasper <djasper at google.com> wrote:
> Just FYI, Tim has reverted this patch and is looking at reducing a test
> case. He should have one tomorrow morning.
>
> So far, he found that the first parameter is live but clobbered
> immediately in some case.
> Assembly:
> ld r3,-31576(r2)
>
> (I don't know what that means, but just wanted to provide as much
> information as I can, in case that makes it easy for you to spot the bug)
>
> Cheers,
> Danie
>
> On Tue, Mar 7, 2017 at 7:48 AM, Daniel Jasper <djasper at google.com> wrote:
>
>> This seems to be causing some miscompiles. Working on a test case.
>>
>> On Thu, Mar 2, 2017 at 6:39 PM, Nemanja Ivanovic via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>>
>>> Author: nemanjai
>>> Date: Thu Mar 2 11:38:59 2017
>>> New Revision: 296771
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=296771&view=rev
>>> Log:
>>> [PowerPC][ELFv2ABI] Allocate parameter area on-demand to reduce stack
>>> frame size
>>>
>>> This patch reduces the stack frame size by not allocating the parameter
>>> area if
>>> it is not required. In the current implementation
>>> LowerFormalArguments_64SVR4
>>> already handles the parameter area, but LowerCall_64SVR4 does not
>>> (when calculating the stack frame size). What this patch does is make
>>> LowerCall_64SVR4 consistent with LowerFormalArguments_64SVR4.
>>>
>>> Committing on behalf of Hiroshi Inoue.
>>>
>>> Differential Revision: https://reviews.llvm.org/D29881
>>>
>>> Added:
>>> llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> Modified:
>>> llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>>
>>> Modified: llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/Po
>>> werPC/PPCISelLowering.cpp?rev=296771&r1=296770&r2=296771&view=diff
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp (original)
>>> +++ llvm/trunk/lib/Target/PowerPC/PPCISelLowering.cpp Thu Mar 2
>>> 11:38:59 2017
>>> @@ -5147,10 +5147,30 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> };
>>>
>>> const unsigned NumGPRs = array_lengthof(GPR);
>>> - const unsigned NumFPRs = 13;
>>> + const unsigned NumFPRs = useSoftFloat() ? 0 : 13;
>>> const unsigned NumVRs = array_lengthof(VR);
>>> const unsigned NumQFPRs = NumFPRs;
>>>
>>> + // On ELFv2, we can avoid allocating the parameter area if all the
>>> arguments
>>> + // can be passed to the callee in registers.
>>> + // For the fast calling convention, there is another check below.
>>> + // Note: keep consistent with LowerFormalArguments_64SVR4()
>>> + bool HasParameterArea = !isELFv2ABI || isVarArg || CallConv ==
>>> CallingConv::Fast;
>>> + if (!HasParameterArea) {
>>> + unsigned ParamAreaSize = NumGPRs * PtrByteSize;
>>> + unsigned AvailableFPRs = NumFPRs;
>>> + unsigned AvailableVRs = NumVRs;
>>> + unsigned NumBytesTmp = NumBytes;
>>> + for (unsigned i = 0; i != NumOps; ++i) {
>>> + if (Outs[i].Flags.isNest()) continue;
>>> + if (CalculateStackSlotUsed(Outs[i].VT, Outs[i].ArgVT,
>>> Outs[i].Flags,
>>> + PtrByteSize, LinkageSize, ParamAreaSize,
>>> + NumBytesTmp, AvailableFPRs,
>>> AvailableVRs,
>>> + Subtarget.hasQPX()))
>>> + HasParameterArea = true;
>>> + }
>>> + }
>>> +
>>> // When using the fast calling convention, we don't provide backing
>>> for
>>> // arguments that will be in registers.
>>> unsigned NumGPRsUsed = 0, NumFPRsUsed = 0, NumVRsUsed = 0;
>>> @@ -5218,13 +5238,18 @@ SDValue PPCTargetLowering::LowerCall_64S
>>>
>>> unsigned NumBytesActuallyUsed = NumBytes;
>>>
>>> - // The prolog code of the callee may store up to 8 GPR argument
>>> registers to
>>> + // In the old ELFv1 ABI,
>>> + // the prolog code of the callee may store up to 8 GPR argument
>>> registers to
>>> // the stack, allowing va_start to index over them in memory if its
>>> varargs.
>>> // Because we cannot tell if this is needed on the caller side, we
>>> have to
>>> // conservatively assume that it is needed. As such, make sure we
>>> have at
>>> // least enough stack space for the caller to store the 8 GPRs.
>>> - // FIXME: On ELFv2, it may be unnecessary to allocate the parameter
>>> area.
>>> - NumBytes = std::max(NumBytes, LinkageSize + 8 * PtrByteSize);
>>> + // In the ELFv2 ABI, we allocate the parameter area iff a callee
>>> + // really requires memory operands, e.g. a vararg function.
>>> + if (HasParameterArea)
>>> + NumBytes = std::max(NumBytes, LinkageSize + 8 * PtrByteSize);
>>> + else
>>> + NumBytes = LinkageSize;
>>>
>>> // Tail call needs the stack to be aligned.
>>> if (getTargetMachine().Options.GuaranteedTailCallOpt &&
>>> @@ -5443,6 +5468,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> if (CallConv == CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to pass an argument in
>>> memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, false, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5528,6 +5555,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> PtrOff = DAG.getNode(ISD::ADD, dl, PtrVT, PtrOff, ConstFour);
>>> }
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to pass an argument in
>>> memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, false, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5562,6 +5591,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> // GPRs when within range. For now, we always put the value in
>>> both
>>> // locations (or even all three).
>>> if (isVarArg) {
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist if we have a varargs call.");
>>> // We could elide this store in the case where the object fits
>>> // entirely in R registers. Maybe later.
>>> SDValue Store =
>>> @@ -5594,6 +5625,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> if (CallConv == CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to pass an argument in
>>> memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, true, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5614,6 +5647,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> case MVT::v4i1: {
>>> bool IsF32 = Arg.getValueType().getSimpleVT().SimpleTy ==
>>> MVT::v4f32;
>>> if (isVarArg) {
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist if we have a varargs call.");
>>> // We could elide this store in the case where the object fits
>>> // entirely in R registers. Maybe later.
>>> SDValue Store =
>>> @@ -5646,6 +5681,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> if (CallConv == CallingConv::Fast)
>>> ComputePtrOff();
>>>
>>> + assert(HasParameterArea &&
>>> + "Parameter area must exist to pass an argument in
>>> memory.");
>>> LowerMemOpCallTo(DAG, MF, Chain, Arg, PtrOff, SPDiff, ArgOffset,
>>> true, isTailCall, true, MemOpChains,
>>> TailCallArguments, dl);
>>> @@ -5660,7 +5697,8 @@ SDValue PPCTargetLowering::LowerCall_64S
>>> }
>>> }
>>>
>>> - assert(NumBytesActuallyUsed == ArgOffset);
>>> + assert((!HasParameterArea || NumBytesActuallyUsed == ArgOffset) &&
>>> + "mismatch in size of parameter area");
>>> (void)NumBytesActuallyUsed;
>>>
>>> if (!MemOpChains.empty())
>>>
>>> Added: llvm/trunk/test/CodeGen/PowerPC/stacksize.ll
>>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/
>>> PowerPC/stacksize.ll?rev=296771&view=auto
>>> ============================================================
>>> ==================
>>> --- llvm/trunk/test/CodeGen/PowerPC/stacksize.ll (added)
>>> +++ llvm/trunk/test/CodeGen/PowerPC/stacksize.ll Thu Mar 2 11:38:59
>>> 2017
>>> @@ -0,0 +1,86 @@
>>> +; For ELFv2 ABI, we can avoid allocating the parameter area in the
>>> stack frame of the caller function
>>> +; if all the arguments can be passed to the callee in registers.
>>> +; For ELFv1 ABI, we always need to allocate the parameter area.
>>> +
>>> +; Tests for ELFv2 ABI
>>> +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu
>>> -target-abi elfv2 < %s | FileCheck %s -check-prefix=PPC64-ELFV2
>>> +; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu
>>> -target-abi elfv2 < %s | FileCheck %s -check-prefix=PPC64-ELFV2
>>> +
>>> +; Tests for ELFv1 ABI
>>> +; RUN: llc -verify-machineinstrs -mtriple=powerpc64le-unknown-linux-gnu
>>> -target-abi elfv1 < %s | FileCheck %s -check-prefix=PPC64-ELFV1
>>> +; RUN: llc -verify-machineinstrs -mtriple=powerpc64-unknown-linux-gnu
>>> -target-abi elfv1 < %s | FileCheck %s -check-prefix=PPC64-ELFV1
>>> +
>>> +; If the callee has at most eight integer args, parameter area can be
>>> ommited for ELFv2 ABI.
>>> +
>>> +; PPC64-ELFV2-LABEL: WithoutParamArea1:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -32(1)
>>> +; PPC64-ELFV2: addi 1, 1, 32
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithoutParamArea1:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +define signext i32 @WithoutParamArea1(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32 @onearg(i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; PPC64-ELFV2-LABEL: WithoutParamArea2:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -32(1)
>>> +; PPC64-ELFV2: addi 1, 1, 32
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithoutParamArea2:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +define signext i32 @WithoutParamArea2(i32 signext %a)
>>> local_unnamed_addr #0 {
>>> +entry:
>>> + %call = tail call signext i32 @eightargs(i32 signext %a, i32 signext
>>> %a, i32 signext %a, i32 signext %a, i32 signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; If the callee has more than eight integer args or variable number of
>>> args,
>>> +; parameter area cannot be ommited even for ELFv2 ABI
>>> +
>>> +; PPC64-ELFV2-LABEL: WithParamArea1:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -96(1)
>>> +; PPC64-ELFV2: addi 1, 1, 96
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithParamArea1:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -112(1)
>>> +; PPC64-ELFV1: addi 1, 1, 112
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +define signext i32 @WithParamArea1(i32 signext %a) local_unnamed_addr
>>> #0 {
>>> +entry:
>>> + %call = tail call signext i32 (i32, ...) @varargs(i32 signext %a, i32
>>> signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +; PPC64-ELFV2-LABEL: WithParamArea2:
>>> +; PPC64-ELFV2-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV2: stdu 1, -112(1)
>>> +; PPC64-ELFV2: addi 1, 1, 112
>>> +; PPC64-ELFV2-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1-LABEL: WithParamArea2:
>>> +; PPC64-ELFV1-NOT: stw {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +; PPC64-ELFV1: stdu 1, -128(1)
>>> +; PPC64-ELFV1: addi 1, 1, 128
>>> +; PPC64-ELFV1-NOT: lwz {{[0-9]+}}, -{{[0-9]+}}(1)
>>> +define signext i32 @WithParamArea2(i32 signext %a) local_unnamed_addr
>>> #0 {
>>> +entry:
>>> + %call = tail call signext i32 @nineargs(i32 signext %a, i32 signext
>>> %a, i32 signext %a, i32 signext %a, i32 signext %a, i32 signext %a, i32
>>> signext %a, i32 signext %a, i32 signext %a) #2
>>> + ret i32 %call
>>> +}
>>> +
>>> +declare signext i32 @onearg(i32 signext) local_unnamed_addr #1
>>> +declare signext i32 @eightargs(i32 signext, i32 signext, i32 signext,
>>> i32 signext, i32 signext, i32 signext, i32 signext, i32 signext)
>>> local_unnamed_addr #1
>>> +declare signext i32 @nineargs(i32 signext, i32 signext, i32 signext,
>>> i32 signext, i32 signext, i32 signext, i32 signext, i32 signext, i32
>>> signext) local_unnamed_addr #1
>>> +declare signext i32 @varargs(i32 signext, ...) local_unnamed_addr #1
>>> +
>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>>
>>
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20170307/57bfb13b/attachment-0001.html>
More information about the llvm-commits
mailing list