Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Mon Apr 22 08:55:25 PDT 2013


Hi all,

AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been fixed, r180011.

So, next patch is in attachment: VarArgsStyleRegisters refactoring.

-Stepan.

Stepan Dyatkovskiy wrote:
> Hello, Manman.
> That's kind of snowball!
> Today test-suite finished with two new failures on my beagle.
> It helped me find out one more hidden logic dependency in ARM backend.
>
> May be its not a mistake, but look:
>
> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for f64. This
> method is called for VA functions, since there is no CPRCs, and *any*
> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3 could be
> allocated here. If all the registers except R3 were allocated somewhere
> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't waste
> R3, so R3 could be allocated later. But logic seems to be same as in
> ARMTargetLowering::HandleByVal.
>
> My main fix in calling convention was:
>
> def CC_ARM_AAPCS_Common : CallingConv<[
> ...
> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
> -                       "ArgFlags.getOrigAlign() != 8",
> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>                          CCAssignToReg<[R0, R1, R2, R3]>>>,
>
> Now I got some picture why "State.getNextStackOffset() == 0" was added:
> due to f64AssignAAPCS implementation. May be the are more reasons: next
> test-suite results will show it then :-)
>
> Anyway, original CCIfType brakes C.4 and C.5, while my fix brakes some
> implicit statement defined in f64AssignAAPCS:
>
> "stack-offset != 0 doesn't mean all registers are allocated"
>
> So I've fixed f64AssignAAPCS too. Once we waste registers in HandleByVal
> (when we send parameter to stack), I think it would be good to keep same
> behaviour for f64AssignAAPCS since its HandleByVal's analogue for f64
> params.
>
> I attached new patch, now with 3 tests :-)
> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>
> -Stepan.
>
> Manman Ren wrote:
>>
>> LGTM,
>>
>> Thanks a lot,
>> Manman
>>
>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>
>>> Fixed :-) New patch (+2 tests) is attached.
>>> -Stepan
>>>
>>> Manman Ren wrote:
>>>>
>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>
>>>> As a side note, we may have issues with 9 floats, 3 ints and a small
>>>> byval.
>>>> The standard says we should split the byval only when NSAA is equal
>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>>>> we should not split the byval since we advanced NSAA with the 9th
>>>> float.
>>>>
>>>> Thanks,
>>>> Manman
>>>>
>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hi Manman,
>>>>>
>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the
>>>>>> testing case.
>>>>>> Should it have 9 doubles and 1 integer instead of a single double
>>>>>> and a single integer?
>>>>>> The testing case passes even without the fix.
>>>>> That was my midnight mistake, sorry.
>>>>> I reattached patch with proper test. In the end of test file, I've
>>>>> added comments for @doFoo: small description of how parameters must
>>>>> be passed:
>>>>> first eight doubles --> D0-D7
>>>>> 9th double --> Stack
>>>>> 10th param (first i32) --> R0.
>>>>>
>>>>> It is important to have 9 double params here, since we must fill-up
>>>>> all possible VFPs, then get compiler to use stack offset, and only
>>>>> after start use GPRs.
>>>>>
>>>>>>
>>>>>> I am not sure about the renaming patch though. Functions
>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>
>>>>> Actually it is used not for vararg function, as I found it main
>>>>> role of this param in emitPrologue, it is used for proper stack
>>>>> allocation.
>>>>> Stack size should enough for storing all registers, used for byval
>>>>> parameters. And yes, if it is VA function, we must also extend this
>>>>> stack area for all unallocated R0-R3 regs storage.
>>>> Agree. That is why I think ByVal is not a great name either.
>>>> Since it represents the stack area needed to save parameters that
>>>> are actually passed in registers, how about RegsSaveSizeForParams or
>>>> just RegsSaveSize?
>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>
>>>> Thanks for working on this.
>>>>>
>>>>> -Stepan.
>>>>>
>>>>>> Thanks for working on this.
>>>>>> Manman
>>>>>>
>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>
>>>>>>> Hi Manman,
>>>>>>>
>>>>>>>>> *4.* One new change I've made is
>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>> testing case?
>>>>>>>
>>>>>>> Well.. We can use next case: Co-Processor register candidates may be
>>>>>>> either in VFP or in stack, so when all VFP are allocated stack is
>>>>>>> used. We can use stack without GPR allocation in that case,
>>>>>>> passing 9 f64 params, for example: first eight goes to d0-d7,
>>>>>>> night goes to stack. Now as 10th parameter we pass i32, according
>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>
>>>>>>> [quote]
>>>>>>> If the size in words of the argument is not more than r4 minus
>>>>>>> NCRN, the argument is copied into
>>>>>>> core registers, starting at the NCRN. The NCRN is incremented by
>>>>>>> the number of registers used.
>>>>>>> Successive registers hold the parts of the argument they would
>>>>>>> hold if its value were loaded into
>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>> argument has now been allocated.
>>>>>>> [/quote]
>>>>>>>
>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>>>>
>>>>>>> About refactoring. I propose do it in two stages, first rename
>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach
>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>>>>
>>>>>>> I attached first refactor patch, Calling-Convention patch with
>>>>>>> test-case, and calling-convention .c test case.
>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>
>>>>>>> P.S.: My beagleboard is very very slow, "release" build takes
>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll try ask
>>>>>>> somebody to present me faster device, but it take some time.
>>>>>>>
>>>>>>> -Stepan.
>>>>>>>
>>>>>>> Manman Ren wrote:
>>>>>>>>
>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>
>>>>>>>>> Hello Manman,
>>>>>>>>>
>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>
>>>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and
>>>>>>>>> Wasted flag.
>>>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>>>> 3. And fixed regression test: added function with two byval
>>>>>>>>> structures.
>>>>>>>> The above looks good.
>>>>>>>>>
>>>>>>>>> *4.* One new change I've made is
>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>
>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to
>>>>>>>>> assign it to the first unallocated register from [r0-r3] range,
>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact?
>>>>>>>>> that all registers are used? and params goes to the stack now.
>>>>>>>>> After byval structures will be implemented, the next case is
>>>>>>>>> possible:
>>>>>>>>>
>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>
>>>>>>>>> Here, we need to allocate stack for first parameter, but after
>>>>>>>>> we still have unallocated registers. And second parameter
>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>> Though, I'm still not sure about this change, just because may
>>>>>>>>> be there are more reasons to check nextStackOffset value.
>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>> testing case?
>>>>>>>>>
>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>> RE);
>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +    }
>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>> must be
>>>>>>>>>> true always.
>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>
>>>>>>>>> I didn't present refactor patch, just because main patch didn't
>>>>>>>>> get LGTM community and hypothetically may contain some mistakes..
>>>>>>>> LGTM otherwise.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>>>
>>>>>>>>> -Stepan
>>>>>>>>>
>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>
>>>>>>>>>>> One high-level question: do we need to insert the wasted
>>>>>>>>>>> record to
>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>> It seems to me that we can only insert the records for the
>>>>>>>>>>> actual byval
>>>>>>>>>>> parameters.
>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> +
>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>> will not
>>>>>>>>>>>> +        // skip any waste registers, since parameter fillup
>>>>>>>>>>>> the whole
>>>>>>>>>>>> rest
>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>        }
>>>>>>>>>>>>
>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>
>>>>>>>>>>> This still does not look right.
>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>> If the byval parameter is in registers partially, we should
>>>>>>>>>>> create a
>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>> If the byval parameter is all in registers, there is no need
>>>>>>>>>>> to create a
>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>
>>>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on
>>>>>>>>>>> stack),
>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>> The correct logic seems to be always updating offset inside
>>>>>>>>>>> the if and
>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers occupied
>>>>>>>>>>> by this
>>>>>>>>>>> byval parameter
>>>>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>> parameter that is
>>>>>>>>>>> in registers partially or all
>>>>>>>>>>> }
>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>>>>    ...
>>>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>>>
>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>> +    // All registers reserved for byval parameters should
>>>>>>>>>>>> be allocated.
>>>>>>>>>>>> +    // Check that first unallocated register is higher that
>>>>>>>>>>>> last one
>>>>>>>>>>>> used
>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>> +    {
>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>> RE);
>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>> +      }
>>>>>>>>>>>> +    }
>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>> must be
>>>>>>>>>> true always.
>>>>>>>>>>
>>>>>>>>>>>> -void
>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo,
>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>> &DAG,
>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>>>>> -                                        bool ForceMutable)
>>>>>>>>>>>> const {
>>>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>>>> +int
>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>> +                                  DebugLoc dl, SDValue &Chain,
>>>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>>>>
>>>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>> For sure.
>>>>>>>>>>
>>>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try
>>>>>>>>>> present
>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>
>>>>>>>>>> -Stepan.
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>
>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>
>>>>>>>
>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>
>>>>>>
>>>>>
>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>
>>>
>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: StoreByValRegs-2013-04-17.patch
Type: text/x-diff
Size: 12232 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130422/1e630d9d/attachment.patch>


More information about the llvm-commits mailing list