Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Thu Apr 18 20:41:25 PDT 2013


Fixed :-) New patch (+2 tests) is attached.
-Stepan

Manman Ren wrote:
>
> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>
> As a side note, we may have issues with 9 floats, 3 ints and a small byval.
> The standard says we should split the byval only when NSAA is equal to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
> we should not split the byval since we advanced NSAA with the 9th float.
>
> Thanks,
> Manman
>
> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>
>> Hi Manman,
>>
>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the testing case.
>>> Should it have 9 doubles and 1 integer instead of a single double and a single integer?
>>> The testing case passes even without the fix.
>> That was my midnight mistake, sorry.
>> I reattached patch with proper test. In the end of test file, I've added comments for @doFoo: small description of how parameters must be passed:
>> first eight doubles --> D0-D7
>> 9th double --> Stack
>> 10th param (first i32) --> R0.
>>
>> It is important to have 9 double params here, since we must fill-up all possible VFPs, then get compiler to use stack offset, and only after start use GPRs.
>>
>>>
>>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>>> It does not sound right to replace VarARgs with ByVal.
>>
>> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
>> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.
> Agree. That is why I think ByVal is not a great name either.
> Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize?
> Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".
>
> Thanks for working on this.
>>
>> -Stepan.
>>
>>> Thanks for working on this.
>>> Manman
>>>
>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>
>>>> Hi Manman,
>>>>
>>>>>> *4.* One new change I've made is
>>>>>> CC_ARM_AAPCS_Common rule.
>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>
>>>> Well.. We can use next case: Co-Processor register candidates may be
>>>> either in VFP or in stack, so when all VFP are allocated stack is used. We can use stack without GPR allocation in that case, passing 9 f64 params, for example: first eight goes to d0-d7, night goes to stack. Now as 10th parameter we pass i32, according to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>
>>>> [quote]
>>>> If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
>>>> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
>>>> Successive registers hold the parts of the argument they would hold if its value were loaded into
>>>> those registers from memory using an LDM instruction. The argument has now been allocated.
>>>> [/quote]
>>>>
>>>> It must to R0 in our case. But currently it goes to stack.
>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>
>>>> About refactoring. I propose do it in two stages, first rename "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>
>>>> I attached first refactor patch, Calling-Convention patch with test-case, and calling-convention .c test case.
>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>
>>>> P.S.: My beagleboard is very very slow, "release" build takes about 15 hours, test-suite may be 15-20 hours too. I'll try ask somebody to present me faster device, but it take some time.
>>>>
>>>> -Stepan.
>>>>
>>>> Manman Ren wrote:
>>>>>
>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>
>>>>>> Hello Manman,
>>>>>>
>>>>>> I reattached full patch with fixes.
>>>>>>
>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and Wasted flag.
>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>> 3. And fixed regression test: added function with two byval structures.
>>>>> The above looks good.
>>>>>>
>>>>>> *4.* One new change I've made is
>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>
>>>>>> Currently, for align == 4 and parameter type <=i32, we try to assign it to the first unallocated register from [r0-r3] range, but only if getNextStackOffset() == 0.
>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact? that all registers are used? and params goes to the stack now. After byval structures will be implemented, the next case is possible:
>>>>>>
>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>
>>>>>> Here, we need to allocate stack for first parameter, but after we still have unallocated registers. And second parameter should be sent to r1. Current CC_ARM_AAPCS_Common implementation will send second parameter to the stack.
>>>>>> Though, I'm still not sure about this change, just because may be there are more reasons to check nextStackOffset value.
>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>>>
>>>>>>>>> +    // for byval parameters.
>>>>>>>>> +    {
>>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>> true always.
>>>>>> How do you think, do we need this assertion?
>>>>> I don't really think it is necessary :)
>>>>>>
>>>>>> I didn't present refactor patch, just because main patch didn't get LGTM community and hypothetically may contain some mistakes..
>>>>> LGTM otherwise.
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>>
>>>>>> -Stepan
>>>>>>
>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>> Manman Ren wrote:
>>>>>>>>
>>>>>>>> Hi Stepan,
>>>>>>>>
>>>>>>>> One high-level question: do we need to insert the wasted record to
>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>> It seems to me that we can only insert the records for the actual byval
>>>>>>>> parameters.
>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>> skipWastedByValRegs.
>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>
>>>>>>>>
>>>>>>>>> +
>>>>>>>>> +        // "offset" value would be usefull if byval parameter outsides
>>>>>>>>> +        // the registers area. Note, in that case nextInRegsParam()
>>>>>>>>> will not
>>>>>>>>> +        // skip any waste registers, since parameter fillup the whole
>>>>>>>>> rest
>>>>>>>>> +        // of registers set.
>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>> +        if (!CCInfo.nextInRegsParam() && Flags.getByValSize() >
>>>>>>>>> 4*RegsUsed )
>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>> +      if (offset) {
>>>>>>>>
>>>>>>>> This still does not look right.
>>>>>>>> There are 3 cases:
>>>>>>>> If the byval parameter is in registers partially, we should create a
>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>> If the byval parameter is all in registers, there is no need to create a
>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>
>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on stack),
>>>>>>>> offset will still be zero,
>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>> The correct logic seems to be always updating offset inside the if and
>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers occupied by this
>>>>>>>> byval parameter
>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval parameter that is
>>>>>>>> in registers partially or all
>>>>>>>> }
>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>    ...
>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>
>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>> +    // All registers reserved for byval parameters should be allocated.
>>>>>>>>> +    // Check that first unallocated register is higher that last one
>>>>>>>>> used
>>>>>>>>> +    // for byval parameters.
>>>>>>>>> +    {
>>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>> +      }
>>>>>>>>> +    }
>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>> true always.
>>>>>>>
>>>>>>>>> -void
>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG
>>>>>>>>> &DAG,
>>>>>>>>> -                                        DebugLoc dl, SDValue &Chain,
>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>> -                                        unsigned OffsetFromOrigArg,
>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>> -                                        bool ForceMutable) const {
>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>> +int
>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo, SelectionDAG &DAG,
>>>>>>>>> +                                  DebugLoc dl, SDValue &Chain,
>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>> +                                  unsigned InRegsParamRecordIdx,
>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>
>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>> For sure.
>>>>>>>
>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try present
>>>>>>> new patch tomorrow.
>>>>>>>
>>>>>>> -Stepan.
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>> <pr15293-2013-04-12.patch>
>>>>>
>>>>
>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>
>>
>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: AAPCS-C.4-vs-VFP-2013-04-19.patch
Type: text/x-diff
Size: 7002 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130419/e9587d3b/attachment.patch>


More information about the llvm-commits mailing list