Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Manman Ren mren at apple.com
Fri Apr 19 10:45:43 PDT 2013


LGTM,

Thanks a lot,
Manman

On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:

> Fixed :-) New patch (+2 tests) is attached.
> -Stepan
> 
> Manman Ren wrote:
>> 
>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>> 
>> As a side note, we may have issues with 9 floats, 3 ints and a small byval.
>> The standard says we should split the byval only when NSAA is equal to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>> we should not split the byval since we advanced NSAA with the 9th float.
>> 
>> Thanks,
>> Manman
>> 
>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>> 
>>> Hi Manman,
>>> 
>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the testing case.
>>>> Should it have 9 doubles and 1 integer instead of a single double and a single integer?
>>>> The testing case passes even without the fix.
>>> That was my midnight mistake, sorry.
>>> I reattached patch with proper test. In the end of test file, I've added comments for @doFoo: small description of how parameters must be passed:
>>> first eight doubles --> D0-D7
>>> 9th double --> Stack
>>> 10th param (first i32) --> R0.
>>> 
>>> It is important to have 9 double params here, since we must fill-up all possible VFPs, then get compiler to use stack offset, and only after start use GPRs.
>>> 
>>>> 
>>>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>> It does not sound right to replace VarARgs with ByVal.
>>> 
>>> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
>>> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.
>> Agree. That is why I think ByVal is not a great name either.
>> Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize?
>> Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".
>> 
>> Thanks for working on this.
>>> 
>>> -Stepan.
>>> 
>>>> Thanks for working on this.
>>>> Manman
>>>> 
>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>> 
>>>>> Hi Manman,
>>>>> 
>>>>>>> *4.* One new change I've made is
>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>> 
>>>>> Well.. We can use next case: Co-Processor register candidates may be
>>>>> either in VFP or in stack, so when all VFP are allocated stack is used. We can use stack without GPR allocation in that case, passing 9 f64 params, for example: first eight goes to d0-d7, night goes to stack. Now as 10th parameter we pass i32, according to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>> 
>>>>> [quote]
>>>>> If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
>>>>> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
>>>>> Successive registers hold the parts of the argument they would hold if its value were loaded into
>>>>> those registers from memory using an LDM instruction. The argument has now been allocated.
>>>>> [/quote]
>>>>> 
>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>> 
>>>>> About refactoring. I propose do it in two stages, first rename "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>> 
>>>>> I attached first refactor patch, Calling-Convention patch with test-case, and calling-convention .c test case.
>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>> 
>>>>> P.S.: My beagleboard is very very slow, "release" build takes about 15 hours, test-suite may be 15-20 hours too. I'll try ask somebody to present me faster device, but it take some time.
>>>>> 
>>>>> -Stepan.
>>>>> 
>>>>> Manman Ren wrote:
>>>>>> 
>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>> 
>>>>>>> Hello Manman,
>>>>>>> 
>>>>>>> I reattached full patch with fixes.
>>>>>>> 
>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and Wasted flag.
>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>> 3. And fixed regression test: added function with two byval structures.
>>>>>> The above looks good.
>>>>>>> 
>>>>>>> *4.* One new change I've made is
>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>> 
>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to assign it to the first unallocated register from [r0-r3] range, but only if getNextStackOffset() == 0.
>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact? that all registers are used? and params goes to the stack now. After byval structures will be implemented, the next case is possible:
>>>>>>> 
>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>> 
>>>>>>> Here, we need to allocate stack for first parameter, but after we still have unallocated registers. And second parameter should be sent to r1. Current CC_ARM_AAPCS_Common implementation will send second parameter to the stack.
>>>>>>> Though, I'm still not sure about this change, just because may be there are more reasons to check nextStackOffset value.
>>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>>>> 
>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>> +    {
>>>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>> +      }
>>>>>>>>>> +    }
>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>>> true always.
>>>>>>> How do you think, do we need this assertion?
>>>>>> I don't really think it is necessary :)
>>>>>>> 
>>>>>>> I didn't present refactor patch, just because main patch didn't get LGTM community and hypothetically may contain some mistakes..
>>>>>> LGTM otherwise.
>>>>>> 
>>>>>> Thanks,
>>>>>> Manman
>>>>>>> 
>>>>>>> -Stepan
>>>>>>> 
>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>> Manman Ren wrote:
>>>>>>>>> 
>>>>>>>>> Hi Stepan,
>>>>>>>>> 
>>>>>>>>> One high-level question: do we need to insert the wasted record to
>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>> It seems to me that we can only insert the records for the actual byval
>>>>>>>>> parameters.
>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>> skipWastedByValRegs.
>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> +
>>>>>>>>>> +        // "offset" value would be usefull if byval parameter outsides
>>>>>>>>>> +        // the registers area. Note, in that case nextInRegsParam()
>>>>>>>>>> will not
>>>>>>>>>> +        // skip any waste registers, since parameter fillup the whole
>>>>>>>>>> rest
>>>>>>>>>> +        // of registers set.
>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() && Flags.getByValSize() >
>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>       }
>>>>>>>>>> 
>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>> +      if (offset) {
>>>>>>>>> 
>>>>>>>>> This still does not look right.
>>>>>>>>> There are 3 cases:
>>>>>>>>> If the byval parameter is in registers partially, we should create a
>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>> If the byval parameter is all in registers, there is no need to create a
>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>> 
>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on stack),
>>>>>>>>> offset will still be zero,
>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>> The correct logic seems to be always updating offset inside the if and
>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>   offset = RegEnd - RegBegin; //number of registers occupied by this
>>>>>>>>> byval parameter
>>>>>>>>>   CCInfo.nextInRegsParam(); // go to the next byval parameter that is
>>>>>>>>> in registers partially or all
>>>>>>>>> }
>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>   // Part of the byval parameter is on stack.
>>>>>>>>>   ...
>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>> 
>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>> +    // All registers reserved for byval parameters should be allocated.
>>>>>>>>>> +    // Check that first unallocated register is higher that last one
>>>>>>>>>> used
>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>> +    {
>>>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>> +      }
>>>>>>>>>> +    }
>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>>> true always.
>>>>>>>> 
>>>>>>>>>> -void
>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG
>>>>>>>>>> &DAG,
>>>>>>>>>> -                                        DebugLoc dl, SDValue &Chain,
>>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>>> -                                        unsigned OffsetFromOrigArg,
>>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>>> -                                        bool ForceMutable) const {
>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>> +int
>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo, SelectionDAG &DAG,
>>>>>>>>>> +                                  DebugLoc dl, SDValue &Chain,
>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>> +                                  unsigned InRegsParamRecordIdx,
>>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>> 
>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>> For sure.
>>>>>>>> 
>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try present
>>>>>>>> new patch tomorrow.
>>>>>>>> 
>>>>>>>> -Stepan.
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>> 
>>>>>>> <pr15293-2013-04-12.patch>
>>>>>> 
>>>>> 
>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>> 
>>> 
>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>> 
> 
> <AAPCS-C.4-vs-VFP-2013-04-19.patch>





More information about the llvm-commits mailing list