Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Manman Ren mren at apple.com
Thu Apr 18 11:13:44 PDT 2013


"AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.

As a side note, we may have issues with 9 floats, 3 ints and a small byval.
The standard says we should split the byval only when NSAA is equal to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
we should not split the byval since we advanced NSAA with the 9th float.

Thanks,
Manman

On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:

> Hi Manman,
> 
>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the testing case.
>> Should it have 9 doubles and 1 integer instead of a single double and a single integer?
>> The testing case passes even without the fix.
> That was my midnight mistake, sorry.
> I reattached patch with proper test. In the end of test file, I've added comments for @doFoo: small description of how parameters must be passed:
> first eight doubles --> D0-D7
> 9th double --> Stack
> 10th param (first i32) --> R0.
> 
> It is important to have 9 double params here, since we must fill-up all possible VFPs, then get compiler to use stack offset, and only after start use GPRs.
> 
>> 
>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>> It does not sound right to replace VarARgs with ByVal.
> 
> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.
Agree. That is why I think ByVal is not a great name either. 
Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize? 
Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".

Thanks for working on this.
> 
> -Stepan.
> 
>> Thanks for working on this.
>> Manman
>> 
>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>> 
>>> Hi Manman,
>>> 
>>>>> *4.* One new change I've made is
>>>>> CC_ARM_AAPCS_Common rule.
>>>> Can we fix the above issue in a separate patch with additional testing case?
>>> 
>>> Well.. We can use next case: Co-Processor register candidates may be
>>> either in VFP or in stack, so when all VFP are allocated stack is used. We can use stack without GPR allocation in that case, passing 9 f64 params, for example: first eight goes to d0-d7, night goes to stack. Now as 10th parameter we pass i32, according to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>> 
>>> [quote]
>>> If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
>>> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
>>> Successive registers hold the parts of the argument they would hold if its value were loaded into
>>> those registers from memory using an LDM instruction. The argument has now been allocated.
>>> [/quote]
>>> 
>>> It must to R0 in our case. But currently it goes to stack.
>>> I also checked this case with GCC - it works as in AAPCS written.
>>> 
>>> About refactoring. I propose do it in two stages, first rename "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>> 
>>> I attached first refactor patch, Calling-Convention patch with test-case, and calling-convention .c test case.
>>> Try run .c file with "arm-linux-gnueabihf-gcc" and "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>> 
>>> P.S.: My beagleboard is very very slow, "release" build takes about 15 hours, test-suite may be 15-20 hours too. I'll try ask somebody to present me faster device, but it take some time.
>>> 
>>> -Stepan.
>>> 
>>> Manman Ren wrote:
>>>> 
>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>> 
>>>>> Hello Manman,
>>>>> 
>>>>> I reattached full patch with fixes.
>>>>> 
>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and Wasted flag.
>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>> 3. And fixed regression test: added function with two byval structures.
>>>> The above looks good.
>>>>> 
>>>>> *4.* One new change I've made is
>>>>> CC_ARM_AAPCS_Common rule.
>>>>> 
>>>>> Currently, for align == 4 and parameter type <=i32, we try to assign it to the first unallocated register from [r0-r3] range, but only if getNextStackOffset() == 0.
>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact? that all registers are used? and params goes to the stack now. After byval structures will be implemented, the next case is possible:
>>>>> 
>>>>> void @f(%i32_struct* byval, i32)
>>>>> 
>>>>> Here, we need to allocate stack for first parameter, but after we still have unallocated registers. And second parameter should be sent to r1. Current CC_ARM_AAPCS_Common implementation will send second parameter to the stack.
>>>>> Though, I'm still not sure about this change, just because may be there are more reasons to check nextStackOffset value.
>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>> 
>>>>>>>> +    // for byval parameters.
>>>>>>>> +    {
>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>> +        unsigned RB, RE;
>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>> +        assert(reg >= RE);
>>>>>>>> +      }
>>>>>>>> +    }
>>>>>>> Is this only for assertion? Is this necessary?
>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>> true always.
>>>>> How do you think, do we need this assertion?
>>>> I don't really think it is necessary :)
>>>>> 
>>>>> I didn't present refactor patch, just because main patch didn't get LGTM community and hypothetically may contain some mistakes..
>>>> LGTM otherwise.
>>>> 
>>>> Thanks,
>>>> Manman
>>>>> 
>>>>> -Stepan
>>>>> 
>>>>> Stepan Dyatkovskiy wrote:
>>>>>> Manman Ren wrote:
>>>>>>> 
>>>>>>> Hi Stepan,
>>>>>>> 
>>>>>>> One high-level question: do we need to insert the wasted record to
>>>>>>> ByValRegs and later on skip those records?
>>>>>>> It seems to me that we can only insert the records for the actual byval
>>>>>>> parameters.
>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>> skipWastedByValRegs.
>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>> 
>>>>>>> 
>>>>>>>> +
>>>>>>>> +        // "offset" value would be usefull if byval parameter outsides
>>>>>>>> +        // the registers area. Note, in that case nextInRegsParam()
>>>>>>>> will not
>>>>>>>> +        // skip any waste registers, since parameter fillup the whole
>>>>>>>> rest
>>>>>>>> +        // of registers set.
>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>> +        if (!CCInfo.nextInRegsParam() && Flags.getByValSize() >
>>>>>>>> 4*RegsUsed )
>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>       }
>>>>>>>> 
>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>> +      if (offset) {
>>>>>>> 
>>>>>>> This still does not look right.
>>>>>>> There are 3 cases:
>>>>>>> If the byval parameter is in registers partially, we should create a
>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>> If the byval parameter is all in registers, there is no need to create a
>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>> 
>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on stack),
>>>>>>> offset will still be zero,
>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>> The correct logic seems to be always updating offset inside the if and
>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>   offset = RegEnd - RegBegin; //number of registers occupied by this
>>>>>>> byval parameter
>>>>>>>   CCInfo.nextInRegsParam(); // go to the next byval parameter that is
>>>>>>> in registers partially or all
>>>>>>> }
>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>   // Part of the byval parameter is on stack.
>>>>>>>   ...
>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>> 
>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>> +    // All registers reserved for byval parameters should be allocated.
>>>>>>>> +    // Check that first unallocated register is higher that last one
>>>>>>>> used
>>>>>>>> +    // for byval parameters.
>>>>>>>> +    {
>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>> +        unsigned RB, RE;
>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>> +        assert(reg >= RE);
>>>>>>>> +      }
>>>>>>>> +    }
>>>>>>> Is this only for assertion? Is this necessary?
>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>> true always.
>>>>>> 
>>>>>>>> -void
>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG
>>>>>>>> &DAG,
>>>>>>>> -                                        DebugLoc dl, SDValue &Chain,
>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>> -                                        unsigned OffsetFromOrigArg,
>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>> -                                        bool ForceMutable) const {
>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>> +int
>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo, SelectionDAG &DAG,
>>>>>>>> +                                  DebugLoc dl, SDValue &Chain,
>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>> +                                  unsigned InRegsParamRecordIdx,
>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>> 
>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>> For sure.
>>>>>> 
>>>>>> P.S.: Today I had some troubles with electricity, so I'll try present
>>>>>> new patch tomorrow.
>>>>>> 
>>>>>> -Stepan.
>>>>>> 
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>> 
>>>>> <pr15293-2013-04-12.patch>
>>>> 
>>> 
>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>> 
> 
> <AAPCS-C.4-vs-VFP-2013-04-17.patch>





More information about the llvm-commits mailing list