Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Manman Ren mren at apple.com
Fri Apr 26 13:15:49 PDT 2013


Hi Stepan,

Thanks for working on this.

I put in a comment about the refactor patch sometime ago:
>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>> It does not sound right to replace VarARgs with ByVal.
> 
> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.

Agree. That is why I think ByVal is not a great name either. 
Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize? 
Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".

Manman

On Apr 26, 2013, at 11:12 AM, Stepan Dyatkovskiy wrote:

> Hi Manman and Renato,
> Patch was tested with test-suite all test have bee passed.
> -Stepan
> 
> Stepan Dyatkovskiy wrote:
>> Hi all,
>> 
>> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been fixed, r180011.
>> 
>> So, next patch is in attachment: VarArgsStyleRegisters refactoring.
>> 
>> -Stepan.
>> 
>> Stepan Dyatkovskiy wrote:
>>> Hello, Manman.
>>> That's kind of snowball!
>>> Today test-suite finished with two new failures on my beagle.
>>> It helped me find out one more hidden logic dependency in ARM backend.
>>> 
>>> May be its not a mistake, but look:
>>> 
>>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for f64. This
>>> method is called for VA functions, since there is no CPRCs, and *any*
>>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3 could be
>>> allocated here. If all the registers except R3 were allocated somewhere
>>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't waste
>>> R3, so R3 could be allocated later. But logic seems to be same as in
>>> ARMTargetLowering::HandleByVal.
>>> 
>>> My main fix in calling convention was:
>>> 
>>> def CC_ARM_AAPCS_Common : CallingConv<[
>>> ...
>>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>>> -                       "ArgFlags.getOrigAlign() != 8",
>>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>>                         CCAssignToReg<[R0, R1, R2, R3]>>>,
>>> 
>>> Now I got some picture why "State.getNextStackOffset() == 0" was added:
>>> due to f64AssignAAPCS implementation. May be the are more reasons: next
>>> test-suite results will show it then :-)
>>> 
>>> Anyway, original CCIfType brakes C.4 and C.5, while my fix brakes some
>>> implicit statement defined in f64AssignAAPCS:
>>> 
>>> "stack-offset != 0 doesn't mean all registers are allocated"
>>> 
>>> So I've fixed f64AssignAAPCS too. Once we waste registers in HandleByVal
>>> (when we send parameter to stack), I think it would be good to keep same
>>> behaviour for f64AssignAAPCS since its HandleByVal's analogue for f64
>>> params.
>>> 
>>> I attached new patch, now with 3 tests :-)
>>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>> 
>>> -Stepan.
>>> 
>>> Manman Ren wrote:
>>>> 
>>>> LGTM,
>>>> 
>>>> Thanks a lot,
>>>> Manman
>>>> 
>>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>> 
>>>>> Fixed :-) New patch (+2 tests) is attached.
>>>>> -Stepan
>>>>> 
>>>>> Manman Ren wrote:
>>>>>> 
>>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>> 
>>>>>> As a side note, we may have issues with 9 floats, 3 ints and a small
>>>>>> byval.
>>>>>> The standard says we should split the byval only when NSAA is equal
>>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>>>>>> we should not split the byval since we advanced NSAA with the 9th
>>>>>> float.
>>>>>> 
>>>>>> Thanks,
>>>>>> Manman
>>>>>> 
>>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>> 
>>>>>>> Hi Manman,
>>>>>>> 
>>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the
>>>>>>>> testing case.
>>>>>>>> Should it have 9 doubles and 1 integer instead of a single double
>>>>>>>> and a single integer?
>>>>>>>> The testing case passes even without the fix.
>>>>>>> That was my midnight mistake, sorry.
>>>>>>> I reattached patch with proper test. In the end of test file, I've
>>>>>>> added comments for @doFoo: small description of how parameters must
>>>>>>> be passed:
>>>>>>> first eight doubles --> D0-D7
>>>>>>> 9th double --> Stack
>>>>>>> 10th param (first i32) --> R0.
>>>>>>> 
>>>>>>> It is important to have 9 double params here, since we must fill-up
>>>>>>> all possible VFPs, then get compiler to use stack offset, and only
>>>>>>> after start use GPRs.
>>>>>>> 
>>>>>>>> 
>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>> 
>>>>>>> Actually it is used not for vararg function, as I found it main
>>>>>>> role of this param in emitPrologue, it is used for proper stack
>>>>>>> allocation.
>>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>>> parameters. And yes, if it is VA function, we must also extend this
>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>> Since it represents the stack area needed to save parameters that
>>>>>> are actually passed in registers, how about RegsSaveSizeForParams or
>>>>>> just RegsSaveSize?
>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>> 
>>>>>> Thanks for working on this.
>>>>>>> 
>>>>>>> -Stepan.
>>>>>>> 
>>>>>>>> Thanks for working on this.
>>>>>>>> Manman
>>>>>>>> 
>>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>> 
>>>>>>>>> Hi Manman,
>>>>>>>>> 
>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>>> testing case?
>>>>>>>>> 
>>>>>>>>> Well.. We can use next case: Co-Processor register candidates
>>>>>>>>> may be
>>>>>>>>> either in VFP or in stack, so when all VFP are allocated stack is
>>>>>>>>> used. We can use stack without GPR allocation in that case,
>>>>>>>>> passing 9 f64 params, for example: first eight goes to d0-d7,
>>>>>>>>> night goes to stack. Now as 10th parameter we pass i32, according
>>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>> 
>>>>>>>>> [quote]
>>>>>>>>> If the size in words of the argument is not more than r4 minus
>>>>>>>>> NCRN, the argument is copied into
>>>>>>>>> core registers, starting at the NCRN. The NCRN is incremented by
>>>>>>>>> the number of registers used.
>>>>>>>>> Successive registers hold the parts of the argument they would
>>>>>>>>> hold if its value were loaded into
>>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>>> argument has now been allocated.
>>>>>>>>> [/quote]
>>>>>>>>> 
>>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>>>>>> 
>>>>>>>>> About refactoring. I propose do it in two stages, first rename
>>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach
>>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>>>>>> 
>>>>>>>>> I attached first refactor patch, Calling-Convention patch with
>>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>> 
>>>>>>>>> P.S.: My beagleboard is very very slow, "release" build takes
>>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll try ask
>>>>>>>>> somebody to present me faster device, but it take some time.
>>>>>>>>> 
>>>>>>>>> -Stepan.
>>>>>>>>> 
>>>>>>>>> Manman Ren wrote:
>>>>>>>>>> 
>>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>> 
>>>>>>>>>>> Hello Manman,
>>>>>>>>>>> 
>>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>> 
>>>>>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and
>>>>>>>>>>> Wasted flag.
>>>>>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>>>>>> 3. And fixed regression test: added function with two byval
>>>>>>>>>>> structures.
>>>>>>>>>> The above looks good.
>>>>>>>>>>> 
>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>> 
>>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to
>>>>>>>>>>> assign it to the first unallocated register from [r0-r3] range,
>>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact?
>>>>>>>>>>> that all registers are used? and params goes to the stack now.
>>>>>>>>>>> After byval structures will be implemented, the next case is
>>>>>>>>>>> possible:
>>>>>>>>>>> 
>>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>> 
>>>>>>>>>>> Here, we need to allocate stack for first parameter, but after
>>>>>>>>>>> we still have unallocated registers. And second parameter
>>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>>> Though, I'm still not sure about this change, just because may
>>>>>>>>>>> be there are more reasons to check nextStackOffset value.
>>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>>> testing case?
>>>>>>>>>>> 
>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>>> must be
>>>>>>>>>>>> true always.
>>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>> 
>>>>>>>>>>> I didn't present refactor patch, just because main patch didn't
>>>>>>>>>>> get LGTM community and hypothetically may contain some mistakes..
>>>>>>>>>> LGTM otherwise.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Manman
>>>>>>>>>>> 
>>>>>>>>>>> -Stepan
>>>>>>>>>>> 
>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> One high-level question: do we need to insert the wasted
>>>>>>>>>>>>> record to
>>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>>> It seems to me that we can only insert the records for the
>>>>>>>>>>>>> actual byval
>>>>>>>>>>>>> parameters.
>>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +
>>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>>> will not
>>>>>>>>>>>>>> +        // skip any waste registers, since parameter fillup
>>>>>>>>>>>>>> the whole
>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>> 
>>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>>> If the byval parameter is in registers partially, we should
>>>>>>>>>>>>> create a
>>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>>> If the byval parameter is all in registers, there is no need
>>>>>>>>>>>>> to create a
>>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on
>>>>>>>>>>>>> stack),
>>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>>> The correct logic seems to be always updating offset inside
>>>>>>>>>>>>> the if and
>>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>>>>>   offset = RegEnd - RegBegin; //number of registers occupied
>>>>>>>>>>>>> by this
>>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>>   CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>>> parameter that is
>>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>>> }
>>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>>   // Part of the byval parameter is on stack.
>>>>>>>>>>>>>   ...
>>>>>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>>>>> 
>>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>> +    // All registers reserved for byval parameters should
>>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>>> +    // Check that first unallocated register is higher that
>>>>>>>>>>>>>> last one
>>>>>>>>>>>>>> used
>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>>> must be
>>>>>>>>>>>> true always.
>>>>>>>>>>>> 
>>>>>>>>>>>>>> -void
>>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo,
>>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>>>>>>> -                                        bool ForceMutable)
>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>>> +                                  DebugLoc dl, SDValue
>>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>>>>>> 
>>>>>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>>> For sure.
>>>>>>>>>>>> 
>>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try
>>>>>>>>>>>> present
>>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>> 
>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>> 
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>> 
>>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>> 
>>>>> 
>>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>> 
>>> 
>> 
> 
> <StoreByValRegs-2013-04-17.patch>





More information about the llvm-commits mailing list