Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Sat Apr 27 23:27:02 PDT 2013


Hi Manman,

 > Since it represents the stack area needed to save parameters that are 
actually passed in registers, how about RegsSaveSizeForParams or just 
RegsSaveSize?

In AAPCS R0-R3 are called like:
"Argument / result / scratch register".
So maybe we'll use "set/getArgRegsSaveSize" names?

 > Also it is weird to have "unsigned VARegSaveSize = 
AFI->getByValRegsSaveSize();"

Yea :-) Here replace "VAReg" with "ArgRegs": ArgRegsSaveSize, ArgRegsSize.

-Stepan

Manman Ren wrote:
>
> Hi Stepan,
>
> Thanks for working on this.
>
> I put in a comment about the refactor patch sometime ago:
>>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>>> It does not sound right to replace VarARgs with ByVal.
>>
>> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
>> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.
>
> Agree. That is why I think ByVal is not a great name either.
> Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize?
> Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".
>
> Manman
>
> On Apr 26, 2013, at 11:12 AM, Stepan Dyatkovskiy wrote:
>
>> Hi Manman and Renato,
>> Patch was tested with test-suite all test have bee passed.
>> -Stepan
>>
>> Stepan Dyatkovskiy wrote:
>>> Hi all,
>>>
>>> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been fixed, r180011.
>>>
>>> So, next patch is in attachment: VarArgsStyleRegisters refactoring.
>>>
>>> -Stepan.
>>>
>>> Stepan Dyatkovskiy wrote:
>>>> Hello, Manman.
>>>> That's kind of snowball!
>>>> Today test-suite finished with two new failures on my beagle.
>>>> It helped me find out one more hidden logic dependency in ARM backend.
>>>>
>>>> May be its not a mistake, but look:
>>>>
>>>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for f64. This
>>>> method is called for VA functions, since there is no CPRCs, and *any*
>>>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3 could be
>>>> allocated here. If all the registers except R3 were allocated somewhere
>>>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't waste
>>>> R3, so R3 could be allocated later. But logic seems to be same as in
>>>> ARMTargetLowering::HandleByVal.
>>>>
>>>> My main fix in calling convention was:
>>>>
>>>> def CC_ARM_AAPCS_Common : CallingConv<[
>>>> ...
>>>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>>>> -                       "ArgFlags.getOrigAlign() != 8",
>>>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>>>                          CCAssignToReg<[R0, R1, R2, R3]>>>,
>>>>
>>>> Now I got some picture why "State.getNextStackOffset() == 0" was added:
>>>> due to f64AssignAAPCS implementation. May be the are more reasons: next
>>>> test-suite results will show it then :-)
>>>>
>>>> Anyway, original CCIfType brakes C.4 and C.5, while my fix brakes some
>>>> implicit statement defined in f64AssignAAPCS:
>>>>
>>>> "stack-offset != 0 doesn't mean all registers are allocated"
>>>>
>>>> So I've fixed f64AssignAAPCS too. Once we waste registers in HandleByVal
>>>> (when we send parameter to stack), I think it would be good to keep same
>>>> behaviour for f64AssignAAPCS since its HandleByVal's analogue for f64
>>>> params.
>>>>
>>>> I attached new patch, now with 3 tests :-)
>>>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>>>
>>>> -Stepan.
>>>>
>>>> Manman Ren wrote:
>>>>>
>>>>> LGTM,
>>>>>
>>>>> Thanks a lot,
>>>>> Manman
>>>>>
>>>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>>>
>>>>>> Fixed :-) New patch (+2 tests) is attached.
>>>>>> -Stepan
>>>>>>
>>>>>> Manman Ren wrote:
>>>>>>>
>>>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>>>
>>>>>>> As a side note, we may have issues with 9 floats, 3 ints and a small
>>>>>>> byval.
>>>>>>> The standard says we should split the byval only when NSAA is equal
>>>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>>>>>>> we should not split the byval since we advanced NSAA with the 9th
>>>>>>> float.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>
>>>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>>>
>>>>>>>> Hi Manman,
>>>>>>>>
>>>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the
>>>>>>>>> testing case.
>>>>>>>>> Should it have 9 doubles and 1 integer instead of a single double
>>>>>>>>> and a single integer?
>>>>>>>>> The testing case passes even without the fix.
>>>>>>>> That was my midnight mistake, sorry.
>>>>>>>> I reattached patch with proper test. In the end of test file, I've
>>>>>>>> added comments for @doFoo: small description of how parameters must
>>>>>>>> be passed:
>>>>>>>> first eight doubles --> D0-D7
>>>>>>>> 9th double --> Stack
>>>>>>>> 10th param (first i32) --> R0.
>>>>>>>>
>>>>>>>> It is important to have 9 double params here, since we must fill-up
>>>>>>>> all possible VFPs, then get compiler to use stack offset, and only
>>>>>>>> after start use GPRs.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>
>>>>>>>> Actually it is used not for vararg function, as I found it main
>>>>>>>> role of this param in emitPrologue, it is used for proper stack
>>>>>>>> allocation.
>>>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>>>> parameters. And yes, if it is VA function, we must also extend this
>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>> Since it represents the stack area needed to save parameters that
>>>>>>> are actually passed in registers, how about RegsSaveSizeForParams or
>>>>>>> just RegsSaveSize?
>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>
>>>>>>> Thanks for working on this.
>>>>>>>>
>>>>>>>> -Stepan.
>>>>>>>>
>>>>>>>>> Thanks for working on this.
>>>>>>>>> Manman
>>>>>>>>>
>>>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>
>>>>>>>>>> Hi Manman,
>>>>>>>>>>
>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>>>> testing case?
>>>>>>>>>>
>>>>>>>>>> Well.. We can use next case: Co-Processor register candidates
>>>>>>>>>> may be
>>>>>>>>>> either in VFP or in stack, so when all VFP are allocated stack is
>>>>>>>>>> used. We can use stack without GPR allocation in that case,
>>>>>>>>>> passing 9 f64 params, for example: first eight goes to d0-d7,
>>>>>>>>>> night goes to stack. Now as 10th parameter we pass i32, according
>>>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>>>
>>>>>>>>>> [quote]
>>>>>>>>>> If the size in words of the argument is not more than r4 minus
>>>>>>>>>> NCRN, the argument is copied into
>>>>>>>>>> core registers, starting at the NCRN. The NCRN is incremented by
>>>>>>>>>> the number of registers used.
>>>>>>>>>> Successive registers hold the parts of the argument they would
>>>>>>>>>> hold if its value were loaded into
>>>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>>>> argument has now been allocated.
>>>>>>>>>> [/quote]
>>>>>>>>>>
>>>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>>>>>>>
>>>>>>>>>> About refactoring. I propose do it in two stages, first rename
>>>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach
>>>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>>>>>>>
>>>>>>>>>> I attached first refactor patch, Calling-Convention patch with
>>>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>>>
>>>>>>>>>> P.S.: My beagleboard is very very slow, "release" build takes
>>>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll try ask
>>>>>>>>>> somebody to present me faster device, but it take some time.
>>>>>>>>>>
>>>>>>>>>> -Stepan.
>>>>>>>>>>
>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hello Manman,
>>>>>>>>>>>>
>>>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>>>
>>>>>>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and
>>>>>>>>>>>> Wasted flag.
>>>>>>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>>>>>>> 3. And fixed regression test: added function with two byval
>>>>>>>>>>>> structures.
>>>>>>>>>>> The above looks good.
>>>>>>>>>>>>
>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>
>>>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to
>>>>>>>>>>>> assign it to the first unallocated register from [r0-r3] range,
>>>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact?
>>>>>>>>>>>> that all registers are used? and params goes to the stack now.
>>>>>>>>>>>> After byval structures will be implemented, the next case is
>>>>>>>>>>>> possible:
>>>>>>>>>>>>
>>>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>>>
>>>>>>>>>>>> Here, we need to allocate stack for first parameter, but after
>>>>>>>>>>>> we still have unallocated registers. And second parameter
>>>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>>>> Though, I'm still not sure about this change, just because may
>>>>>>>>>>>> be there are more reasons to check nextStackOffset value.
>>>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>>>> testing case?
>>>>>>>>>>>>
>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>>>> must be
>>>>>>>>>>>>> true always.
>>>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>>>
>>>>>>>>>>>> I didn't present refactor patch, just because main patch didn't
>>>>>>>>>>>> get LGTM community and hypothetically may contain some mistakes..
>>>>>>>>>>> LGTM otherwise.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Manman
>>>>>>>>>>>>
>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>
>>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> One high-level question: do we need to insert the wasted
>>>>>>>>>>>>>> record to
>>>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>>>> It seems to me that we can only insert the records for the
>>>>>>>>>>>>>> actual byval
>>>>>>>>>>>>>> parameters.
>>>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>>>> will not
>>>>>>>>>>>>>>> +        // skip any waste registers, since parameter fillup
>>>>>>>>>>>>>>> the whole
>>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>>>> If the byval parameter is in registers partially, we should
>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>>>> If the byval parameter is all in registers, there is no need
>>>>>>>>>>>>>> to create a
>>>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on
>>>>>>>>>>>>>> stack),
>>>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>>>> The correct logic seems to be always updating offset inside
>>>>>>>>>>>>>> the if and
>>>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers occupied
>>>>>>>>>>>>>> by this
>>>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>>>> parameter that is
>>>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>>>>>>>    ...
>>>>>>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>> +    // All registers reserved for byval parameters should
>>>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>>>> +    // Check that first unallocated register is higher that
>>>>>>>>>>>>>>> last one
>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>>>> must be
>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -void
>>>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo,
>>>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>>>>>>>> -                                        bool ForceMutable)
>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>>>> +                                  DebugLoc dl, SDValue
>>>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>>>> For sure.
>>>>>>>>>>>>>
>>>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try
>>>>>>>>>>>>> present
>>>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>>>
>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>>>
>>>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>>>
>>>>>>
>>>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>>>
>>>>
>>>
>>
>> <StoreByValRegs-2013-04-17.patch>
>




More information about the llvm-commits mailing list