Fix PR15293: ARM codegen ice - expected larger existing stack allocation
Manman Ren
mren at apple.com
Fri Apr 19 10:45:43 PDT 2013
LGTM,
Thanks a lot,
Manman
On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
> Fixed :-) New patch (+2 tests) is attached.
> -Stepan
>
> Manman Ren wrote:
>>
>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>
>> As a side note, we may have issues with 9 floats, 3 ints and a small byval.
>> The standard says we should split the byval only when NSAA is equal to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>> we should not split the byval since we advanced NSAA with the 9th float.
>>
>> Thanks,
>> Manman
>>
>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>
>>> Hi Manman,
>>>
>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the testing case.
>>>> Should it have 9 doubles and 1 integer instead of a single double and a single integer?
>>>> The testing case passes even without the fix.
>>> That was my midnight mistake, sorry.
>>> I reattached patch with proper test. In the end of test file, I've added comments for @doFoo: small description of how parameters must be passed:
>>> first eight doubles --> D0-D7
>>> 9th double --> Stack
>>> 10th param (first i32) --> R0.
>>>
>>> It is important to have 9 double params here, since we must fill-up all possible VFPs, then get compiler to use stack offset, and only after start use GPRs.
>>>
>>>>
>>>> I am not sure about the renaming patch though. Functions get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>> It does not sound right to replace VarARgs with ByVal.
>>>
>>> Actually it is used not for vararg function, as I found it main role of this param in emitPrologue, it is used for proper stack allocation.
>>> Stack size should enough for storing all registers, used for byval parameters. And yes, if it is VA function, we must also extend this stack area for all unallocated R0-R3 regs storage.
>> Agree. That is why I think ByVal is not a great name either.
>> Since it represents the stack area needed to save parameters that are actually passed in registers, how about RegsSaveSizeForParams or just RegsSaveSize?
>> Also it is weird to have "unsigned VARegSaveSize = AFI->getByValRegsSaveSize();" where we assign return value of getByValRegsSaveSize to a variable "VARegSaveSize".
>>
>> Thanks for working on this.
>>>
>>> -Stepan.
>>>
>>>> Thanks for working on this.
>>>> Manman
>>>>
>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hi Manman,
>>>>>
>>>>>>> *4.* One new change I've made is
>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>>
>>>>> Well.. We can use next case: Co-Processor register candidates may be
>>>>> either in VFP or in stack, so when all VFP are allocated stack is used. We can use stack without GPR allocation in that case, passing 9 f64 params, for example: first eight goes to d0-d7, night goes to stack. Now as 10th parameter we pass i32, according to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>
>>>>> [quote]
>>>>> If the size in words of the argument is not more than r4 minus NCRN, the argument is copied into
>>>>> core registers, starting at the NCRN. The NCRN is incremented by the number of registers used.
>>>>> Successive registers hold the parts of the argument they would hold if its value were loaded into
>>>>> those registers from memory using an LDM instruction. The argument has now been allocated.
>>>>> [/quote]
>>>>>
>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>>
>>>>> About refactoring. I propose do it in two stages, first rename "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>>
>>>>> I attached first refactor patch, Calling-Convention patch with test-case, and calling-convention .c test case.
>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>
>>>>> P.S.: My beagleboard is very very slow, "release" build takes about 15 hours, test-suite may be 15-20 hours too. I'll try ask somebody to present me faster device, but it take some time.
>>>>>
>>>>> -Stepan.
>>>>>
>>>>> Manman Ren wrote:
>>>>>>
>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>
>>>>>>> Hello Manman,
>>>>>>>
>>>>>>> I reattached full patch with fixes.
>>>>>>>
>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and Wasted flag.
>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>> 3. And fixed regression test: added function with two byval structures.
>>>>>> The above looks good.
>>>>>>>
>>>>>>> *4.* One new change I've made is
>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>
>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to assign it to the first unallocated register from [r0-r3] range, but only if getNextStackOffset() == 0.
>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact? that all registers are used? and params goes to the stack now. After byval structures will be implemented, the next case is possible:
>>>>>>>
>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>
>>>>>>> Here, we need to allocate stack for first parameter, but after we still have unallocated registers. And second parameter should be sent to r1. Current CC_ARM_AAPCS_Common implementation will send second parameter to the stack.
>>>>>>> Though, I'm still not sure about this change, just because may be there are more reasons to check nextStackOffset value.
>>>>>> Can we fix the above issue in a separate patch with additional testing case?
>>>>>>>
>>>>>>>>>> + // for byval parameters.
>>>>>>>>>> + {
>>>>>>>>>> + unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>>> + if (InRegsParamsCount) {
>>>>>>>>>> + unsigned RB, RE;
>>>>>>>>>> + State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>>> + assert(reg >= RE);
>>>>>>>>>> + }
>>>>>>>>>> + }
>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>>> true always.
>>>>>>> How do you think, do we need this assertion?
>>>>>> I don't really think it is necessary :)
>>>>>>>
>>>>>>> I didn't present refactor patch, just because main patch didn't get LGTM community and hypothetically may contain some mistakes..
>>>>>> LGTM otherwise.
>>>>>>
>>>>>> Thanks,
>>>>>> Manman
>>>>>>>
>>>>>>> -Stepan
>>>>>>>
>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>> Manman Ren wrote:
>>>>>>>>>
>>>>>>>>> Hi Stepan,
>>>>>>>>>
>>>>>>>>> One high-level question: do we need to insert the wasted record to
>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>> It seems to me that we can only insert the records for the actual byval
>>>>>>>>> parameters.
>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>> skipWastedByValRegs.
>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> + // "offset" value would be usefull if byval parameter outsides
>>>>>>>>>> + // the registers area. Note, in that case nextInRegsParam()
>>>>>>>>>> will not
>>>>>>>>>> + // skip any waste registers, since parameter fillup the whole
>>>>>>>>>> rest
>>>>>>>>>> + // of registers set.
>>>>>>>>>> + unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>> + if (!CCInfo.nextInRegsParam() && Flags.getByValSize() >
>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>> + offset = RegEnd - RegBegin;
>>>>>>>>>> }
>>>>>>>>>>
>>>>>>>>>> - if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>> + if (offset) {
>>>>>>>>>
>>>>>>>>> This still does not look right.
>>>>>>>>> There are 3 cases:
>>>>>>>>> If the byval parameter is in registers partially, we should create a
>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>> If the byval parameter is all in registers, there is no need to create a
>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>
>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on stack),
>>>>>>>>> offset will still be zero,
>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>> The correct logic seems to be always updating offset inside the if and
>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>> …
>>>>>>>>> offset = RegEnd - RegBegin; //number of registers occupied by this
>>>>>>>>> byval parameter
>>>>>>>>> CCInfo.nextInRegsParam(); // go to the next byval parameter that is
>>>>>>>>> in registers partially or all
>>>>>>>>> }
>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>> // Part of the byval parameter is on stack.
>>>>>>>>> ...
>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>
>>>>>>>>>> - if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>> - (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>> + if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>> + // All registers reserved for byval parameters should be allocated.
>>>>>>>>>> + // Check that first unallocated register is higher that last one
>>>>>>>>>> used
>>>>>>>>>> + // for byval parameters.
>>>>>>>>>> + {
>>>>>>>>>> + unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>>>> + if (InRegsParamsCount) {
>>>>>>>>>> + unsigned RB, RE;
>>>>>>>>>> + State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>>>> + assert(reg >= RE);
>>>>>>>>>> + }
>>>>>>>>>> + }
>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>>>> true always.
>>>>>>>>
>>>>>>>>>> -void
>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo, SelectionDAG
>>>>>>>>>> &DAG,
>>>>>>>>>> - DebugLoc dl, SDValue &Chain,
>>>>>>>>>> - const Value *OrigArg,
>>>>>>>>>> - unsigned OffsetFromOrigArg,
>>>>>>>>>> - unsigned ArgOffset,
>>>>>>>>>> - bool ForceMutable) const {
>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>> +int
>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo, SelectionDAG &DAG,
>>>>>>>>>> + DebugLoc dl, SDValue &Chain,
>>>>>>>>>> + const Value *OrigArg,
>>>>>>>>>> + unsigned InRegsParamRecordIdx,
>>>>>>>>>> + unsigned OffsetFromOrigArg,
>>>>>>>>>> + unsigned ArgOffset,
>>>>>>>>>> + bool ForceMutable) const {
>>>>>>>>>
>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>> For sure.
>>>>>>>>
>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try present
>>>>>>>> new patch tomorrow.
>>>>>>>>
>>>>>>>> -Stepan.
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>
>>>>>
>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>
>>>
>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>
>
> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
More information about the llvm-commits
mailing list