Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Wed Apr 17 05:24:34 PDT 2013


Hi Manman,

I've combined set|getVarArgsRegSize renaming and VarArgStyleRegisters 
refactoring. May be it'll help to make my ideas more clean.

Here:
* Fixed AAPCS-C.4-vs-VFP patch.
* VarArgStyleRegisters refactoring patch + rename set/getVarArgsRegSize.

-Stepan.

Stepan Dyatkovskiy wrote:
> Hi Manman,
>
>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the testing
>> case.
>> Should it have 9 doubles and 1 integer instead of a single double and
>> a single integer?
>> The testing case passes even without the fix.
> That was my midnight mistake, sorry.
> I reattached patch with proper test. In the end of test file, I've added
> comments for @doFoo: small description of how parameters must be passed:
> first eight doubles --> D0-D7
> 9th double --> Stack
> 10th param (first i32) --> R0.
>
> It is important to have 9 double params here, since we must fill-up all
> possible VFPs, then get compiler to use stack offset, and only after
> start use GPRs.
>
>>
>> I am not sure about the renaming patch though. Functions
>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>> It does not sound right to replace VarARgs with ByVal.
>
> Actually it is used not for vararg function, as I found it main role of
> this param in emitPrologue, it is used for proper stack allocation.
> Stack size should enough for storing all registers, used for byval
> parameters. And yes, if it is VA function, we must also extend this
> stack area for all unallocated R0-R3 regs storage.
>
> -Stepan.
>
>> Thanks for working on this.
>> Manman
>>
>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>
>>> Hi Manman,
>>>
>>>>> *4.* One new change I've made is
>>>>> CC_ARM_AAPCS_Common rule.
>>>> Can we fix the above issue in a separate patch with additional
>>>> testing case?
>>>
>>> Well.. We can use next case: Co-Processor register candidates may be
>>> either in VFP or in stack, so when all VFP are allocated stack is
>>> used. We can use stack without GPR allocation in that case, passing 9
>>> f64 params, for example: first eight goes to d0-d7, night goes to
>>> stack. Now as 10th parameter we pass i32, according to AAPCS, 5.5
>>> Parameters Passing, Stage C, C.4:
>>>
>>> [quote]
>>> If the size in words of the argument is not more than r4 minus NCRN,
>>> the argument is copied into
>>> core registers, starting at the NCRN. The NCRN is incremented by the
>>> number of registers used.
>>> Successive registers hold the parts of the argument they would hold
>>> if its value were loaded into
>>> those registers from memory using an LDM instruction. The argument
>>> has now been allocated.
>>> [/quote]
>>>
>>> It must to R0 in our case. But currently it goes to stack.
>>> I also checked this case with GCC - it works as in AAPCS written.
>>>
>>> About refactoring. I propose do it in two stages, first rename
>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach
>>> "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>
>>> I attached first refactor patch, Calling-Convention patch with
>>> test-case, and calling-convention .c test case.
>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>
>>> P.S.: My beagleboard is very very slow, "release" build takes about
>>> 15 hours, test-suite may be 15-20 hours too. I'll try ask somebody to
>>> present me faster device, but it take some time.
>>>
>>> -Stepan.
>>>
>>> Manman Ren wrote:
>>>>
>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hello Manman,
>>>>>
>>>>> I reattached full patch with fixes.
>>>>>
>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and Wasted
>>>>> flag.
>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>> 3. And fixed regression test: added function with two byval
>>>>> structures.
>>>> The above looks good.
>>>>>
>>>>> *4.* One new change I've made is
>>>>> CC_ARM_AAPCS_Common rule.
>>>>>
>>>>> Currently, for align == 4 and parameter type <=i32, we try to
>>>>> assign it to the first unallocated register from [r0-r3] range, but
>>>>> only if getNextStackOffset() == 0.
>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact? that
>>>>> all registers are used? and params goes to the stack now. After
>>>>> byval structures will be implemented, the next case is possible:
>>>>>
>>>>> void @f(%i32_struct* byval, i32)
>>>>>
>>>>> Here, we need to allocate stack for first parameter, but after we
>>>>> still have unallocated registers. And second parameter should be
>>>>> sent to r1. Current CC_ARM_AAPCS_Common implementation will send
>>>>> second parameter to the stack.
>>>>> Though, I'm still not sure about this change, just because may be
>>>>> there are more reasons to check nextStackOffset value.
>>>> Can we fix the above issue in a separate patch with additional
>>>> testing case?
>>>>>
>>>>>>>> +    // for byval parameters.
>>>>>>>> +    {
>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>> +        unsigned RB, RE;
>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>> +        assert(reg >= RE);
>>>>>>>> +      }
>>>>>>>> +    }
>>>>>>> Is this only for assertion? Is this necessary?
>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>> true always.
>>>>> How do you think, do we need this assertion?
>>>> I don't really think it is necessary :)
>>>>>
>>>>> I didn't present refactor patch, just because main patch didn't get
>>>>> LGTM community and hypothetically may contain some mistakes..
>>>> LGTM otherwise.
>>>>
>>>> Thanks,
>>>> Manman
>>>>>
>>>>> -Stepan
>>>>>
>>>>> Stepan Dyatkovskiy wrote:
>>>>>> Manman Ren wrote:
>>>>>>>
>>>>>>> Hi Stepan,
>>>>>>>
>>>>>>> One high-level question: do we need to insert the wasted record to
>>>>>>> ByValRegs and later on skip those records?
>>>>>>> It seems to me that we can only insert the records for the actual
>>>>>>> byval
>>>>>>> parameters.
>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>> skipWastedByValRegs.
>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>
>>>>>>>
>>>>>>>> +
>>>>>>>> +        // "offset" value would be usefull if byval parameter
>>>>>>>> outsides
>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>> nextInRegsParam()
>>>>>>>> will not
>>>>>>>> +        // skip any waste registers, since parameter fillup the
>>>>>>>> whole
>>>>>>>> rest
>>>>>>>> +        // of registers set.
>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>> +        if (!CCInfo.nextInRegsParam() && Flags.getByValSize() >
>>>>>>>> 4*RegsUsed )
>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>        }
>>>>>>>>
>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>> +      if (offset) {
>>>>>>>
>>>>>>> This still does not look right.
>>>>>>> There are 3 cases:
>>>>>>> If the byval parameter is in registers partially, we should create a
>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>> If the byval parameter is all in registers, there is no need to
>>>>>>> create a
>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>
>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on
>>>>>>> stack),
>>>>>>> offset will still be zero,
>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>> The correct logic seems to be always updating offset inside the
>>>>>>> if and
>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers occupied by
>>>>>>> this
>>>>>>> byval parameter
>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval parameter
>>>>>>> that is
>>>>>>> in registers partially or all
>>>>>>> }
>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>    ...
>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>
>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>> +    // All registers reserved for byval parameters should be
>>>>>>>> allocated.
>>>>>>>> +    // Check that first unallocated register is higher that
>>>>>>>> last one
>>>>>>>> used
>>>>>>>> +    // for byval parameters.
>>>>>>>> +    {
>>>>>>>> +      unsigned InRegsParamsCount = State->getInRegsParamsCount();
>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>> +        unsigned RB, RE;
>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB, RE);
>>>>>>>> +        assert(reg >= RE);
>>>>>>>> +      }
>>>>>>>> +    }
>>>>>>> Is this only for assertion? Is this necessary?
>>>>>> Yes. I've just moved this check from "if" statement, since it must be
>>>>>> true always.
>>>>>>
>>>>>>>> -void
>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo,
>>>>>>>> SelectionDAG
>>>>>>>> &DAG,
>>>>>>>> -                                        DebugLoc dl, SDValue
>>>>>>>> &Chain,
>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>> -                                        unsigned
>>>>>>>> OffsetFromOrigArg,
>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>> -                                        bool ForceMutable) const {
>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>> +int
>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo, SelectionDAG
>>>>>>>> &DAG,
>>>>>>>> +                                  DebugLoc dl, SDValue &Chain,
>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>> +                                  unsigned InRegsParamRecordIdx,
>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>
>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>> For sure.
>>>>>>
>>>>>> P.S.: Today I had some troubles with electricity, so I'll try present
>>>>>> new patch tomorrow.
>>>>>>
>>>>>> -Stepan.
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>> <pr15293-2013-04-12.patch>
>>>>
>>>
>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: AAPCS-C.4-vs-VFP-2013-04-17.patch
Type: text/x-diff
Size: 4014 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130417/48c2b6bb/attachment.patch>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: VarArgStyleRegisters-Refactoring-2013-04-17.patch
Type: text/x-diff
Size: 12232 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130417/48c2b6bb/attachment-0001.patch>


More information about the llvm-commits mailing list