Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Fri Apr 26 11:12:18 PDT 2013


Hi Manman and Renato,
Patch was tested with test-suite all test have bee passed.
-Stepan

Stepan Dyatkovskiy wrote:
> Hi all,
>
> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been fixed, r180011.
>
> So, next patch is in attachment: VarArgsStyleRegisters refactoring.
>
> -Stepan.
>
> Stepan Dyatkovskiy wrote:
>> Hello, Manman.
>> That's kind of snowball!
>> Today test-suite finished with two new failures on my beagle.
>> It helped me find out one more hidden logic dependency in ARM backend.
>>
>> May be its not a mistake, but look:
>>
>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for f64. This
>> method is called for VA functions, since there is no CPRCs, and *any*
>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3 could be
>> allocated here. If all the registers except R3 were allocated somewhere
>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't waste
>> R3, so R3 could be allocated later. But logic seems to be same as in
>> ARMTargetLowering::HandleByVal.
>>
>> My main fix in calling convention was:
>>
>> def CC_ARM_AAPCS_Common : CallingConv<[
>> ...
>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>> -                       "ArgFlags.getOrigAlign() != 8",
>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>                          CCAssignToReg<[R0, R1, R2, R3]>>>,
>>
>> Now I got some picture why "State.getNextStackOffset() == 0" was added:
>> due to f64AssignAAPCS implementation. May be the are more reasons: next
>> test-suite results will show it then :-)
>>
>> Anyway, original CCIfType brakes C.4 and C.5, while my fix brakes some
>> implicit statement defined in f64AssignAAPCS:
>>
>> "stack-offset != 0 doesn't mean all registers are allocated"
>>
>> So I've fixed f64AssignAAPCS too. Once we waste registers in HandleByVal
>> (when we send parameter to stack), I think it would be good to keep same
>> behaviour for f64AssignAAPCS since its HandleByVal's analogue for f64
>> params.
>>
>> I attached new patch, now with 3 tests :-)
>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>
>> -Stepan.
>>
>> Manman Ren wrote:
>>>
>>> LGTM,
>>>
>>> Thanks a lot,
>>> Manman
>>>
>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>
>>>> Fixed :-) New patch (+2 tests) is attached.
>>>> -Stepan
>>>>
>>>> Manman Ren wrote:
>>>>>
>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>
>>>>> As a side note, we may have issues with 9 floats, 3 ints and a small
>>>>> byval.
>>>>> The standard says we should split the byval only when NSAA is equal
>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>>>>> we should not split the byval since we advanced NSAA with the 9th
>>>>> float.
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>
>>>>>> Hi Manman,
>>>>>>
>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the
>>>>>>> testing case.
>>>>>>> Should it have 9 doubles and 1 integer instead of a single double
>>>>>>> and a single integer?
>>>>>>> The testing case passes even without the fix.
>>>>>> That was my midnight mistake, sorry.
>>>>>> I reattached patch with proper test. In the end of test file, I've
>>>>>> added comments for @doFoo: small description of how parameters must
>>>>>> be passed:
>>>>>> first eight doubles --> D0-D7
>>>>>> 9th double --> Stack
>>>>>> 10th param (first i32) --> R0.
>>>>>>
>>>>>> It is important to have 9 double params here, since we must fill-up
>>>>>> all possible VFPs, then get compiler to use stack offset, and only
>>>>>> after start use GPRs.
>>>>>>
>>>>>>>
>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>
>>>>>> Actually it is used not for vararg function, as I found it main
>>>>>> role of this param in emitPrologue, it is used for proper stack
>>>>>> allocation.
>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>> parameters. And yes, if it is VA function, we must also extend this
>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>> Since it represents the stack area needed to save parameters that
>>>>> are actually passed in registers, how about RegsSaveSizeForParams or
>>>>> just RegsSaveSize?
>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>
>>>>> Thanks for working on this.
>>>>>>
>>>>>> -Stepan.
>>>>>>
>>>>>>> Thanks for working on this.
>>>>>>> Manman
>>>>>>>
>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>
>>>>>>>> Hi Manman,
>>>>>>>>
>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>> testing case?
>>>>>>>>
>>>>>>>> Well.. We can use next case: Co-Processor register candidates
>>>>>>>> may be
>>>>>>>> either in VFP or in stack, so when all VFP are allocated stack is
>>>>>>>> used. We can use stack without GPR allocation in that case,
>>>>>>>> passing 9 f64 params, for example: first eight goes to d0-d7,
>>>>>>>> night goes to stack. Now as 10th parameter we pass i32, according
>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>
>>>>>>>> [quote]
>>>>>>>> If the size in words of the argument is not more than r4 minus
>>>>>>>> NCRN, the argument is copied into
>>>>>>>> core registers, starting at the NCRN. The NCRN is incremented by
>>>>>>>> the number of registers used.
>>>>>>>> Successive registers hold the parts of the argument they would
>>>>>>>> hold if its value were loaded into
>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>> argument has now been allocated.
>>>>>>>> [/quote]
>>>>>>>>
>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>> I also checked this case with GCC - it works as in AAPCS written.
>>>>>>>>
>>>>>>>> About refactoring. I propose do it in two stages, first rename
>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and detach
>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters" method.
>>>>>>>>
>>>>>>>> I attached first refactor patch, Calling-Convention patch with
>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>
>>>>>>>> P.S.: My beagleboard is very very slow, "release" build takes
>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll try ask
>>>>>>>> somebody to present me faster device, but it take some time.
>>>>>>>>
>>>>>>>> -Stepan.
>>>>>>>>
>>>>>>>> Manman Ren wrote:
>>>>>>>>>
>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>
>>>>>>>>>> Hello Manman,
>>>>>>>>>>
>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>
>>>>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and
>>>>>>>>>> Wasted flag.
>>>>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>>>>> 3. And fixed regression test: added function with two byval
>>>>>>>>>> structures.
>>>>>>>>> The above looks good.
>>>>>>>>>>
>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>
>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we try to
>>>>>>>>>> assign it to the first unallocated register from [r0-r3] range,
>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as fact?
>>>>>>>>>> that all registers are used? and params goes to the stack now.
>>>>>>>>>> After byval structures will be implemented, the next case is
>>>>>>>>>> possible:
>>>>>>>>>>
>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>
>>>>>>>>>> Here, we need to allocate stack for first parameter, but after
>>>>>>>>>> we still have unallocated registers. And second parameter
>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>> Though, I'm still not sure about this change, just because may
>>>>>>>>>> be there are more reasons to check nextStackOffset value.
>>>>>>>>> Can we fix the above issue in a separate patch with additional
>>>>>>>>> testing case?
>>>>>>>>>>
>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>> +    {
>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>> RE);
>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>> +      }
>>>>>>>>>>>>> +    }
>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>> must be
>>>>>>>>>>> true always.
>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>
>>>>>>>>>> I didn't present refactor patch, just because main patch didn't
>>>>>>>>>> get LGTM community and hypothetically may contain some mistakes..
>>>>>>>>> LGTM otherwise.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Manman
>>>>>>>>>>
>>>>>>>>>> -Stepan
>>>>>>>>>>
>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>
>>>>>>>>>>>> One high-level question: do we need to insert the wasted
>>>>>>>>>>>> record to
>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>> It seems to me that we can only insert the records for the
>>>>>>>>>>>> actual byval
>>>>>>>>>>>> parameters.
>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> +
>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>> will not
>>>>>>>>>>>>> +        // skip any waste registers, since parameter fillup
>>>>>>>>>>>>> the whole
>>>>>>>>>>>>> rest
>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>        }
>>>>>>>>>>>>>
>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>
>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>> If the byval parameter is in registers partially, we should
>>>>>>>>>>>> create a
>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>> If the byval parameter is all in registers, there is no need
>>>>>>>>>>>> to create a
>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>> If the byval parameter is all on stack, we should create a
>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>
>>>>>>>>>>>> When looking at the above code, offset is default to zero, if
>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is all on
>>>>>>>>>>>> stack),
>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>> The correct logic seems to be always updating offset inside
>>>>>>>>>>>> the if and
>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers occupied
>>>>>>>>>>>> by this
>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>> parameter that is
>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>> }
>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>>>>>    ...
>>>>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>>>>
>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>> +    // All registers reserved for byval parameters should
>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>> +    // Check that first unallocated register is higher that
>>>>>>>>>>>>> last one
>>>>>>>>>>>>> used
>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>> +    {
>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>> +        State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>> RE);
>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>> +      }
>>>>>>>>>>>>> +    }
>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>> Yes. I've just moved this check from "if" statement, since it
>>>>>>>>>>> must be
>>>>>>>>>>> true always.
>>>>>>>>>>>
>>>>>>>>>>>>> -void
>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState &CCInfo,
>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>> -                                        const Value *OrigArg,
>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>> -                                        unsigned ArgOffset,
>>>>>>>>>>>>> -                                        bool ForceMutable)
>>>>>>>>>>>>> const {
>>>>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>>>>> +int
>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>> +                                  DebugLoc dl, SDValue
>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>> +                                  unsigned OffsetFromOrigArg,
>>>>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>>>>> +                                  bool ForceMutable) const {
>>>>>>>>>>>>
>>>>>>>>>>>> It would be much cleaner if we can move the refactoring of
>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>> For sure.
>>>>>>>>>>>
>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so I'll try
>>>>>>>>>>> present
>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>
>>>>>>>>>>> -Stepan.
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>
>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>
>>>>>>>>
>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>
>>>>
>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: StoreByValRegs-2013-04-17.patch
Type: text/x-diff
Size: 12232 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130427/c0999958/attachment.patch>


More information about the llvm-commits mailing list