Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Sun May 5 12:34:49 PDT 2013


r181148. What's now? clang front-end?
Manman Ren wrote:
>
>
> On May 2, 2013, at 11:08 AM, Stepan Dyatkovskiy <stpworld at narod.ru
> <mailto:stpworld at narod.ru>> wrote:
>
>> Hi Manman,
>>
>>> +  // NSAA should be evaluted (NSAA means "next stacked argument
>>> address").
>>> typo "evaluated"
>>>
>>> +  // So: NextStackOffset = NSAAOffset + SizeOfByValParamsStoredInRegs.
>>> +  // Then: NSAAOffset = NextStackOffset - SizeOfByValParamsStoredInRegs.
>>> +  unsigned NSAAOffset = State->getNextStackOffset();
>>> +  if (State->getCallOrPrologue() != Call) {
>>> +    for (unsigned i = 0, e = State->getInRegsParamsCount(); i != e;
>>> ++i) {
>>> +      unsigned RB, RE;
>>> +      State->getInRegsParamInfo(i, RB, RE);
>>> +      assert(NSAAOffset >= (RE-RB)*4 &&
>>> +             "Stack offset for byval regs doesn't introduced anymore?");
>>> +      NSAAOffset -= (RE-RB)*4;
>>> +    }
>>> +  }
>>> +
>>> +  if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0) &&
>>> I don't quite get the above section to update NSAAOffset. Could you
>>> give an example?
>>
>> Current implementation (even before all my patches) allocates stack
>> area for byval registers.
>> Then, in prologue all byval parameters are stored on stack (currently
>> single byval parameter) and used as params passed by pointer. I'm not
>> author of this approach. There are both pros and cons. Main "pros":
>> this behaviour relaxes registers pressure.
> Thanks for the detailed explanation.
>>
>> That kind of stack usage outsides AAPCS rules. AAPCS assumes that if
>> parameter assigned to the registers only it sidesteps stack usage at all.
>>
>> Example:
>>
>> typedef struct { int field0; } struct1_t;
>> typedef struct { int fields[5]; } struct2_t;
>>
>> foo (int a, struct_t b, int c, int d)
>>
>> Reason I've added these changes is that actually NSAA from AAPCS rules
>> is *not equal* to LLVM's CCState::StackOffset+SP:
>> int a      --> r0      NSAA is SP+0, StackOffset is 0
>> struct_t b --> r1, r2  NSAA is SP+0, StackOffset := 4 (!)
>> int c      --> r3      NSAA is SP+0, StackOffset is 4
>> int d      --> SP,     NSAA := SP+4, StackOffset := 8
>>
>> Let NSAAOffset = NSAA-SP, then:
>> StackOffset = NSAAOffset + sizeof(all-by-val-regs)
>>
>> The check:
>> "if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0)"
>> is just improvement of what was before:
>> "(!Subtarget->isAAPCS_ABI() || State->getNextStackOffset() == 0)"
>>
>>> Should we use up all GPRArgRegs when NSAAOffset != 0 so arguments
>>> after the byval will not be allocated to registers?
>> If I've understood question properly:
>> If NSAA != SP and parameter size greater than all remained GPRs we
>> send it to stack and set NCRN to R4 (so we can't use GPRs anymore).
>
> Yes, I meant the above case: NSAA != SP and byval can't all fit in GPRs.
> Thanks for fixing this in attached patch.
> The updated patch looks good to me.
>
> Thanks,
> Manman
>
>> If NSAA != SP, but parameter could be saved in remained GPRs we follow
>> C.4 rule and save this parameter in GPRs, increasing NCRN.
>>
>> I attached examples as separated .ll files.
>>
>> P.S.: While writing this post I've found that I did the mistake in
>> HandleByVal, I changed it a bit more. Would you please look at it
>> again. So've I reattached the patch too.
>>
>> -Stepan
>>
>>>
>>> Otherwise LGTM.
>>>
>>> Thanks for working on this,
>>> Manman
>>>
>>> On May 1, 2013, at 11:36 AM, Stepan Dyatkovskiy wrote:
>>>
>>>> Hello,
>>>> Please find patch in attachment that fixes byval support in ARM
>>>> backend for llvm level.
>>>>
>>>> -Stepan.
>>>>
>>>> Stepan Dyatkovskiy wrote:
>>>>> Commited in r180774. Now I'm working on milestone patch, that fixes
>>>>> byval support for llvm level. I published it before, now I need to
>>>>> update it. Hope it come soon.
>>>>>
>>>>> -Stepan.
>>>>>
>>>>> Manman Ren wrote:
>>>>>>
>>>>>> LGTM,
>>>>>>
>>>>>> Thanks,
>>>>>> Manman
>>>>>>
>>>>>> On Apr 28, 2013, at 10:50 AM, Stepan Dyatkovskiy wrote:
>>>>>>
>>>>>>> Hi Manman,
>>>>>>> Just for keeping things actual: I reattached patch with new names.
>>>>>>> -Stepan
>>>>>>>
>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>> Hi Manman,
>>>>>>>>
>>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>>> that are
>>>>>>>> actually passed in registers, how about RegsSaveSizeForParams or
>>>>>>>> just
>>>>>>>> RegsSaveSize?
>>>>>>>>
>>>>>>>> In AAPCS R0-R3 are called like:
>>>>>>>> "Argument / result / scratch register".
>>>>>>>> So maybe we'll use "set/getArgRegsSaveSize" names?
>>>>>>>>
>>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>> AFI->getByValRegsSaveSize();"
>>>>>>>>
>>>>>>>> Yea :-) Here replace "VAReg" with "ArgRegs": ArgRegsSaveSize,
>>>>>>>> ArgRegsSize.
>>>>>>>>
>>>>>>>> -Stepan
>>>>>>>>
>>>>>>>> Manman Ren wrote:
>>>>>>>>>
>>>>>>>>> Hi Stepan,
>>>>>>>>>
>>>>>>>>> Thanks for working on this.
>>>>>>>>>
>>>>>>>>> I put in a comment about the refactor patch sometime ago:
>>>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>>>
>>>>>>>>>> Actually it is used not for vararg function, as I found it
>>>>>>>>>> main role
>>>>>>>>>> of this param in emitPrologue, it is used for proper stack
>>>>>>>>>> allocation.
>>>>>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>>>>>> parameters. And yes, if it is VA function, we must also extend
>>>>>>>>>> this
>>>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>>>
>>>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>>> that are
>>>>>>>>> actually passed in registers, how about RegsSaveSizeForParams
>>>>>>>>> or just
>>>>>>>>> RegsSaveSize?
>>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>>>
>>>>>>>>> Manman
>>>>>>>>>
>>>>>>>>> On Apr 26, 2013, at 11:12 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>
>>>>>>>>>> Hi Manman and Renato,
>>>>>>>>>> Patch was tested with test-suite all test have bee passed.
>>>>>>>>>> -Stepan
>>>>>>>>>>
>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>> Hi all,
>>>>>>>>>>>
>>>>>>>>>>> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been
>>>>>>>>>>> fixed,
>>>>>>>>>>> r180011.
>>>>>>>>>>>
>>>>>>>>>>> So, next patch is in attachment: VarArgsStyleRegisters
>>>>>>>>>>> refactoring.
>>>>>>>>>>>
>>>>>>>>>>> -Stepan.
>>>>>>>>>>>
>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>> Hello, Manman.
>>>>>>>>>>>> That's kind of snowball!
>>>>>>>>>>>> Today test-suite finished with two new failures on my beagle.
>>>>>>>>>>>> It helped me find out one more hidden logic dependency in ARM
>>>>>>>>>>>> backend.
>>>>>>>>>>>>
>>>>>>>>>>>> May be its not a mistake, but look:
>>>>>>>>>>>>
>>>>>>>>>>>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers
>>>>>>>>>>>> for f64.
>>>>>>>>>>>> This
>>>>>>>>>>>> method is called for VA functions, since there is no CPRCs, and
>>>>>>>>>>>> *any*
>>>>>>>>>>>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3
>>>>>>>>>>>> could be
>>>>>>>>>>>> allocated here. If all the registers except R3 were allocated
>>>>>>>>>>>> somewhere
>>>>>>>>>>>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't
>>>>>>>>>>>> waste
>>>>>>>>>>>> R3, so R3 could be allocated later. But logic seems to be same
>>>>>>>>>>>> as in
>>>>>>>>>>>> ARMTargetLowering::HandleByVal.
>>>>>>>>>>>>
>>>>>>>>>>>> My main fix in calling convention was:
>>>>>>>>>>>>
>>>>>>>>>>>> def CC_ARM_AAPCS_Common : CallingConv<[
>>>>>>>>>>>> ...
>>>>>>>>>>>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>>>>>>>>>>>> -                       "ArgFlags.getOrigAlign() != 8",
>>>>>>>>>>>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>>>>>>>>>>>                         CCAssignToReg<[R0, R1, R2, R3]>>>,
>>>>>>>>>>>>
>>>>>>>>>>>> Now I got some picture why "State.getNextStackOffset() == 0" was
>>>>>>>>>>>> added:
>>>>>>>>>>>> due to f64AssignAAPCS implementation. May be the are more
>>>>>>>>>>>> reasons:
>>>>>>>>>>>> next
>>>>>>>>>>>> test-suite results will show it then :-)
>>>>>>>>>>>>
>>>>>>>>>>>> Anyway, original CCIfType brakes C.4 and C.5, while my fix
>>>>>>>>>>>> brakes some
>>>>>>>>>>>> implicit statement defined in f64AssignAAPCS:
>>>>>>>>>>>>
>>>>>>>>>>>> "stack-offset != 0 doesn't mean all registers are allocated"
>>>>>>>>>>>>
>>>>>>>>>>>> So I've fixed f64AssignAAPCS too. Once we waste registers in
>>>>>>>>>>>> HandleByVal
>>>>>>>>>>>> (when we send parameter to stack), I think it would be good
>>>>>>>>>>>> to keep
>>>>>>>>>>>> same
>>>>>>>>>>>> behaviour for f64AssignAAPCS since its HandleByVal's analogue
>>>>>>>>>>>> for f64
>>>>>>>>>>>> params.
>>>>>>>>>>>>
>>>>>>>>>>>> I attached new patch, now with 3 tests :-)
>>>>>>>>>>>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>>>>>>>>>>>
>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>
>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> LGTM,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Fixed :-) New patch (+2 tests) is attached.
>>>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As a side note, we may have issues with 9 floats, 3 ints
>>>>>>>>>>>>>>> and a
>>>>>>>>>>>>>>> small
>>>>>>>>>>>>>>> byval.
>>>>>>>>>>>>>>> The standard says we should split the byval only when NSAA is
>>>>>>>>>>>>>>> equal
>>>>>>>>>>>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small
>>>>>>>>>>>>>>> byval,
>>>>>>>>>>>>>>> we should not split the byval since we advanced NSAA with the
>>>>>>>>>>>>>>> 9th
>>>>>>>>>>>>>>> float.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except
>>>>>>>>>>>>>>>>> that the
>>>>>>>>>>>>>>>>> testing case.
>>>>>>>>>>>>>>>>> Should it have 9 doubles and 1 integer instead of a single
>>>>>>>>>>>>>>>>> double
>>>>>>>>>>>>>>>>> and a single integer?
>>>>>>>>>>>>>>>>> The testing case passes even without the fix.
>>>>>>>>>>>>>>>> That was my midnight mistake, sorry.
>>>>>>>>>>>>>>>> I reattached patch with proper test. In the end of test
>>>>>>>>>>>>>>>> file, I've
>>>>>>>>>>>>>>>> added comments for @doFoo: small description of how
>>>>>>>>>>>>>>>> parameters
>>>>>>>>>>>>>>>> must
>>>>>>>>>>>>>>>> be passed:
>>>>>>>>>>>>>>>> first eight doubles --> D0-D7
>>>>>>>>>>>>>>>> 9th double --> Stack
>>>>>>>>>>>>>>>> 10th param (first i32) --> R0.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It is important to have 9 double params here, since we must
>>>>>>>>>>>>>>>> fill-up
>>>>>>>>>>>>>>>> all possible VFPs, then get compiler to use stack offset,
>>>>>>>>>>>>>>>> and only
>>>>>>>>>>>>>>>> after start use GPRs.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and
>>>>>>>>>>>>>>>>> byval.
>>>>>>>>>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Actually it is used not for vararg function, as I found
>>>>>>>>>>>>>>>> it main
>>>>>>>>>>>>>>>> role of this param in emitPrologue, it is used for
>>>>>>>>>>>>>>>> proper stack
>>>>>>>>>>>>>>>> allocation.
>>>>>>>>>>>>>>>> Stack size should enough for storing all registers, used for
>>>>>>>>>>>>>>>> byval
>>>>>>>>>>>>>>>> parameters. And yes, if it is VA function, we must also
>>>>>>>>>>>>>>>> extend
>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>>>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>>>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>> are actually passed in registers, how about
>>>>>>>>>>>>>>> RegsSaveSizeForParams or
>>>>>>>>>>>>>>> just RegsSaveSize?
>>>>>>>>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>>>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>>>>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Well.. We can use next case: Co-Processor register
>>>>>>>>>>>>>>>>>> candidates
>>>>>>>>>>>>>>>>>> may be
>>>>>>>>>>>>>>>>>> either in VFP or in stack, so when all VFP are allocated
>>>>>>>>>>>>>>>>>> stack is
>>>>>>>>>>>>>>>>>> used. We can use stack without GPR allocation in that
>>>>>>>>>>>>>>>>>> case,
>>>>>>>>>>>>>>>>>> passing 9 f64 params, for example: first eight goes to
>>>>>>>>>>>>>>>>>> d0-d7,
>>>>>>>>>>>>>>>>>> night goes to stack. Now as 10th parameter we pass i32,
>>>>>>>>>>>>>>>>>> according
>>>>>>>>>>>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> [quote]
>>>>>>>>>>>>>>>>>> If the size in words of the argument is not more than r4
>>>>>>>>>>>>>>>>>> minus
>>>>>>>>>>>>>>>>>> NCRN, the argument is copied into
>>>>>>>>>>>>>>>>>> core registers, starting at the NCRN. The NCRN is
>>>>>>>>>>>>>>>>>> incremented by
>>>>>>>>>>>>>>>>>> the number of registers used.
>>>>>>>>>>>>>>>>>> Successive registers hold the parts of the argument they
>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>> hold if its value were loaded into
>>>>>>>>>>>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>>>>>>>>>>>> argument has now been allocated.
>>>>>>>>>>>>>>>>>> [/quote]
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>>>>>>>>>>>> I also checked this case with GCC - it works as in AAPCS
>>>>>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> About refactoring. I propose do it in two stages, first
>>>>>>>>>>>>>>>>>> rename
>>>>>>>>>>>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and
>>>>>>>>>>>>>>>>>> detach
>>>>>>>>>>>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters"
>>>>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I attached first refactor patch, Calling-Convention patch
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>>>>>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>>>>>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> P.S.: My beagleboard is very very slow, "release"
>>>>>>>>>>>>>>>>>> build takes
>>>>>>>>>>>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll
>>>>>>>>>>>>>>>>>> try ask
>>>>>>>>>>>>>>>>>> somebody to present me faster device, but it take some
>>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello Manman,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. I've removedskipWastedByValRegs,
>>>>>>>>>>>>>>>>>>>> addByValWastedRegs and
>>>>>>>>>>>>>>>>>>>> Wasted flag.
>>>>>>>>>>>>>>>>>>>> 2. Fixed usage of "offset" variable in
>>>>>>>>>>>>>>>>>>>> LowerFormalArguments
>>>>>>>>>>>>>>>>>>>> 3. And fixed regression test: added function with
>>>>>>>>>>>>>>>>>>>> two byval
>>>>>>>>>>>>>>>>>>>> structures.
>>>>>>>>>>>>>>>>>>> The above looks good.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we
>>>>>>>>>>>>>>>>>>>> try to
>>>>>>>>>>>>>>>>>>>> assign it to the first unallocated register from [r0-r3]
>>>>>>>>>>>>>>>>>>>> range,
>>>>>>>>>>>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>>>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as
>>>>>>>>>>>>>>>>>>>> fact?
>>>>>>>>>>>>>>>>>>>> that all registers are used? and params goes to the
>>>>>>>>>>>>>>>>>>>> stack now.
>>>>>>>>>>>>>>>>>>>> After byval structures will be implemented, the next
>>>>>>>>>>>>>>>>>>>> case is
>>>>>>>>>>>>>>>>>>>> possible:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here, we need to allocate stack for first parameter, but
>>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>>> we still have unallocated registers. And second
>>>>>>>>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>>>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>>>>>>>>>>>> Though, I'm still not sure about this change, just
>>>>>>>>>>>>>>>>>>>> because may
>>>>>>>>>>>>>>>>>>>> be there are more reasons to check nextStackOffset
>>>>>>>>>>>>>>>>>>>> value.
>>>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>>>>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I didn't present refactor patch, just because main patch
>>>>>>>>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>>>> get LGTM community and hypothetically may contain some
>>>>>>>>>>>>>>>>>>>> mistakes..
>>>>>>>>>>>>>>>>>>> LGTM otherwise.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One high-level question: do we need to insert the
>>>>>>>>>>>>>>>>>>>>>> wasted
>>>>>>>>>>>>>>>>>>>>>> record to
>>>>>>>>>>>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>>>>>>>>>>>> It seems to me that we can only insert the records for
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> actual byval
>>>>>>>>>>>>>>>>>>>>>> parameters.
>>>>>>>>>>>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>>>>>>>>>>>> will not
>>>>>>>>>>>>>>>>>>>>>>> +        // skip any waste registers, since parameter
>>>>>>>>>>>>>>>>>>>>>>> fillup
>>>>>>>>>>>>>>>>>>>>>>> the whole
>>>>>>>>>>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>>>>       }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>>>>>>>>>>>> If the byval parameter is in registers partially, we
>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>>>>>>>>>>>> If the byval parameter is all in registers, there is
>>>>>>>>>>>>>>>>>>>>>> no need
>>>>>>>>>>>>>>>>>>>>>> to create a
>>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>>>>>>>>>>>> If the byval parameter is all on stack, we should
>>>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> When looking at the above code, offset is default to
>>>>>>>>>>>>>>>>>>>>>> zero, if
>>>>>>>>>>>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is
>>>>>>>>>>>>>>>>>>>>>> all on
>>>>>>>>>>>>>>>>>>>>>> stack),
>>>>>>>>>>>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>>>>>>>>>>>> The correct logic seems to be always updating offset
>>>>>>>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>>>>>>>> the if and
>>>>>>>>>>>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>   offset = RegEnd - RegBegin; //number of registers
>>>>>>>>>>>>>>>>>>>>>> occupied
>>>>>>>>>>>>>>>>>>>>>> by this
>>>>>>>>>>>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>>>>>>>>>>>   CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>>>>>>>>>>>> parameter that is
>>>>>>>>>>>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>>>>>>>>>>>   // Part of the byval parameter is on stack.
>>>>>>>>>>>>>>>>>>>>>>   ...
>>>>>>>>>>>>>>>>>>>>> Oh.. I missed third case. Then we keep first
>>>>>>>>>>>>>>>>>>>>> variant here.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>>>> +    // All registers reserved for byval parameters
>>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>>>>>>>>>>>> +    // Check that first unallocated register is
>>>>>>>>>>>>>>>>>>>>>>> higher
>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>> last one
>>>>>>>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> -void
>>>>>>>>>>>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState
>>>>>>>>>>>>>>>>>>>>>>> &CCInfo,
>>>>>>>>>>>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>>>>>>>>>>>> -                                        const Value
>>>>>>>>>>>>>>>>>>>>>>> *OrigArg,
>>>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>>>> ArgOffset,
>>>>>>>>>>>>>>>>>>>>>>> -                                        bool
>>>>>>>>>>>>>>>>>>>>>>> ForceMutable)
>>>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>>>> +// Return: The frame index registers were stored
>>>>>>>>>>>>>>>>>>>>>>> into.
>>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>>>>>>>>>>>> +                                  DebugLoc dl,
>>>>>>>>>>>>>>>>>>>>>>> SDValue
>>>>>>>>>>>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>>>>>>>>>>>> +                                  const Value
>>>>>>>>>>>>>>>>>>>>>>> *OrigArg,
>>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>>> ArgOffset,
>>>>>>>>>>>>>>>>>>>>>>> +                                  bool ForceMutable)
>>>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> It would be much cleaner if we can move the
>>>>>>>>>>>>>>>>>>>>>> refactoring of
>>>>>>>>>>>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>>>>>>>>>>>> For sure.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so
>>>>>>>>>>>>>>>>>>>>> I'll try
>>>>>>>>>>>>>>>>>>>>> present
>>>>>>>>>>>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>>>>>>>>>> <mailto:llvm-commits at cs.uiuc.edu>
>>>>>>>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> <StoreByValRegs-2013-04-17.patch>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> <StoreByValRegs-2013-04-28.patch>
>>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> llvm-commits mailing list
>>>>> llvm-commits at cs.uiuc.edu <mailto:llvm-commits at cs.uiuc.edu>
>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>
>>>> <pr15293-2013-05-01.patch>
>>>
>>
>> <2013-05-02-AAPCS-ByVal-Structs-C4-C5-VFP.ll><2013-05-02-AAPCS-ByVal-Structs-C4-C5-VFP2.ll><MySandbox-test-c.4-c.5.ll><pr15293-2013-05-02.patch>
>




More information about the llvm-commits mailing list