Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Stepan Dyatkovskiy stpworld at narod.ru
Thu May 2 11:17:50 PDT 2013


Hi Manman,
Sorry for "corrupted" example.

Example:

typedef struct { int field0; } struct1_t;
typedef struct { int fields[2]; } struct2_t;

void foo (int a, struct1_t b, struct2_t c, int d);

Example shows that NSAA != StackOffset:

int a        --> r0      NSAA is SP
struct1_t b  --> r1      NSAA is SP,   StackOffset := 4 (!)
struct2_t c  --> r2, r3  NSAA is SP,   StackOffset := 12
int d        --> SP      NSAA := SP+4, StackOffset := 16

Note, stack area for by-val registers allocated after all other params, 
so it has lower address.

-Stepan.

Stepan Dyatkovskiy wrote:
> Hi Manman,
>
>> +  // NSAA should be evaluted (NSAA means "next stacked argument
>> address").
>> typo "evaluated"
>>
>> +  // So: NextStackOffset = NSAAOffset + SizeOfByValParamsStoredInRegs.
>> +  // Then: NSAAOffset = NextStackOffset - SizeOfByValParamsStoredInRegs.
>> +  unsigned NSAAOffset = State->getNextStackOffset();
>> +  if (State->getCallOrPrologue() != Call) {
>> +    for (unsigned i = 0, e = State->getInRegsParamsCount(); i != e;
>> ++i) {
>> +      unsigned RB, RE;
>> +      State->getInRegsParamInfo(i, RB, RE);
>> +      assert(NSAAOffset >= (RE-RB)*4 &&
>> +             "Stack offset for byval regs doesn't introduced anymore?");
>> +      NSAAOffset -= (RE-RB)*4;
>> +    }
>> +  }
>> +
>> +  if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0) &&
>> I don't quite get the above section to update NSAAOffset. Could you
>> give an example?
>
> Current implementation (even before all my patches) allocates stack area
> for byval registers.
> Then, in prologue all byval parameters are stored on stack (currently
> single byval parameter) and used as params passed by pointer. I'm not
> author of this approach. There are both pros and cons. Main "pros": this
> behaviour relaxes registers pressure.
>
> That kind of stack usage outsides AAPCS rules. AAPCS assumes that if
> parameter assigned to the registers only it sidesteps stack usage at all.
>
> Example:
>
> typedef struct { int field0; } struct1_t;
> typedef struct { int fields[5]; } struct2_t;
>
> foo (int a, struct_t b, int c, int d)
>
> Reason I've added these changes is that actually NSAA from AAPCS rules
> is *not equal* to LLVM's CCState::StackOffset+SP:
> int a      --> r0      NSAA is SP+0, StackOffset is 0
> struct_t b --> r1, r2  NSAA is SP+0, StackOffset := 4 (!)
> int c      --> r3      NSAA is SP+0, StackOffset is 4
> int d      --> SP,     NSAA := SP+4, StackOffset := 8
>
> Let NSAAOffset = NSAA-SP, then:
> StackOffset = NSAAOffset + sizeof(all-by-val-regs)
>
> The check:
> "if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0)"
> is just improvement of what was before:
> "(!Subtarget->isAAPCS_ABI() || State->getNextStackOffset() == 0)"
>
>> Should we use up all GPRArgRegs when NSAAOffset != 0 so arguments
>> after the byval will not be allocated to registers?
> If I've understood question properly:
> If NSAA != SP and parameter size greater than all remained GPRs we send
> it to stack and set NCRN to R4 (so we can't use GPRs anymore).
> If NSAA != SP, but parameter could be saved in remained GPRs we follow
> C.4 rule and save this parameter in GPRs, increasing NCRN.
>
> I attached examples as separated .ll files.
>
> P.S.: While writing this post I've found that I did the mistake in
> HandleByVal, I changed it a bit more. Would you please look at it again.
> So've I reattached the patch too.
>
> -Stepan
>
>>
>> Otherwise LGTM.
>>
>> Thanks for working on this,
>> Manman
>>
>> On May 1, 2013, at 11:36 AM, Stepan Dyatkovskiy wrote:
>>
>>> Hello,
>>> Please find patch in attachment that fixes byval support in ARM
>>> backend for llvm level.
>>>
>>> -Stepan.
>>>
>>> Stepan Dyatkovskiy wrote:
>>>> Commited in r180774. Now I'm working on milestone patch, that fixes
>>>> byval support for llvm level. I published it before, now I need to
>>>> update it. Hope it come soon.
>>>>
>>>> -Stepan.
>>>>
>>>> Manman Ren wrote:
>>>>>
>>>>> LGTM,
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>
>>>>> On Apr 28, 2013, at 10:50 AM, Stepan Dyatkovskiy wrote:
>>>>>
>>>>>> Hi Manman,
>>>>>> Just for keeping things actual: I reattached patch with new names.
>>>>>> -Stepan
>>>>>>
>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>> Hi Manman,
>>>>>>>
>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>> that are
>>>>>>> actually passed in registers, how about RegsSaveSizeForParams or
>>>>>>> just
>>>>>>> RegsSaveSize?
>>>>>>>
>>>>>>> In AAPCS R0-R3 are called like:
>>>>>>> "Argument / result / scratch register".
>>>>>>> So maybe we'll use "set/getArgRegsSaveSize" names?
>>>>>>>
>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>> AFI->getByValRegsSaveSize();"
>>>>>>>
>>>>>>> Yea :-) Here replace "VAReg" with "ArgRegs": ArgRegsSaveSize,
>>>>>>> ArgRegsSize.
>>>>>>>
>>>>>>> -Stepan
>>>>>>>
>>>>>>> Manman Ren wrote:
>>>>>>>>
>>>>>>>> Hi Stepan,
>>>>>>>>
>>>>>>>> Thanks for working on this.
>>>>>>>>
>>>>>>>> I put in a comment about the refactor patch sometime ago:
>>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>>
>>>>>>>>> Actually it is used not for vararg function, as I found it main
>>>>>>>>> role
>>>>>>>>> of this param in emitPrologue, it is used for proper stack
>>>>>>>>> allocation.
>>>>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>>>>> parameters. And yes, if it is VA function, we must also extend
>>>>>>>>> this
>>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>>
>>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>> that are
>>>>>>>> actually passed in registers, how about RegsSaveSizeForParams or
>>>>>>>> just
>>>>>>>> RegsSaveSize?
>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>>
>>>>>>>> Manman
>>>>>>>>
>>>>>>>> On Apr 26, 2013, at 11:12 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>
>>>>>>>>> Hi Manman and Renato,
>>>>>>>>> Patch was tested with test-suite all test have bee passed.
>>>>>>>>> -Stepan
>>>>>>>>>
>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been
>>>>>>>>>> fixed,
>>>>>>>>>> r180011.
>>>>>>>>>>
>>>>>>>>>> So, next patch is in attachment: VarArgsStyleRegisters
>>>>>>>>>> refactoring.
>>>>>>>>>>
>>>>>>>>>> -Stepan.
>>>>>>>>>>
>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>> Hello, Manman.
>>>>>>>>>>> That's kind of snowball!
>>>>>>>>>>> Today test-suite finished with two new failures on my beagle.
>>>>>>>>>>> It helped me find out one more hidden logic dependency in ARM
>>>>>>>>>>> backend.
>>>>>>>>>>>
>>>>>>>>>>> May be its not a mistake, but look:
>>>>>>>>>>>
>>>>>>>>>>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for
>>>>>>>>>>> f64.
>>>>>>>>>>> This
>>>>>>>>>>> method is called for VA functions, since there is no CPRCs, and
>>>>>>>>>>> *any*
>>>>>>>>>>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3
>>>>>>>>>>> could be
>>>>>>>>>>> allocated here. If all the registers except R3 were allocated
>>>>>>>>>>> somewhere
>>>>>>>>>>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't
>>>>>>>>>>> waste
>>>>>>>>>>> R3, so R3 could be allocated later. But logic seems to be same
>>>>>>>>>>> as in
>>>>>>>>>>> ARMTargetLowering::HandleByVal.
>>>>>>>>>>>
>>>>>>>>>>> My main fix in calling convention was:
>>>>>>>>>>>
>>>>>>>>>>> def CC_ARM_AAPCS_Common : CallingConv<[
>>>>>>>>>>> ...
>>>>>>>>>>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>>>>>>>>>>> -                       "ArgFlags.getOrigAlign() != 8",
>>>>>>>>>>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>>>>>>>>>>                          CCAssignToReg<[R0, R1, R2, R3]>>>,
>>>>>>>>>>>
>>>>>>>>>>> Now I got some picture why "State.getNextStackOffset() == 0" was
>>>>>>>>>>> added:
>>>>>>>>>>> due to f64AssignAAPCS implementation. May be the are more
>>>>>>>>>>> reasons:
>>>>>>>>>>> next
>>>>>>>>>>> test-suite results will show it then :-)
>>>>>>>>>>>
>>>>>>>>>>> Anyway, original CCIfType brakes C.4 and C.5, while my fix
>>>>>>>>>>> brakes some
>>>>>>>>>>> implicit statement defined in f64AssignAAPCS:
>>>>>>>>>>>
>>>>>>>>>>> "stack-offset != 0 doesn't mean all registers are allocated"
>>>>>>>>>>>
>>>>>>>>>>> So I've fixed f64AssignAAPCS too. Once we waste registers in
>>>>>>>>>>> HandleByVal
>>>>>>>>>>> (when we send parameter to stack), I think it would be good
>>>>>>>>>>> to keep
>>>>>>>>>>> same
>>>>>>>>>>> behaviour for f64AssignAAPCS since its HandleByVal's analogue
>>>>>>>>>>> for f64
>>>>>>>>>>> params.
>>>>>>>>>>>
>>>>>>>>>>> I attached new patch, now with 3 tests :-)
>>>>>>>>>>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>>>>>>>>>>
>>>>>>>>>>> -Stepan.
>>>>>>>>>>>
>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> LGTM,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>>> Manman
>>>>>>>>>>>>
>>>>>>>>>>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Fixed :-) New patch (+2 tests) is attached.
>>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>>
>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> As a side note, we may have issues with 9 floats, 3 ints
>>>>>>>>>>>>>> and a
>>>>>>>>>>>>>> small
>>>>>>>>>>>>>> byval.
>>>>>>>>>>>>>> The standard says we should split the byval only when NSAA is
>>>>>>>>>>>>>> equal
>>>>>>>>>>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small
>>>>>>>>>>>>>> byval,
>>>>>>>>>>>>>> we should not split the byval since we advanced NSAA with the
>>>>>>>>>>>>>> 9th
>>>>>>>>>>>>>> float.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> testing case.
>>>>>>>>>>>>>>>> Should it have 9 doubles and 1 integer instead of a single
>>>>>>>>>>>>>>>> double
>>>>>>>>>>>>>>>> and a single integer?
>>>>>>>>>>>>>>>> The testing case passes even without the fix.
>>>>>>>>>>>>>>> That was my midnight mistake, sorry.
>>>>>>>>>>>>>>> I reattached patch with proper test. In the end of test
>>>>>>>>>>>>>>> file, I've
>>>>>>>>>>>>>>> added comments for @doFoo: small description of how
>>>>>>>>>>>>>>> parameters
>>>>>>>>>>>>>>> must
>>>>>>>>>>>>>>> be passed:
>>>>>>>>>>>>>>> first eight doubles --> D0-D7
>>>>>>>>>>>>>>> 9th double --> Stack
>>>>>>>>>>>>>>> 10th param (first i32) --> R0.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It is important to have 9 double params here, since we must
>>>>>>>>>>>>>>> fill-up
>>>>>>>>>>>>>>> all possible VFPs, then get compiler to use stack offset,
>>>>>>>>>>>>>>> and only
>>>>>>>>>>>>>>> after start use GPRs.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and
>>>>>>>>>>>>>>>> byval.
>>>>>>>>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Actually it is used not for vararg function, as I found
>>>>>>>>>>>>>>> it main
>>>>>>>>>>>>>>> role of this param in emitPrologue, it is used for proper
>>>>>>>>>>>>>>> stack
>>>>>>>>>>>>>>> allocation.
>>>>>>>>>>>>>>> Stack size should enough for storing all registers, used for
>>>>>>>>>>>>>>> byval
>>>>>>>>>>>>>>> parameters. And yes, if it is VA function, we must also
>>>>>>>>>>>>>>> extend
>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>> are actually passed in registers, how about
>>>>>>>>>>>>>> RegsSaveSizeForParams or
>>>>>>>>>>>>>> just RegsSaveSize?
>>>>>>>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>>>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Well.. We can use next case: Co-Processor register
>>>>>>>>>>>>>>>>> candidates
>>>>>>>>>>>>>>>>> may be
>>>>>>>>>>>>>>>>> either in VFP or in stack, so when all VFP are allocated
>>>>>>>>>>>>>>>>> stack is
>>>>>>>>>>>>>>>>> used. We can use stack without GPR allocation in that
>>>>>>>>>>>>>>>>> case,
>>>>>>>>>>>>>>>>> passing 9 f64 params, for example: first eight goes to
>>>>>>>>>>>>>>>>> d0-d7,
>>>>>>>>>>>>>>>>> night goes to stack. Now as 10th parameter we pass i32,
>>>>>>>>>>>>>>>>> according
>>>>>>>>>>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> [quote]
>>>>>>>>>>>>>>>>> If the size in words of the argument is not more than r4
>>>>>>>>>>>>>>>>> minus
>>>>>>>>>>>>>>>>> NCRN, the argument is copied into
>>>>>>>>>>>>>>>>> core registers, starting at the NCRN. The NCRN is
>>>>>>>>>>>>>>>>> incremented by
>>>>>>>>>>>>>>>>> the number of registers used.
>>>>>>>>>>>>>>>>> Successive registers hold the parts of the argument they
>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> hold if its value were loaded into
>>>>>>>>>>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>>>>>>>>>>> argument has now been allocated.
>>>>>>>>>>>>>>>>> [/quote]
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>>>>>>>>>>> I also checked this case with GCC - it works as in AAPCS
>>>>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> About refactoring. I propose do it in two stages, first
>>>>>>>>>>>>>>>>> rename
>>>>>>>>>>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and
>>>>>>>>>>>>>>>>> detach
>>>>>>>>>>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters"
>>>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I attached first refactor patch, Calling-Convention patch
>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>>>>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>>>>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> P.S.: My beagleboard is very very slow, "release" build
>>>>>>>>>>>>>>>>> takes
>>>>>>>>>>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll
>>>>>>>>>>>>>>>>> try ask
>>>>>>>>>>>>>>>>> somebody to present me faster device, but it take some
>>>>>>>>>>>>>>>>> time.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hello Manman,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> 1. I've removedskipWastedByValRegs,
>>>>>>>>>>>>>>>>>>> addByValWastedRegs and
>>>>>>>>>>>>>>>>>>> Wasted flag.
>>>>>>>>>>>>>>>>>>> 2. Fixed usage of "offset" variable in
>>>>>>>>>>>>>>>>>>> LowerFormalArguments
>>>>>>>>>>>>>>>>>>> 3. And fixed regression test: added function with two
>>>>>>>>>>>>>>>>>>> byval
>>>>>>>>>>>>>>>>>>> structures.
>>>>>>>>>>>>>>>>>> The above looks good.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we
>>>>>>>>>>>>>>>>>>> try to
>>>>>>>>>>>>>>>>>>> assign it to the first unallocated register from [r0-r3]
>>>>>>>>>>>>>>>>>>> range,
>>>>>>>>>>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as
>>>>>>>>>>>>>>>>>>> fact?
>>>>>>>>>>>>>>>>>>> that all registers are used? and params goes to the
>>>>>>>>>>>>>>>>>>> stack now.
>>>>>>>>>>>>>>>>>>> After byval structures will be implemented, the next
>>>>>>>>>>>>>>>>>>> case is
>>>>>>>>>>>>>>>>>>> possible:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Here, we need to allocate stack for first parameter, but
>>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>>> we still have unallocated registers. And second
>>>>>>>>>>>>>>>>>>> parameter
>>>>>>>>>>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>>>>>>>>>>> Though, I'm still not sure about this change, just
>>>>>>>>>>>>>>>>>>> because may
>>>>>>>>>>>>>>>>>>> be there are more reasons to check nextStackOffset
>>>>>>>>>>>>>>>>>>> value.
>>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>>>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I didn't present refactor patch, just because main patch
>>>>>>>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>>> get LGTM community and hypothetically may contain some
>>>>>>>>>>>>>>>>>>> mistakes..
>>>>>>>>>>>>>>>>>> LGTM otherwise.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> One high-level question: do we need to insert the
>>>>>>>>>>>>>>>>>>>>> wasted
>>>>>>>>>>>>>>>>>>>>> record to
>>>>>>>>>>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>>>>>>>>>>> It seems to me that we can only insert the records for
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> actual byval
>>>>>>>>>>>>>>>>>>>>> parameters.
>>>>>>>>>>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>>>>>>>>>>> will not
>>>>>>>>>>>>>>>>>>>>>> +        // skip any waste registers, since parameter
>>>>>>>>>>>>>>>>>>>>>> fillup
>>>>>>>>>>>>>>>>>>>>>> the whole
>>>>>>>>>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>>>>>>>>>>> If the byval parameter is in registers partially, we
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>>>>>>>>>>> If the byval parameter is all in registers, there is
>>>>>>>>>>>>>>>>>>>>> no need
>>>>>>>>>>>>>>>>>>>>> to create a
>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>>>>>>>>>>> If the byval parameter is all on stack, we should
>>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> When looking at the above code, offset is default to
>>>>>>>>>>>>>>>>>>>>> zero, if
>>>>>>>>>>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is
>>>>>>>>>>>>>>>>>>>>> all on
>>>>>>>>>>>>>>>>>>>>> stack),
>>>>>>>>>>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>>>>>>>>>>> The correct logic seems to be always updating offset
>>>>>>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>>>>>>> the if and
>>>>>>>>>>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers
>>>>>>>>>>>>>>>>>>>>> occupied
>>>>>>>>>>>>>>>>>>>>> by this
>>>>>>>>>>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>>>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>>>>>>>>>>> parameter that is
>>>>>>>>>>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>>>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>>>>>>>>>>>>>>    ...
>>>>>>>>>>>>>>>>>>>> Oh.. I missed third case. Then we keep first variant
>>>>>>>>>>>>>>>>>>>> here.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>>> +    // All registers reserved for byval parameters
>>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>>>>>>>>>>> +    // Check that first unallocated register is
>>>>>>>>>>>>>>>>>>>>>> higher
>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> last one
>>>>>>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> -void
>>>>>>>>>>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState
>>>>>>>>>>>>>>>>>>>>>> &CCInfo,
>>>>>>>>>>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>>>>>>>>>>> -                                        const Value
>>>>>>>>>>>>>>>>>>>>>> *OrigArg,
>>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>>> ArgOffset,
>>>>>>>>>>>>>>>>>>>>>> -                                        bool
>>>>>>>>>>>>>>>>>>>>>> ForceMutable)
>>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>>> +// Return: The frame index registers were stored
>>>>>>>>>>>>>>>>>>>>>> into.
>>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>>>>>>>>>>> +                                  DebugLoc dl,
>>>>>>>>>>>>>>>>>>>>>> SDValue
>>>>>>>>>>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>>>>>>>>>>> +                                  const Value
>>>>>>>>>>>>>>>>>>>>>> *OrigArg,
>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>>> ArgOffset,
>>>>>>>>>>>>>>>>>>>>>> +                                  bool ForceMutable)
>>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> It would be much cleaner if we can move the
>>>>>>>>>>>>>>>>>>>>> refactoring of
>>>>>>>>>>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>>>>>>>>>>> For sure.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so
>>>>>>>>>>>>>>>>>>>> I'll try
>>>>>>>>>>>>>>>>>>>> present
>>>>>>>>>>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> <StoreByValRegs-2013-04-17.patch>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> <StoreByValRegs-2013-04-28.patch>
>>>>>
>>>>
>>>> _______________________________________________
>>>> llvm-commits mailing list
>>>> llvm-commits at cs.uiuc.edu
>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>
>>> <pr15293-2013-05-01.patch>
>>
>




More information about the llvm-commits mailing list