Fix PR15293: ARM codegen ice - expected larger existing stack allocation

Thu May 2 11:08:14 PDT 2013

Hi Manman,

> +  // NSAA should be evaluted (NSAA means "next stacked argument address").
> typo "evaluated"
>
> +  // So: NextStackOffset = NSAAOffset + SizeOfByValParamsStoredInRegs.
> +  // Then: NSAAOffset = NextStackOffset - SizeOfByValParamsStoredInRegs.
> +  unsigned NSAAOffset = State->getNextStackOffset();
> +  if (State->getCallOrPrologue() != Call) {
> +    for (unsigned i = 0, e = State->getInRegsParamsCount(); i != e; ++i) {
> +      unsigned RB, RE;
> +      State->getInRegsParamInfo(i, RB, RE);
> +      assert(NSAAOffset >= (RE-RB)*4 &&
> +             "Stack offset for byval regs doesn't introduced anymore?");
> +      NSAAOffset -= (RE-RB)*4;
> +    }
> +  }
> +
> +  if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0) &&
> I don't quite get the above section to update NSAAOffset. Could you give an example?

Current implementation (even before all my patches) allocates stack area 
for byval registers.
Then, in prologue all byval parameters are stored on stack (currently 
single byval parameter) and used as params passed by pointer. I'm not 
author of this approach. There are both pros and cons. Main "pros": this 
behaviour relaxes registers pressure.

That kind of stack usage outsides AAPCS rules. AAPCS assumes that if 
parameter assigned to the registers only it sidesteps stack usage at all.

Example:

typedef struct { int field0; } struct1_t;
typedef struct { int fields[5]; } struct2_t;

foo (int a, struct_t b, int c, int d)

Reason I've added these changes is that actually NSAA from AAPCS rules 
is *not equal* to LLVM's CCState::StackOffset+SP:
int a      --> r0      NSAA is SP+0, StackOffset is 0
struct_t b --> r1, r2  NSAA is SP+0, StackOffset := 4 (!)
int c      --> r3      NSAA is SP+0, StackOffset is 4
int d      --> SP,     NSAA := SP+4, StackOffset := 8

Let NSAAOffset = NSAA-SP, then:
StackOffset = NSAAOffset + sizeof(all-by-val-regs)

The check:
"if ((!Subtarget->isAAPCS_ABI() || NSAAOffset == 0)"
is just improvement of what was before:
"(!Subtarget->isAAPCS_ABI() || State->getNextStackOffset() == 0)"

> Should we use up all GPRArgRegs when NSAAOffset != 0 so arguments after the byval will not be allocated to registers?
If I've understood question properly:
If NSAA != SP and parameter size greater than all remained GPRs we send 
it to stack and set NCRN to R4 (so we can't use GPRs anymore).
If NSAA != SP, but parameter could be saved in remained GPRs we follow 
C.4 rule and save this parameter in GPRs, increasing NCRN.

I attached examples as separated .ll files.

P.S.: While writing this post I've found that I did the mistake in 
HandleByVal, I changed it a bit more. Would you please look at it again. 
So've I reattached the patch too.

-Stepan

>
> Otherwise LGTM.
>
> Thanks for working on this,
> Manman
>
> On May 1, 2013, at 11:36 AM, Stepan Dyatkovskiy wrote:
>
>> Hello,
>> Please find patch in attachment that fixes byval support in ARM backend for llvm level.
>>
>> -Stepan.
>>
>> Stepan Dyatkovskiy wrote:
>>> Commited in r180774. Now I'm working on milestone patch, that fixes
>>> byval support for llvm level. I published it before, now I need to
>>> update it. Hope it come soon.
>>>
>>> -Stepan.
>>>
>>> Manman Ren wrote:
>>>>
>>>> LGTM,
>>>>
>>>> Thanks,
>>>> Manman
>>>>
>>>> On Apr 28, 2013, at 10:50 AM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hi Manman,
>>>>> Just for keeping things actual: I reattached patch with new names.
>>>>> -Stepan
>>>>>
>>>>> Stepan Dyatkovskiy wrote:
>>>>>> Hi Manman,
>>>>>>
>>>>>>> Since it represents the stack area needed to save parameters that are
>>>>>> actually passed in registers, how about RegsSaveSizeForParams or just
>>>>>> RegsSaveSize?
>>>>>>
>>>>>> In AAPCS R0-R3 are called like:
>>>>>> "Argument / result / scratch register".
>>>>>> So maybe we'll use "set/getArgRegsSaveSize" names?
>>>>>>
>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>> AFI->getByValRegsSaveSize();"
>>>>>>
>>>>>> Yea :-) Here replace "VAReg" with "ArgRegs": ArgRegsSaveSize,
>>>>>> ArgRegsSize.
>>>>>>
>>>>>> -Stepan
>>>>>>
>>>>>> Manman Ren wrote:
>>>>>>>
>>>>>>> Hi Stepan,
>>>>>>>
>>>>>>> Thanks for working on this.
>>>>>>>
>>>>>>> I put in a comment about the refactor patch sometime ago:
>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>
>>>>>>>> Actually it is used not for vararg function, as I found it main role
>>>>>>>> of this param in emitPrologue, it is used for proper stack
>>>>>>>> allocation.
>>>>>>>> Stack size should enough for storing all registers, used for byval
>>>>>>>> parameters. And yes, if it is VA function, we must also extend this
>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>
>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>> Since it represents the stack area needed to save parameters that are
>>>>>>> actually passed in registers, how about RegsSaveSizeForParams or just
>>>>>>> RegsSaveSize?
>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>
>>>>>>> Manman
>>>>>>>
>>>>>>> On Apr 26, 2013, at 11:12 AM, Stepan Dyatkovskiy wrote:
>>>>>>>
>>>>>>>> Hi Manman and Renato,
>>>>>>>> Patch was tested with test-suite all test have bee passed.
>>>>>>>> -Stepan
>>>>>>>>
>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> AAPCS, 5.5 Parameters Passing, Stage C, C4 and C5 have been fixed,
>>>>>>>>> r180011.
>>>>>>>>>
>>>>>>>>> So, next patch is in attachment: VarArgsStyleRegisters refactoring.
>>>>>>>>>
>>>>>>>>> -Stepan.
>>>>>>>>>
>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>> Hello, Manman.
>>>>>>>>>> That's kind of snowball!
>>>>>>>>>> Today test-suite finished with two new failures on my beagle.
>>>>>>>>>> It helped me find out one more hidden logic dependency in ARM
>>>>>>>>>> backend.
>>>>>>>>>>
>>>>>>>>>> May be its not a mistake, but look:
>>>>>>>>>>
>>>>>>>>>> In ARMCallingConv.h, "f64AssignAAPCS" allocates registers for f64.
>>>>>>>>>> This
>>>>>>>>>> method is called for VA functions, since there is no CPRCs, and
>>>>>>>>>> *any*
>>>>>>>>>> parameters could be allocated in R0-R3. Either R0,R1 or R2,R3
>>>>>>>>>> could be
>>>>>>>>>> allocated here. If all the registers except R3 were allocated
>>>>>>>>>> somewhere
>>>>>>>>>> before, f64AssignAAPCS just moves whole f64 to stack, it doesn't
>>>>>>>>>> waste
>>>>>>>>>> R3, so R3 could be allocated later. But logic seems to be same
>>>>>>>>>> as in
>>>>>>>>>> ARMTargetLowering::HandleByVal.
>>>>>>>>>>
>>>>>>>>>> My main fix in calling convention was:
>>>>>>>>>>
>>>>>>>>>> def CC_ARM_AAPCS_Common : CallingConv<[
>>>>>>>>>> ...
>>>>>>>>>> -  CCIfType<[i32], CCIf<"State.getNextStackOffset() == 0 &&"
>>>>>>>>>> -                       "ArgFlags.getOrigAlign() != 8",
>>>>>>>>>> +  CCIfType<[i32], CCIf<"ArgFlags.getOrigAlign() != 8",
>>>>>>>>>>                          CCAssignToReg<[R0, R1, R2, R3]>>>,
>>>>>>>>>>
>>>>>>>>>> Now I got some picture why "State.getNextStackOffset() == 0" was
>>>>>>>>>> added:
>>>>>>>>>> due to f64AssignAAPCS implementation. May be the are more reasons:
>>>>>>>>>> next
>>>>>>>>>> test-suite results will show it then :-)
>>>>>>>>>>
>>>>>>>>>> Anyway, original CCIfType brakes C.4 and C.5, while my fix
>>>>>>>>>> brakes some
>>>>>>>>>> implicit statement defined in f64AssignAAPCS:
>>>>>>>>>>
>>>>>>>>>> "stack-offset != 0 doesn't mean all registers are allocated"
>>>>>>>>>>
>>>>>>>>>> So I've fixed f64AssignAAPCS too. Once we waste registers in
>>>>>>>>>> HandleByVal
>>>>>>>>>> (when we send parameter to stack), I think it would be good to keep
>>>>>>>>>> same
>>>>>>>>>> behaviour for f64AssignAAPCS since its HandleByVal's analogue
>>>>>>>>>> for f64
>>>>>>>>>> params.
>>>>>>>>>>
>>>>>>>>>> I attached new patch, now with 3 tests :-)
>>>>>>>>>> (3rd test is 2013-04-21-AAPCS-VA-C.1.cp.ll)
>>>>>>>>>>
>>>>>>>>>> -Stepan.
>>>>>>>>>>
>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>
>>>>>>>>>>> LGTM,
>>>>>>>>>>>
>>>>>>>>>>> Thanks a lot,
>>>>>>>>>>> Manman
>>>>>>>>>>>
>>>>>>>>>>> On Apr 18, 2013, at 8:41 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Fixed :-) New patch (+2 tests) is attached.
>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>
>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> "AAPCS-C.4-vs-VFP-2013-04-17.patch" looks good to me.
>>>>>>>>>>>>>
>>>>>>>>>>>>> As a side note, we may have issues with 9 floats, 3 ints and a
>>>>>>>>>>>>> small
>>>>>>>>>>>>> byval.
>>>>>>>>>>>>> The standard says we should split the byval only when NSAA is
>>>>>>>>>>>>> equal
>>>>>>>>>>>>> to SP (C.5), in the case of 9 floats, 3 ints and a small byval,
>>>>>>>>>>>>> we should not split the byval since we advanced NSAA with the
>>>>>>>>>>>>> 9th
>>>>>>>>>>>>> float.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Apr 16, 2013, at 8:51 PM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> AAPCS-C.4-vs-VFP-2013-04-16.patch looks good except that the
>>>>>>>>>>>>>>> testing case.
>>>>>>>>>>>>>>> Should it have 9 doubles and 1 integer instead of a single
>>>>>>>>>>>>>>> double
>>>>>>>>>>>>>>> and a single integer?
>>>>>>>>>>>>>>> The testing case passes even without the fix.
>>>>>>>>>>>>>> That was my midnight mistake, sorry.
>>>>>>>>>>>>>> I reattached patch with proper test. In the end of test
>>>>>>>>>>>>>> file, I've
>>>>>>>>>>>>>> added comments for @doFoo: small description of how parameters
>>>>>>>>>>>>>> must
>>>>>>>>>>>>>> be passed:
>>>>>>>>>>>>>> first eight doubles --> D0-D7
>>>>>>>>>>>>>> 9th double --> Stack
>>>>>>>>>>>>>> 10th param (first i32) --> R0.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> It is important to have 9 double params here, since we must
>>>>>>>>>>>>>> fill-up
>>>>>>>>>>>>>> all possible VFPs, then get compiler to use stack offset,
>>>>>>>>>>>>>> and only
>>>>>>>>>>>>>> after start use GPRs.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am not sure about the renaming patch though. Functions
>>>>>>>>>>>>>>> get|setVarArgsRegSaveSize are used for both vararg and byval.
>>>>>>>>>>>>>>> It does not sound right to replace VarARgs with ByVal.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Actually it is used not for vararg function, as I found it main
>>>>>>>>>>>>>> role of this param in emitPrologue, it is used for proper stack
>>>>>>>>>>>>>> allocation.
>>>>>>>>>>>>>> Stack size should enough for storing all registers, used for
>>>>>>>>>>>>>> byval
>>>>>>>>>>>>>> parameters. And yes, if it is VA function, we must also extend
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>> stack area for all unallocated R0-R3 regs storage.
>>>>>>>>>>>>> Agree. That is why I think ByVal is not a great name either.
>>>>>>>>>>>>> Since it represents the stack area needed to save parameters
>>>>>>>>>>>>> that
>>>>>>>>>>>>> are actually passed in registers, how about
>>>>>>>>>>>>> RegsSaveSizeForParams or
>>>>>>>>>>>>> just RegsSaveSize?
>>>>>>>>>>>>> Also it is weird to have "unsigned VARegSaveSize =
>>>>>>>>>>>>> AFI->getByValRegsSaveSize();" where we assign return value of
>>>>>>>>>>>>> getByValRegsSaveSize to a variable "VARegSaveSize".
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for working on this.
>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Apr 15, 2013, at 9:23 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Manman,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Well.. We can use next case: Co-Processor register candidates
>>>>>>>>>>>>>>>> may be
>>>>>>>>>>>>>>>> either in VFP or in stack, so when all VFP are allocated
>>>>>>>>>>>>>>>> stack is
>>>>>>>>>>>>>>>> used. We can use stack without GPR allocation in that case,
>>>>>>>>>>>>>>>> passing 9 f64 params, for example: first eight goes to d0-d7,
>>>>>>>>>>>>>>>> night goes to stack. Now as 10th parameter we pass i32,
>>>>>>>>>>>>>>>> according
>>>>>>>>>>>>>>>> to AAPCS, 5.5 Parameters Passing, Stage C, C.4:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> [quote]
>>>>>>>>>>>>>>>> If the size in words of the argument is not more than r4
>>>>>>>>>>>>>>>> minus
>>>>>>>>>>>>>>>> NCRN, the argument is copied into
>>>>>>>>>>>>>>>> core registers, starting at the NCRN. The NCRN is
>>>>>>>>>>>>>>>> incremented by
>>>>>>>>>>>>>>>> the number of registers used.
>>>>>>>>>>>>>>>> Successive registers hold the parts of the argument they
>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> hold if its value were loaded into
>>>>>>>>>>>>>>>> those registers from memory using an LDM instruction. The
>>>>>>>>>>>>>>>> argument has now been allocated.
>>>>>>>>>>>>>>>> [/quote]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> It must to R0 in our case. But currently it goes to stack.
>>>>>>>>>>>>>>>> I also checked this case with GCC - it works as in AAPCS
>>>>>>>>>>>>>>>> written.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About refactoring. I propose do it in two stages, first
>>>>>>>>>>>>>>>> rename
>>>>>>>>>>>>>>>> "getVarArgsRegSaveSize" to "get/setByValRegsSaveSize", and
>>>>>>>>>>>>>>>> detach
>>>>>>>>>>>>>>>> "StoreByValRegs" functionality from "VarArgStyleRegisters"
>>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I attached first refactor patch, Calling-Convention patch
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> test-case, and calling-convention .c test case.
>>>>>>>>>>>>>>>> Try run .c file with "arm-linux-gnueabihf-gcc" and
>>>>>>>>>>>>>>>> "./arm-linux-gnueabihf-clang -mcpu=cortex-a9".
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> P.S.: My beagleboard is very very slow, "release" build takes
>>>>>>>>>>>>>>>> about 15 hours, test-suite may be 15-20 hours too. I'll
>>>>>>>>>>>>>>>> try ask
>>>>>>>>>>>>>>>> somebody to present me faster device, but it take some time.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Apr 12, 2013, at 5:15 AM, Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello Manman,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I reattached full patch with fixes.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. I've removedskipWastedByValRegs, addByValWastedRegs and
>>>>>>>>>>>>>>>>>> Wasted flag.
>>>>>>>>>>>>>>>>>> 2. Fixed usage of "offset" variable in LowerFormalArguments
>>>>>>>>>>>>>>>>>> 3. And fixed regression test: added function with two byval
>>>>>>>>>>>>>>>>>> structures.
>>>>>>>>>>>>>>>>> The above looks good.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> *4.* One new change I've made is
>>>>>>>>>>>>>>>>>> CC_ARM_AAPCS_Common rule.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Currently, for align == 4 and parameter type <=i32, we
>>>>>>>>>>>>>>>>>> try to
>>>>>>>>>>>>>>>>>> assign it to the first unallocated register from [r0-r3]
>>>>>>>>>>>>>>>>>> range,
>>>>>>>>>>>>>>>>>> but only if getNextStackOffset() == 0.
>>>>>>>>>>>>>>>>>> I suppose that "getNextStackOffset() != 0" is treated as
>>>>>>>>>>>>>>>>>> fact?
>>>>>>>>>>>>>>>>>> that all registers are used? and params goes to the
>>>>>>>>>>>>>>>>>> stack now.
>>>>>>>>>>>>>>>>>> After byval structures will be implemented, the next
>>>>>>>>>>>>>>>>>> case is
>>>>>>>>>>>>>>>>>> possible:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> void @f(%i32_struct* byval, i32)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Here, we need to allocate stack for first parameter, but
>>>>>>>>>>>>>>>>>> after
>>>>>>>>>>>>>>>>>> we still have unallocated registers. And second parameter
>>>>>>>>>>>>>>>>>> should be sent to r1. Current CC_ARM_AAPCS_Common
>>>>>>>>>>>>>>>>>> implementation will send second parameter to the stack.
>>>>>>>>>>>>>>>>>> Though, I'm still not sure about this change, just
>>>>>>>>>>>>>>>>>> because may
>>>>>>>>>>>>>>>>>> be there are more reasons to check nextStackOffset value.
>>>>>>>>>>>>>>>>> Can we fix the above issue in a separate patch with
>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>> testing case?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>> How do you think, do we need this assertion?
>>>>>>>>>>>>>>>>> I don't really think it is necessary :)
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I didn't present refactor patch, just because main patch
>>>>>>>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>>>>> get LGTM community and hypothetically may contain some
>>>>>>>>>>>>>>>>>> mistakes..
>>>>>>>>>>>>>>>>> LGTM otherwise.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>> Manman
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> -Stepan
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Stepan Dyatkovskiy wrote:
>>>>>>>>>>>>>>>>>>> Manman Ren wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi Stepan,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One high-level question: do we need to insert the wasted
>>>>>>>>>>>>>>>>>>>> record to
>>>>>>>>>>>>>>>>>>>> ByValRegs and later on skip those records?
>>>>>>>>>>>>>>>>>>>> It seems to me that we can only insert the records for
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> actual byval
>>>>>>>>>>>>>>>>>>>> parameters.
>>>>>>>>>>>>>>>>>>>> If that is true, we can remove addByValWastedRegs and
>>>>>>>>>>>>>>>>>>>> skipWastedByValRegs.
>>>>>>>>>>>>>>>>>>> Hm.. yes its reasonable. I'll try to fix.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> +        // "offset" value would be usefull if byval
>>>>>>>>>>>>>>>>>>>>> parameter outsides
>>>>>>>>>>>>>>>>>>>>> +        // the registers area. Note, in that case
>>>>>>>>>>>>>>>>>>>>> nextInRegsParam()
>>>>>>>>>>>>>>>>>>>>> will not
>>>>>>>>>>>>>>>>>>>>> +        // skip any waste registers, since parameter
>>>>>>>>>>>>>>>>>>>>> fillup
>>>>>>>>>>>>>>>>>>>>> the whole
>>>>>>>>>>>>>>>>>>>>> rest
>>>>>>>>>>>>>>>>>>>>> +        // of registers set.
>>>>>>>>>>>>>>>>>>>>> +        unsigned RegsUsed = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>> +        if (!CCInfo.nextInRegsParam() &&
>>>>>>>>>>>>>>>>>>>>> Flags.getByValSize() >
>>>>>>>>>>>>>>>>>>>>> 4*RegsUsed )
>>>>>>>>>>>>>>>>>>>>> +          offset = RegEnd - RegBegin;
>>>>>>>>>>>>>>>>>>>>>        }
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -      if (Flags.getByValSize() - 4*offset > 0) {
>>>>>>>>>>>>>>>>>>>>> +      if (offset) {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> This still does not look right.
>>>>>>>>>>>>>>>>>>>> There are 3 cases:
>>>>>>>>>>>>>>>>>>>> If the byval parameter is in registers partially, we
>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with the correct offset;
>>>>>>>>>>>>>>>>>>>> If the byval parameter is all in registers, there is
>>>>>>>>>>>>>>>>>>>> no need
>>>>>>>>>>>>>>>>>>>> to create a
>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL;
>>>>>>>>>>>>>>>>>>>> If the byval parameter is all on stack, we should
>>>>>>>>>>>>>>>>>>>> create a
>>>>>>>>>>>>>>>>>>>> COPY_STRUCT_BYVAL with offset equal to 0.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> When looking at the above code, offset is default to
>>>>>>>>>>>>>>>>>>>> zero, if
>>>>>>>>>>>>>>>>>>>> CurByValIdx >= ByValArgsCount (this byval parameter is
>>>>>>>>>>>>>>>>>>>> all on
>>>>>>>>>>>>>>>>>>>> stack),
>>>>>>>>>>>>>>>>>>>> offset will still be zero,
>>>>>>>>>>>>>>>>>>>> and we will not create a COPY_STRUCT_BYVAL.
>>>>>>>>>>>>>>>>>>>> The correct logic seems to be always updating offset
>>>>>>>>>>>>>>>>>>>> inside
>>>>>>>>>>>>>>>>>>>> the if and
>>>>>>>>>>>>>>>>>>>> check Flags.getByValSize() > 4*offset
>>>>>>>>>>>>>>>>>>>> if (CurByValIdx < ByValArgsCount) {
>>>>>>>>>>>>>>>>>>>>    …
>>>>>>>>>>>>>>>>>>>>    offset = RegEnd - RegBegin; //number of registers
>>>>>>>>>>>>>>>>>>>> occupied
>>>>>>>>>>>>>>>>>>>> by this
>>>>>>>>>>>>>>>>>>>> byval parameter
>>>>>>>>>>>>>>>>>>>>    CCInfo.nextInRegsParam(); // go to the next byval
>>>>>>>>>>>>>>>>>>>> parameter that is
>>>>>>>>>>>>>>>>>>>> in registers partially or all
>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>> if (Flags.getByValSize() > 4*offset) {
>>>>>>>>>>>>>>>>>>>>    // Part of the byval parameter is on stack.
>>>>>>>>>>>>>>>>>>>>    ...
>>>>>>>>>>>>>>>>>>> Oh.. I missed third case. Then we keep first variant here.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -  if ((!State->isFirstByValRegValid()) &&
>>>>>>>>>>>>>>>>>>>>> -      (ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>> +  if ((ARM::R0 <= reg) && (reg <= ARM::R3)) {
>>>>>>>>>>>>>>>>>>>>> +    // All registers reserved for byval parameters
>>>>>>>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>>>>>>> be allocated.
>>>>>>>>>>>>>>>>>>>>> +    // Check that first unallocated register is higher
>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>> last one
>>>>>>>>>>>>>>>>>>>>> used
>>>>>>>>>>>>>>>>>>>>> +    // for byval parameters.
>>>>>>>>>>>>>>>>>>>>> +    {
>>>>>>>>>>>>>>>>>>>>> +      unsigned InRegsParamsCount =
>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamsCount();
>>>>>>>>>>>>>>>>>>>>> +      if (InRegsParamsCount) {
>>>>>>>>>>>>>>>>>>>>> +        unsigned RB, RE;
>>>>>>>>>>>>>>>>>>>>> +
>>>>>>>>>>>>>>>>>>>>> State->getInRegsParamInfo(InRegsParamsCount-1, RB,
>>>>>>>>>>>>>>>>>>>>> RE);
>>>>>>>>>>>>>>>>>>>>> +        assert(reg >= RE);
>>>>>>>>>>>>>>>>>>>>> +      }
>>>>>>>>>>>>>>>>>>>>> +    }
>>>>>>>>>>>>>>>>>>>> Is this only for assertion? Is this necessary?
>>>>>>>>>>>>>>>>>>> Yes. I've just moved this check from "if" statement,
>>>>>>>>>>>>>>>>>>> since it
>>>>>>>>>>>>>>>>>>> must be
>>>>>>>>>>>>>>>>>>> true always.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -void
>>>>>>>>>>>>>>>>>>>>> -ARMTargetLowering::VarArgStyleRegisters(CCState
>>>>>>>>>>>>>>>>>>>>> &CCInfo,
>>>>>>>>>>>>>>>>>>>>> SelectionDAG
>>>>>>>>>>>>>>>>>>>>> &DAG,
>>>>>>>>>>>>>>>>>>>>> -                                        DebugLoc dl,
>>>>>>>>>>>>>>>>>>>>> SDValue &Chain,
>>>>>>>>>>>>>>>>>>>>> -                                        const Value
>>>>>>>>>>>>>>>>>>>>> *OrigArg,
>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>> -                                        unsigned
>>>>>>>>>>>>>>>>>>>>> ArgOffset,
>>>>>>>>>>>>>>>>>>>>> -                                        bool
>>>>>>>>>>>>>>>>>>>>> ForceMutable)
>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>> +// Return: The frame index registers were stored into.
>>>>>>>>>>>>>>>>>>>>> +int
>>>>>>>>>>>>>>>>>>>>> +ARMTargetLowering::StoreByValRegs(CCState &CCInfo,
>>>>>>>>>>>>>>>>>>>>> SelectionDAG &DAG,
>>>>>>>>>>>>>>>>>>>>> +                                  DebugLoc dl, SDValue
>>>>>>>>>>>>>>>>>>>>> &Chain,
>>>>>>>>>>>>>>>>>>>>> +                                  const Value *OrigArg,
>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>> InRegsParamRecordIdx,
>>>>>>>>>>>>>>>>>>>>> +                                  unsigned
>>>>>>>>>>>>>>>>>>>>> OffsetFromOrigArg,
>>>>>>>>>>>>>>>>>>>>> +                                  unsigned ArgOffset,
>>>>>>>>>>>>>>>>>>>>> +                                  bool ForceMutable)
>>>>>>>>>>>>>>>>>>>>> const {
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> It would be much cleaner if we can move the
>>>>>>>>>>>>>>>>>>>> refactoring of
>>>>>>>>>>>>>>>>>>>> VarArgStyleRegisters to a separate patch.
>>>>>>>>>>>>>>>>>>> For sure.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> P.S.: Today I had some troubles with electricity, so
>>>>>>>>>>>>>>>>>>> I'll try
>>>>>>>>>>>>>>>>>>> present
>>>>>>>>>>>>>>>>>>> new patch tomorrow.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Stepan.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> <pr15293-2013-04-12.patch>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> <stack-offset.c><VarArgsRegSaveSize-2013-04-16.patch><AAPCS-C.4-vs-VFP-2013-04-16.patch>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-17.patch>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> <AAPCS-C.4-vs-VFP-2013-04-19.patch>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>> <StoreByValRegs-2013-04-17.patch>
>>>>>>>
>>>>>>
>>>>>
>>>>> <StoreByValRegs-2013-04-28.patch>
>>>>
>>>
>>> _______________________________________________
>>> llvm-commits mailing list
>>> llvm-commits at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>
>> <pr15293-2013-05-01.patch>
>

-------------- next part --------------
;Check AAPCS, 5.5 Parameters Passing, C4 and C5 rules.
;Check case when NSAA != 0, and NCRN < R4, NCRN+ParamSize < R4
;RUN: llc -mtriple=thumbv7-linux-gnueabihf -float-abi=hard < %s | FileCheck %s

%st_t = type { i32, i32 }
@static_val = constant %st_t { i32 777, i32 888}

declare void @fooUseStruct(%st_t*)

define void @foo(double %vfp0,     ; --> D0,     NSAA=SP
                 double %vfp1,     ; --> D1,     NSAA=SP
		 double %vfp2,     ; --> D2,     NSAA=SP
		 double %vfp3,     ; --> D3,     NSAA=SP
		 double %vfp4,     ; --> D4,     NSAA=SP
		 double %vfp5,     ; --> D5,     NSAA=SP
		 double %vfp6,     ; --> D6,     NSAA=SP
		 double %vfp7,     ; --> D7,     NSAA=SP
		 double %vfp8,     ; --> SP,     NSAA=SP+8 (!)
                 i32 %p0,          ; --> R0,     NSAA=SP+8 
		 %st_t* byval %p1, ; --> R1, R2, NSAA=SP+8
		 i32 %p2,          ; --> R3,     NSAA=SP+8 
                 i32 %p3) #0 {     ; --> SP+4,   NSAA=SP+12
entry:
  ;CHECK: sub sp, #8
  ;CHECK: push.w {r11, lr}
  ;CHECK: add r0, sp, #16
  ;CHECK: str r2, [sp, #20]
  ;CHECK: str r1, [sp, #16]
  ;CHECK: bl  fooUseStruct
  call void @fooUseStruct(%st_t* %p1)
  ret void
}

define void @doFoo() {
entry:
  call void @foo(double 23.0,
                 double 23.1,
                 double 23.2,
                 double 23.3,
                 double 23.4,
                 double 23.5,
                 double 23.6,
                 double 23.7,
                 double 23.8,
                 i32 0, %st_t* byval @static_val, i32 1, i32 2)
  ret void
}

-------------- next part --------------
;Check AAPCS, 5.5 Parameters Passing, C4 and C5 rules.
;Check case when NSAA != 0, and NCRN < R4, NCRN+ParamSize > R4
;RUN: llc -mtriple=thumbv7-linux-gnueabihf -float-abi=hard < %s | FileCheck %s

%st_t = type { i32, i32, i32, i32 }
@static_val = constant %st_t { i32 777, i32 888, i32 787, i32 878}

define void @foo(double %vfp0,     ; --> D0,              NSAA=SP
                 double %vfp1,     ; --> D1,              NSAA=SP
		 double %vfp2,     ; --> D2,              NSAA=SP
		 double %vfp3,     ; --> D3,              NSAA=SP
		 double %vfp4,     ; --> D4,              NSAA=SP
		 double %vfp5,     ; --> D5,              NSAA=SP
		 double %vfp6,     ; --> D6,              NSAA=SP
		 double %vfp7,     ; --> D7,              NSAA=SP
		 double %vfp8,     ; --> SP,              NSAA=SP+8 (!)
                 i32 %p0,          ; --> R0,              NSAA=SP+8 
		 %st_t* byval %p1, ; --> SP+8, 4 words    NSAA=SP+24
		 i32 %p2) #0 {     ; --> SP+24,           NSAA=SP+24 

entry:
  ;CHECK:  push.w {r11, lr}
  ;CHECK:  ldr    r0, [sp, #32]
  ;CHECK:  bl     fooUseI32
  call void @fooUseI32(i32 %p2)
  ret void
}

declare void @fooUseI32(i32)

define void @doFoo() {
entry:
  call void @foo(double 23.0,
                 double 23.1,
                 double 23.2,
                 double 23.3,
                 double 23.4,
                 double 23.5,
                 double 23.6,
                 double 23.7,
                 double 23.8,
                 i32 0, %st_t* byval @static_val, i32 1)
  ret void
}

-------------- next part --------------
;Check AAPCS, 5.5 Parameters Passing, C4 and C5 rules.
;RUN: llc -mtriple=thumbv7-linux-gnueabihf -float-abi=hard < %s | FileCheck %s

%artz = type { i32, i32 }
@static_val = constant %artz { i32 777, i32 888}

define void @foo(i32 %p0,          ; --> R0,     NSAA=SP+0 
		 %artz* byval %p1, ; --> R1, R2, NSAA=SP+0
		 i32 %p2,          ; --> R3,     NSAA=SP+0 
                 i32 %p3) #0 {     ; --> SP,     NSAA=SP+4
entry:

  ;CHECK: sub     sp, #8
  ;CHECK: push.w  {r11, lr}
  ;CHECK: mov     r0, r3
  ;CHECK: str     r2, [sp, #12]
  ;CHECK: str     r1, [sp, #8]
  ;CHECK: bl      fooUseI32
  call void @fooUseI32(i32 %p2)
  ret void
}

declare void @fooUseI32(i32)

define void @doFoo() {
entry:
  call void @foo(i32 0, %artz* byval @static_val, i32 1, i32 2)
  ret void
}

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pr15293-2013-05-02.patch
Type: text/x-diff
Size: 22665 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130502/65e6469f/attachment.patch>