ARM, PR15868 Release-blocker fix. ByVal parameters padding fix.

Manman Ren mren at apple.com
Fri May 17 13:37:32 PDT 2013


LGTM,

Thanks,
Manman

On May 17, 2013, at 1:17 PM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:

> Hi Manman,
> I've attached new patch:
> -- Pasted your implementation in "if" statement.
> -- Kept Align parameter in getArgRegsSaveSize method.
> -- New test-case ("see 2013-05-13-AAPCS-byval-padding2.ll") that checks
> case, you mentioned in previous post.
> 
> -Stepan.
> 
> Stepan Dyatkovskiy wrote:
>> Hi Manman,
>> 
>> >> Then in emitPrologue/Epilogue we just replace current
>> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>> > This may not work, for our example of a 4-byte byval followed by a
>> 20-byte byval, computeRegArea for the second byval will return 16 as
>> ArgRegsSaveSize, and the padding will be 4.
>> 
>> Ah, yes, exactly. Need one more test-case then :-)
>> 
>> > So in if statement, instead of aligning ArgRegsSize of the split
>> byval parameter, we align the total ArgRegsSize,
>> > Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
>> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>> > ArgRegsSaveSize = ArgRegsSize + Padding.
>> >
>> > For several 4 bytes byval params, since they are not split, the
>> padding for each will be zero.
>> 
>> Then we can use this implementation, and add Align parameter to
>> getArgRegsSaveSize. It solves problem you've mentioned for odd amount of
>> 4 bytes params.
>> 
>> -Stepan
>> 
>> Manman Ren wrote:
>>> 
>>> On May 15, 2013, at 3:07 AM, Stepan Dyatkovskiy wrote:
>>> 
>>>> Hi Manman,
>>>> 
>>>>> For functions called from check227, SP will not be 8-byte aligned.
>>>>> Is this considered "public interfaces"?
>>>> Oh.. you right. If we have the only 4 bytes byval parameter, then in
>>>> my case "sub sp, sp, #4" will be emitted. That's wrong.
>>>> 
>>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>>> computeRegArea and add padding if (ArgRegsSize +
>>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>> 
>>>> For several 4 bytes byval params we get several 8 bytes increments
>>>> then. So for 3 x 4-byte-param we would get 24 bytes stack area.
>>> Hi Stepan,
>>> 
>>> I meant to add padding only for the single byval parameter that is
>>> split, and the amount of padding depends on the total ArgRegsSize.
>>> Given in the patch:
>>> -  ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>> +
>>> +  // If parameter is split between stack and GPRs...
>>> +  if (NumGPRs && Align == 8 &&
>>> +      (ArgRegsSize < ArgSize ||
>>> +        InRegsParamRecordIdx >= CCInfo.getInRegsParamsCount())) {
>>> +    // Add padding for part of param recovered from GPRs, so
>>> +    // its last byte must be at address K*8 - 1.
>>> +    // We need to do it, since remained (stack) part of parameter has
>>> +    // stack alignment, and we need to "attach" "GPRs head" without gaps
>>> +    // to it:
>>> +    // Stack:
>>> +    // |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>>> bytes...
>>> +    // [ [padding] [GPRs head] ] [        Tail passed via stack
>>> ....
>>> +    //
>>> +    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>> +  } else
>>> +    // We don't need to extend regs save size for byval parameters if
>>> they
>>> +    // are passed via GPRs only.
>>> +    ArgRegsSaveSize = ArgRegsSize;
>>>  }
>>> So in if statement, instead of aligning ArgRegsSize of the split byval
>>> parameter, we align the total ArgRegsSize,
>>> Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
>>> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>>> ArgRegsSaveSize = ArgRegsSize + Padding.
>>> 
>>> For several 4 bytes byval params, since they are not split, the
>>> padding for each will be zero.
>>> 
>>>> May be correct ArgRegsSaveArea straight where "sub sp, sp, #N" is
>>>> emitted? It is ARMFrameLowering::emitPrologue, emitEpilogue, and the
>>>> same for Thumb1FrameLowering.
>>>> 
>>>> It may be done simple, by adding Align parameter to
>>>> getArgRegsSaveSize, setting it to 0 by default.
>>>> 
>>>> Then in emitPrologue/Epilogue we just replace current
>>>> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>>> This may not work, for our example of a 4-byte byval followed by a
>>> 20-byte byval, computeRegArea for the second byval will return 16 as
>>> ArgRegsSaveSize, and the padding will be 4.
>>> AFI->getArgRegsSaveSize(8) will return the aligned result of 4+20 -->
>>> 24 bytes, which means the actual padding is 8 bytes.
>>> 
>>> With the above proposed implementation, computeRegArea for the second
>>> byval will return 12 as ArgRegsSaveSize, and the padding is 0.
>>> AFI->getArgRegsSaveSize() will return 4+12 bytes, which means the
>>> actual padding is 0 bytes.
>>> 
>>> Thanks,
>>> Manman
>>>> 
>>>> New patch is attached.
>>>> 
>>>> -Stepan.
>>>> 
>>>> Manman Ren wrote:
>>>>> 
>>>>> On May 14, 2013, at 7:30 AM, Stepan Dyatkovskiy wrote:
>>>>> 
>>>>>> Hello,
>>>>>> 
>>>>>> Evan wrote:
>>>>>>> The comment doesn't parse. "in when"?
>>>>>> Sorry :-)
>>>>>> 
>>>>>>> But I don't see check for stack alignment anywhere in the code.
>>>>>> In reworked patch I added it. Though, I moved main changes from
>>>>>> StoreByValRegs to computeRegArea, as Manman proposed.
>>>>>> 
>>>>>> Manman wrote:
>>>>>>> This will cause the callee to update SP by 20 bytes
>>>>>>> (ArgRegsSaveSize will be 4 + 4 + 4*3), which means SP will not be
>>>>>>> 8-byte aligned.
>>>>>> Yes, SP will be 4 byte aligned. Though inside the routing we use it as
>>>>>> very simplified heap, and..
>>>>>> Stack is double-word aligned (SP mod 8 = 0) for *public interfaces*
>>>>>> only, while for other cases it is word aligned (SP mod 4 = 0).
>>>>>> Then, perhaps, there is nothing criminal to use offsets like #12 or
>>>>>> #4 inside the routine? Though, of course, params with types like
>>>>>> i64, or "double" will be 8 bytes aligned everywhere.
>>>>>> 
>>>>>>> It is a little weird to have a hard-coded ArgSize 4 here for a
>>>>>>> variable argument.
>>>>>> Hm.. This parameter is rudimentary for VA functions... So,
>>>>>> hard-code here 0 then?
>>>>>> 
>>>>>> Just in case, a little more details...
>>>>>> "Byval" parameters are recovered with padding. This trick allows to
>>>>>> concatenate GPRs part with stack part. But, even before byval
>>>>>> patches (was posted earlier), there was an issue, that has been
>>>>>> registered as PR15868.
>>>>>> 
>>>>>> Consider next example:
>>>>>> %struct_t = type { i32, i32 }
>>>>>> 
>>>>>> ; Note p3 and p4 are MemLocs.
>>>>>> define void @check227(
>>>>>>  i32 %p0,             ; R0
>>>>>>  i32 %p1,             ; R1
>>>>>>  i32 %p2,             ; R2
>>>>>>  %struct_t* byval p3, ; R3, [SP+0, SP+4), LocMemOffset = 0
>>>>>>  %struct_t* p4        ; SP+4,             LocMemOffset = 8
>>>>>> )
>>>>>> 
>>>>>> Currently LowerFormalArguments does the next:
>>>>>> 
>>>>>> p3: ArgRegsSize = 4, ArgRegsSaveSize = 8, parameter is split
>>>>>> between GPRs and stack; stack part is associated with FrameIndex
>>>>>> with offset = 4.
>>>>>> p4: is pointer, its value is stored in stack, FrameIndex's offset =
>>>>>> 8 (Wrong, must be 12)
>>>>>> 
>>>>>> For p4 we should take into account: p3 is padded with 4 bytes, so
>>>>>> proper offset is 12, not 8.
>>>>>> 
>>>>>> Relative to case, when byval parameter small enough to be in GPRs
>>>>>> only.
>>>>>> We don't need to concatenate it with stack part. So, we don't need
>>>>>> to pad it. Inside the routine we can store it only using constraint
>>>>>> "SP mod 4 = 0". I can't see any reason to extend ArgRegsSaveArea in
>>>>>> this case.
>>>>>> I may be wrong in my last conclusion, so here it would be good to
>>>>>> see your confirmation/disagree.
>>>>> 
>>>>> Hi Stepen,
>>>>> 
>>>>> Yes, we do not need padding for byvals that are in GPRs only. I
>>>>> think the purpose of performing the alignment
>>>>> ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>> is to make sure the updated SP is 8-byte aligned. Correct me if I am
>>>>> wrong :]
>>>>> 
>>>>> In the prologue of check227, we will update SP by a multiple of 8
>>>>> bytes, if we call any function from check227, the SP will be 8-byte
>>>>> aligned for the callee.
>>>>> When we have a small byval of 4-byte followed by a large byval,
>>>>> since the patch will not use padding area for the first small byval,
>>>>> and will use 4-byte padding for the second byval,
>>>>> the ArgRegsSaveSize will be 4 (for first byval) + 4*4 (for second
>>>>> byval). For functions called from check227, SP will not be 8-byte
>>>>> aligned. Is this considered "public interfaces"?
>>>>> 
>>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>>> computeRegArea and add padding if (ArgRegsSize +
>>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>>> 
>>>>> Otherwise LGTM,
>>>>> 
>>>>> Thanks,
>>>>> Manman
>>>>>> 
>>>>>> 
>>>>>> I have attached reworked patch + two examples (.c and .ll). Just
>>>>>> try run .c file with -O2 -S, with clang and gcc, and compare
>>>>>> outputs (O2 allows to strip out some unnecessary instructions here).
>>>>>> 
>>>>>> Thanks for reviews!
>>>>>> 
>>>>>> -Stepan.
>>>>>> 
>>>>>> Manman Ren wrote:
>>>>>>> 
>>>>>>> On May 13, 2013, at 1:28 PM, Evan Cheng wrote:
>>>>>>> 
>>>>>>>> Hi Stepan,
>>>>>>>> 
>>>>>>>> Your explanation below seems to indicate this is only an issue
>>>>>>>> when stack alignment is 8 (or smaller?). But I don't see check
>>>>>>>> for stack alignment anywhere in the code.
>>>>>>>   unsigned Align =
>>>>>>> MF.getTarget().getFrameLowering()->getStackAlignment();
>>>>>>>   ArgRegsSize = NumGPRs * 4;
>>>>>>>   ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>>>> The ArgRegsSaveSize is different from ArgRegsSize when stack
>>>>>>> alignment is 8 or larger.
>>>>>>> And padding is added when they differ.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Manman
>>>>>>>> 
>>>>>>>> Other nitpicks:
>>>>>>>> +    // If parameter was splitted between stack and GPRs...
>>>>>>>> Typo: s/splitted/split
>>>>>>>> 
>>>>>>>> -    return MFI->CreateFixedObject(4, ArgOffset, !ForceMutable);
>>>>>>>> +    return MFI->CreateFixedObject(
>>>>>>>> +        4, AFI->getStoredByValParamsPadding() + ArgOffset,
>>>>>>>> !ForceMutable);
>>>>>>>> 
>>>>>>>> You should just use Padding in place of
>>>>>>>> AFI->getStoredByValParamsPadding() here.
>>>>>>>> 
>>>>>>>> +  /// StByValParamsPadding - in when we store params in memory
>>>>>>>> +  /// sometimes we need to insert gap before parameter start
>>>>>>>> address.
>>>>>>>> +  ///
>>>>>>>> 
>>>>>>>> The comment doesn't parse. "in when"?
>>>>>>>> 
>>>>>>>> Evan
>>>>>>>> 
>>>>>>>> On May 13, 2013, at 6:26 AM, Stepan Dyatkovskiy
>>>>>>>> <STPWORLD at narod.ru> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> In case when stack alignment is 8 and GPRs parameter part size
>>>>>>>>> is not N*8: we add padding to GPRs part, so part's last byte
>>>>>>>>> must be recovered at address K*8-1.
>>>>>>>>> We need to do it, since remained (stack) part of parameter
>>>>>>>>> starts from address K*8, and we need to "attach" "GPRs head"
>>>>>>>>> without gaps to it:
>>>>>>>>> Stack:
>>>>>>>>> |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>>>>>>>>> bytes...
>>>>>>>>> [ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------
>>>>>>>>> ...
>>>>>>>>> 
>>>>>>>>> Note, once we added padding we need to correct *all* Arg offsets
>>>>>>>>> that are going after padded one.
>>>>>>>>> That's why we need this fix: Arg offsets were never corrected
>>>>>>>>> before this patch. See new test-case included in patch.
>>>>>>>>> 
>>>>>>>>> We also don't need to insert padding for byval parameters that
>>>>>>>>> are stored in GPRs only. We need pad only last byval parameter
>>>>>>>>> and only in case it outsides GPRs and stack alignment = 8.
>>>>>>>>> 
>>>>>>>>> Please find the patch in attachment.
>>>>>>>>> 
>>>>>>>>> This patch reduces stack usage for some cases:
>>>>>>>>> We can reduce ArgRegsSaveArea since inner 4 bytes sized byval
>>>>>>>>> params my be "packed" with alignment 4.
>>>>>>>>> 
>>>>>>>>> This patch also fixes PR15868 release-blocking issue.
>>>>>>>>> 
>>>>>>>>> -Stepan.
>>>>>>>>> 
>>>>>>>>> <pr15868-2013-05-13-2.patch>_______________________________________________
>>>>>>>>> 
>>>>>>>>> llvm-commits mailing list
>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>> 
>>>>>> <attri_16_simple3.2.c><attri_16_simple3.2.ll><pr15868-2013-05-14.patch>
>>>> 
>>>> <pr15868-2013-05-15.patch>
> 
> <pr15868-2013-05-17.patch>



More information about the llvm-commits mailing list