ARM, PR15868 Release-blocker fix. ByVal parameters padding fix.

Stepan Dyatkovskiy stpworld at narod.ru
Fri May 17 13:17:15 PDT 2013


Hi Manman,
I've attached new patch:
  -- Pasted your implementation in "if" statement.
  -- Kept Align parameter in getArgRegsSaveSize method.
  -- New test-case ("see 2013-05-13-AAPCS-byval-padding2.ll") that checks
case, you mentioned in previous post.

-Stepan.

Stepan Dyatkovskiy wrote:
> Hi Manman,
>
>  >> Then in emitPrologue/Epilogue we just replace current
> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>  > This may not work, for our example of a 4-byte byval followed by a
> 20-byte byval, computeRegArea for the second byval will return 16 as
> ArgRegsSaveSize, and the padding will be 4.
>
> Ah, yes, exactly. Need one more test-case then :-)
>
>  > So in if statement, instead of aligning ArgRegsSize of the split
> byval parameter, we align the total ArgRegsSize,
>  > Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>  > ArgRegsSaveSize = ArgRegsSize + Padding.
>  >
>  > For several 4 bytes byval params, since they are not split, the
> padding for each will be zero.
>
> Then we can use this implementation, and add Align parameter to
> getArgRegsSaveSize. It solves problem you've mentioned for odd amount of
> 4 bytes params.
>
> -Stepan
>
> Manman Ren wrote:
>>
>> On May 15, 2013, at 3:07 AM, Stepan Dyatkovskiy wrote:
>>
>>> Hi Manman,
>>>
>>>> For functions called from check227, SP will not be 8-byte aligned.
>>>> Is this considered "public interfaces"?
>>> Oh.. you right. If we have the only 4 bytes byval parameter, then in
>>> my case "sub sp, sp, #4" will be emitted. That's wrong.
>>>
>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>> computeRegArea and add padding if (ArgRegsSize +
>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>
>>> For several 4 bytes byval params we get several 8 bytes increments
>>> then. So for 3 x 4-byte-param we would get 24 bytes stack area.
>> Hi Stepan,
>>
>> I meant to add padding only for the single byval parameter that is
>> split, and the amount of padding depends on the total ArgRegsSize.
>> Given in the patch:
>> -  ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>> +
>> +  // If parameter is split between stack and GPRs...
>> +  if (NumGPRs && Align == 8 &&
>> +      (ArgRegsSize < ArgSize ||
>> +        InRegsParamRecordIdx >= CCInfo.getInRegsParamsCount())) {
>> +    // Add padding for part of param recovered from GPRs, so
>> +    // its last byte must be at address K*8 - 1.
>> +    // We need to do it, since remained (stack) part of parameter has
>> +    // stack alignment, and we need to "attach" "GPRs head" without gaps
>> +    // to it:
>> +    // Stack:
>> +    // |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>> bytes...
>> +    // [ [padding] [GPRs head] ] [        Tail passed via stack
>> ....
>> +    //
>> +    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>> +  } else
>> +    // We don't need to extend regs save size for byval parameters if
>> they
>> +    // are passed via GPRs only.
>> +    ArgRegsSaveSize = ArgRegsSize;
>>   }
>> So in if statement, instead of aligning ArgRegsSize of the split byval
>> parameter, we align the total ArgRegsSize,
>> Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
>> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>> ArgRegsSaveSize = ArgRegsSize + Padding.
>>
>> For several 4 bytes byval params, since they are not split, the
>> padding for each will be zero.
>>
>>> May be correct ArgRegsSaveArea straight where "sub sp, sp, #N" is
>>> emitted? It is ARMFrameLowering::emitPrologue, emitEpilogue, and the
>>> same for Thumb1FrameLowering.
>>>
>>> It may be done simple, by adding Align parameter to
>>> getArgRegsSaveSize, setting it to 0 by default.
>>>
>>> Then in emitPrologue/Epilogue we just replace current
>>> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>> This may not work, for our example of a 4-byte byval followed by a
>> 20-byte byval, computeRegArea for the second byval will return 16 as
>> ArgRegsSaveSize, and the padding will be 4.
>> AFI->getArgRegsSaveSize(8) will return the aligned result of 4+20 -->
>> 24 bytes, which means the actual padding is 8 bytes.
>>
>> With the above proposed implementation, computeRegArea for the second
>> byval will return 12 as ArgRegsSaveSize, and the padding is 0.
>> AFI->getArgRegsSaveSize() will return 4+12 bytes, which means the
>> actual padding is 0 bytes.
>>
>> Thanks,
>> Manman
>>>
>>> New patch is attached.
>>>
>>> -Stepan.
>>>
>>> Manman Ren wrote:
>>>>
>>>> On May 14, 2013, at 7:30 AM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> Evan wrote:
>>>>>> The comment doesn't parse. "in when"?
>>>>> Sorry :-)
>>>>>
>>>>>> But I don't see check for stack alignment anywhere in the code.
>>>>> In reworked patch I added it. Though, I moved main changes from
>>>>> StoreByValRegs to computeRegArea, as Manman proposed.
>>>>>
>>>>> Manman wrote:
>>>>>> This will cause the callee to update SP by 20 bytes
>>>>>> (ArgRegsSaveSize will be 4 + 4 + 4*3), which means SP will not be
>>>>>> 8-byte aligned.
>>>>> Yes, SP will be 4 byte aligned. Though inside the routing we use it as
>>>>> very simplified heap, and..
>>>>> Stack is double-word aligned (SP mod 8 = 0) for *public interfaces*
>>>>> only, while for other cases it is word aligned (SP mod 4 = 0).
>>>>> Then, perhaps, there is nothing criminal to use offsets like #12 or
>>>>> #4 inside the routine? Though, of course, params with types like
>>>>> i64, or "double" will be 8 bytes aligned everywhere.
>>>>>
>>>>>> It is a little weird to have a hard-coded ArgSize 4 here for a
>>>>>> variable argument.
>>>>> Hm.. This parameter is rudimentary for VA functions... So,
>>>>> hard-code here 0 then?
>>>>>
>>>>> Just in case, a little more details...
>>>>> "Byval" parameters are recovered with padding. This trick allows to
>>>>> concatenate GPRs part with stack part. But, even before byval
>>>>> patches (was posted earlier), there was an issue, that has been
>>>>> registered as PR15868.
>>>>>
>>>>> Consider next example:
>>>>> %struct_t = type { i32, i32 }
>>>>>
>>>>> ; Note p3 and p4 are MemLocs.
>>>>> define void @check227(
>>>>>   i32 %p0,             ; R0
>>>>>   i32 %p1,             ; R1
>>>>>   i32 %p2,             ; R2
>>>>>   %struct_t* byval p3, ; R3, [SP+0, SP+4), LocMemOffset = 0
>>>>>   %struct_t* p4        ; SP+4,             LocMemOffset = 8
>>>>> )
>>>>>
>>>>> Currently LowerFormalArguments does the next:
>>>>>
>>>>> p3: ArgRegsSize = 4, ArgRegsSaveSize = 8, parameter is split
>>>>> between GPRs and stack; stack part is associated with FrameIndex
>>>>> with offset = 4.
>>>>> p4: is pointer, its value is stored in stack, FrameIndex's offset =
>>>>> 8 (Wrong, must be 12)
>>>>>
>>>>> For p4 we should take into account: p3 is padded with 4 bytes, so
>>>>> proper offset is 12, not 8.
>>>>>
>>>>> Relative to case, when byval parameter small enough to be in GPRs
>>>>> only.
>>>>> We don't need to concatenate it with stack part. So, we don't need
>>>>> to pad it. Inside the routine we can store it only using constraint
>>>>> "SP mod 4 = 0". I can't see any reason to extend ArgRegsSaveArea in
>>>>> this case.
>>>>> I may be wrong in my last conclusion, so here it would be good to
>>>>> see your confirmation/disagree.
>>>>
>>>> Hi Stepen,
>>>>
>>>> Yes, we do not need padding for byvals that are in GPRs only. I
>>>> think the purpose of performing the alignment
>>>> ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>> is to make sure the updated SP is 8-byte aligned. Correct me if I am
>>>> wrong :]
>>>>
>>>> In the prologue of check227, we will update SP by a multiple of 8
>>>> bytes, if we call any function from check227, the SP will be 8-byte
>>>> aligned for the callee.
>>>> When we have a small byval of 4-byte followed by a large byval,
>>>> since the patch will not use padding area for the first small byval,
>>>> and will use 4-byte padding for the second byval,
>>>> the ArgRegsSaveSize will be 4 (for first byval) + 4*4 (for second
>>>> byval). For functions called from check227, SP will not be 8-byte
>>>> aligned. Is this considered "public interfaces"?
>>>>
>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>> computeRegArea and add padding if (ArgRegsSize +
>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>>
>>>> Otherwise LGTM,
>>>>
>>>> Thanks,
>>>> Manman
>>>>>
>>>>>
>>>>> I have attached reworked patch + two examples (.c and .ll). Just
>>>>> try run .c file with -O2 -S, with clang and gcc, and compare
>>>>> outputs (O2 allows to strip out some unnecessary instructions here).
>>>>>
>>>>> Thanks for reviews!
>>>>>
>>>>> -Stepan.
>>>>>
>>>>> Manman Ren wrote:
>>>>>>
>>>>>> On May 13, 2013, at 1:28 PM, Evan Cheng wrote:
>>>>>>
>>>>>>> Hi Stepan,
>>>>>>>
>>>>>>> Your explanation below seems to indicate this is only an issue
>>>>>>> when stack alignment is 8 (or smaller?). But I don't see check
>>>>>>> for stack alignment anywhere in the code.
>>>>>>    unsigned Align =
>>>>>> MF.getTarget().getFrameLowering()->getStackAlignment();
>>>>>>    ArgRegsSize = NumGPRs * 4;
>>>>>>    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>>> The ArgRegsSaveSize is different from ArgRegsSize when stack
>>>>>> alignment is 8 or larger.
>>>>>> And padding is added when they differ.
>>>>>>
>>>>>> Thanks,
>>>>>> Manman
>>>>>>>
>>>>>>> Other nitpicks:
>>>>>>> +    // If parameter was splitted between stack and GPRs...
>>>>>>> Typo: s/splitted/split
>>>>>>>
>>>>>>> -    return MFI->CreateFixedObject(4, ArgOffset, !ForceMutable);
>>>>>>> +    return MFI->CreateFixedObject(
>>>>>>> +        4, AFI->getStoredByValParamsPadding() + ArgOffset,
>>>>>>> !ForceMutable);
>>>>>>>
>>>>>>> You should just use Padding in place of
>>>>>>> AFI->getStoredByValParamsPadding() here.
>>>>>>>
>>>>>>> +  /// StByValParamsPadding - in when we store params in memory
>>>>>>> +  /// sometimes we need to insert gap before parameter start
>>>>>>> address.
>>>>>>> +  ///
>>>>>>>
>>>>>>> The comment doesn't parse. "in when"?
>>>>>>>
>>>>>>> Evan
>>>>>>>
>>>>>>> On May 13, 2013, at 6:26 AM, Stepan Dyatkovskiy
>>>>>>> <STPWORLD at narod.ru> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> In case when stack alignment is 8 and GPRs parameter part size
>>>>>>>> is not N*8: we add padding to GPRs part, so part's last byte
>>>>>>>> must be recovered at address K*8-1.
>>>>>>>> We need to do it, since remained (stack) part of parameter
>>>>>>>> starts from address K*8, and we need to "attach" "GPRs head"
>>>>>>>> without gaps to it:
>>>>>>>> Stack:
>>>>>>>> |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>>>>>>>> bytes...
>>>>>>>> [ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Note, once we added padding we need to correct *all* Arg offsets
>>>>>>>> that are going after padded one.
>>>>>>>> That's why we need this fix: Arg offsets were never corrected
>>>>>>>> before this patch. See new test-case included in patch.
>>>>>>>>
>>>>>>>> We also don't need to insert padding for byval parameters that
>>>>>>>> are stored in GPRs only. We need pad only last byval parameter
>>>>>>>> and only in case it outsides GPRs and stack alignment = 8.
>>>>>>>>
>>>>>>>> Please find the patch in attachment.
>>>>>>>>
>>>>>>>> This patch reduces stack usage for some cases:
>>>>>>>> We can reduce ArgRegsSaveArea since inner 4 bytes sized byval
>>>>>>>> params my be "packed" with alignment 4.
>>>>>>>>
>>>>>>>> This patch also fixes PR15868 release-blocking issue.
>>>>>>>>
>>>>>>>> -Stepan.
>>>>>>>>
>>>>>>>> <pr15868-2013-05-13-2.patch>_______________________________________________
>>>>>>>>
>>>>>>>> llvm-commits mailing list
>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>
>>>>> <attri_16_simple3.2.c><attri_16_simple3.2.ll><pr15868-2013-05-14.patch>
>>>>>
>>>>
>>>
>>> <pr15868-2013-05-15.patch>
>>
>

-------------- next part --------------
A non-text attachment was scrubbed...
Name: pr15868-2013-05-17.patch
Type: text/x-diff
Size: 13051 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130518/29e1143d/attachment.patch>


More information about the llvm-commits mailing list