ARM, PR15868 Release-blocker fix. ByVal parameters padding fix.

Stepan Dyatkovskiy stpworld at narod.ru
Mon May 20 11:03:30 PDT 2013


Committed as r182237.
-Stepan
Manman Ren wrote:
> LGTM,
>
> Thanks,
> Manman
>
> On May 17, 2013, at 1:17 PM, Stepan Dyatkovskiy <stpworld at narod.ru> wrote:
>
>> Hi Manman,
>> I've attached new patch:
>> -- Pasted your implementation in "if" statement.
>> -- Kept Align parameter in getArgRegsSaveSize method.
>> -- New test-case ("see 2013-05-13-AAPCS-byval-padding2.ll") that checks
>> case, you mentioned in previous post.
>>
>> -Stepan.
>>
>> Stepan Dyatkovskiy wrote:
>>> Hi Manman,
>>>
>>>>> Then in emitPrologue/Epilogue we just replace current
>>> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>>>> This may not work, for our example of a 4-byte byval followed by a
>>> 20-byte byval, computeRegArea for the second byval will return 16 as
>>> ArgRegsSaveSize, and the padding will be 4.
>>>
>>> Ah, yes, exactly. Need one more test-case then :-)
>>>
>>>> So in if statement, instead of aligning ArgRegsSize of the split
>>> byval parameter, we align the total ArgRegsSize,
>>>> Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
>>> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>>>> ArgRegsSaveSize = ArgRegsSize + Padding.
>>>>
>>>> For several 4 bytes byval params, since they are not split, the
>>> padding for each will be zero.
>>>
>>> Then we can use this implementation, and add Align parameter to
>>> getArgRegsSaveSize. It solves problem you've mentioned for odd amount of
>>> 4 bytes params.
>>>
>>> -Stepan
>>>
>>> Manman Ren wrote:
>>>>
>>>> On May 15, 2013, at 3:07 AM, Stepan Dyatkovskiy wrote:
>>>>
>>>>> Hi Manman,
>>>>>
>>>>>> For functions called from check227, SP will not be 8-byte aligned.
>>>>>> Is this considered "public interfaces"?
>>>>> Oh.. you right. If we have the only 4 bytes byval parameter, then in
>>>>> my case "sub sp, sp, #4" will be emitted. That's wrong.
>>>>>
>>>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>>>> computeRegArea and add padding if (ArgRegsSize +
>>>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>>>
>>>>> For several 4 bytes byval params we get several 8 bytes increments
>>>>> then. So for 3 x 4-byte-param we would get 24 bytes stack area.
>>>> Hi Stepan,
>>>>
>>>> I meant to add padding only for the single byval parameter that is
>>>> split, and the amount of padding depends on the total ArgRegsSize.
>>>> Given in the patch:
>>>> -  ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>> +
>>>> +  // If parameter is split between stack and GPRs...
>>>> +  if (NumGPRs && Align == 8 &&
>>>> +      (ArgRegsSize < ArgSize ||
>>>> +        InRegsParamRecordIdx >= CCInfo.getInRegsParamsCount())) {
>>>> +    // Add padding for part of param recovered from GPRs, so
>>>> +    // its last byte must be at address K*8 - 1.
>>>> +    // We need to do it, since remained (stack) part of parameter has
>>>> +    // stack alignment, and we need to "attach" "GPRs head" without gaps
>>>> +    // to it:
>>>> +    // Stack:
>>>> +    // |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>>>> bytes...
>>>> +    // [ [padding] [GPRs head] ] [        Tail passed via stack
>>>> ....
>>>> +    //
>>>> +    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>> +  } else
>>>> +    // We don't need to extend regs save size for byval parameters if
>>>> they
>>>> +    // are passed via GPRs only.
>>>> +    ArgRegsSaveSize = ArgRegsSize;
>>>>   }
>>>> So in if statement, instead of aligning ArgRegsSize of the split byval
>>>> parameter, we align the total ArgRegsSize,
>>>> Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) &
>>>> ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
>>>> ArgRegsSaveSize = ArgRegsSize + Padding.
>>>>
>>>> For several 4 bytes byval params, since they are not split, the
>>>> padding for each will be zero.
>>>>
>>>>> May be correct ArgRegsSaveArea straight where "sub sp, sp, #N" is
>>>>> emitted? It is ARMFrameLowering::emitPrologue, emitEpilogue, and the
>>>>> same for Thumb1FrameLowering.
>>>>>
>>>>> It may be done simple, by adding Align parameter to
>>>>> getArgRegsSaveSize, setting it to 0 by default.
>>>>>
>>>>> Then in emitPrologue/Epilogue we just replace current
>>>>> getArgRegsSaveSize with getArgRegsSaveSize(Align).
>>>> This may not work, for our example of a 4-byte byval followed by a
>>>> 20-byte byval, computeRegArea for the second byval will return 16 as
>>>> ArgRegsSaveSize, and the padding will be 4.
>>>> AFI->getArgRegsSaveSize(8) will return the aligned result of 4+20 -->
>>>> 24 bytes, which means the actual padding is 8 bytes.
>>>>
>>>> With the above proposed implementation, computeRegArea for the second
>>>> byval will return 12 as ArgRegsSaveSize, and the padding is 0.
>>>> AFI->getArgRegsSaveSize() will return 4+12 bytes, which means the
>>>> actual padding is 0 bytes.
>>>>
>>>> Thanks,
>>>> Manman
>>>>>
>>>>> New patch is attached.
>>>>>
>>>>> -Stepan.
>>>>>
>>>>> Manman Ren wrote:
>>>>>>
>>>>>> On May 14, 2013, at 7:30 AM, Stepan Dyatkovskiy wrote:
>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> Evan wrote:
>>>>>>>> The comment doesn't parse. "in when"?
>>>>>>> Sorry :-)
>>>>>>>
>>>>>>>> But I don't see check for stack alignment anywhere in the code.
>>>>>>> In reworked patch I added it. Though, I moved main changes from
>>>>>>> StoreByValRegs to computeRegArea, as Manman proposed.
>>>>>>>
>>>>>>> Manman wrote:
>>>>>>>> This will cause the callee to update SP by 20 bytes
>>>>>>>> (ArgRegsSaveSize will be 4 + 4 + 4*3), which means SP will not be
>>>>>>>> 8-byte aligned.
>>>>>>> Yes, SP will be 4 byte aligned. Though inside the routing we use it as
>>>>>>> very simplified heap, and..
>>>>>>> Stack is double-word aligned (SP mod 8 = 0) for *public interfaces*
>>>>>>> only, while for other cases it is word aligned (SP mod 4 = 0).
>>>>>>> Then, perhaps, there is nothing criminal to use offsets like #12 or
>>>>>>> #4 inside the routine? Though, of course, params with types like
>>>>>>> i64, or "double" will be 8 bytes aligned everywhere.
>>>>>>>
>>>>>>>> It is a little weird to have a hard-coded ArgSize 4 here for a
>>>>>>>> variable argument.
>>>>>>> Hm.. This parameter is rudimentary for VA functions... So,
>>>>>>> hard-code here 0 then?
>>>>>>>
>>>>>>> Just in case, a little more details...
>>>>>>> "Byval" parameters are recovered with padding. This trick allows to
>>>>>>> concatenate GPRs part with stack part. But, even before byval
>>>>>>> patches (was posted earlier), there was an issue, that has been
>>>>>>> registered as PR15868.
>>>>>>>
>>>>>>> Consider next example:
>>>>>>> %struct_t = type { i32, i32 }
>>>>>>>
>>>>>>> ; Note p3 and p4 are MemLocs.
>>>>>>> define void @check227(
>>>>>>>   i32 %p0,             ; R0
>>>>>>>   i32 %p1,             ; R1
>>>>>>>   i32 %p2,             ; R2
>>>>>>>   %struct_t* byval p3, ; R3, [SP+0, SP+4), LocMemOffset = 0
>>>>>>>   %struct_t* p4        ; SP+4,             LocMemOffset = 8
>>>>>>> )
>>>>>>>
>>>>>>> Currently LowerFormalArguments does the next:
>>>>>>>
>>>>>>> p3: ArgRegsSize = 4, ArgRegsSaveSize = 8, parameter is split
>>>>>>> between GPRs and stack; stack part is associated with FrameIndex
>>>>>>> with offset = 4.
>>>>>>> p4: is pointer, its value is stored in stack, FrameIndex's offset =
>>>>>>> 8 (Wrong, must be 12)
>>>>>>>
>>>>>>> For p4 we should take into account: p3 is padded with 4 bytes, so
>>>>>>> proper offset is 12, not 8.
>>>>>>>
>>>>>>> Relative to case, when byval parameter small enough to be in GPRs
>>>>>>> only.
>>>>>>> We don't need to concatenate it with stack part. So, we don't need
>>>>>>> to pad it. Inside the routine we can store it only using constraint
>>>>>>> "SP mod 4 = 0". I can't see any reason to extend ArgRegsSaveArea in
>>>>>>> this case.
>>>>>>> I may be wrong in my last conclusion, so here it would be good to
>>>>>>> see your confirmation/disagree.
>>>>>>
>>>>>> Hi Stepen,
>>>>>>
>>>>>> Yes, we do not need padding for byvals that are in GPRs only. I
>>>>>> think the purpose of performing the alignment
>>>>>> ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>>> is to make sure the updated SP is 8-byte aligned. Correct me if I am
>>>>>> wrong :]
>>>>>>
>>>>>> In the prologue of check227, we will update SP by a multiple of 8
>>>>>> bytes, if we call any function from check227, the SP will be 8-byte
>>>>>> aligned for the callee.
>>>>>> When we have a small byval of 4-byte followed by a large byval,
>>>>>> since the patch will not use padding area for the first small byval,
>>>>>> and will use 4-byte padding for the second byval,
>>>>>> the ArgRegsSaveSize will be 4 (for first byval) + 4*4 (for second
>>>>>> byval). For functions called from check227, SP will not be 8-byte
>>>>>> aligned. Is this considered "public interfaces"?
>>>>>>
>>>>>> I think the fix is to check AFI->getArgRegsSaveSize inside
>>>>>> computeRegArea and add padding if (ArgRegsSize +
>>>>>> AFI->getArgRegsSaveSize) is not correctly aligned.
>>>>>>
>>>>>> Otherwise LGTM,
>>>>>>
>>>>>> Thanks,
>>>>>> Manman
>>>>>>>
>>>>>>>
>>>>>>> I have attached reworked patch + two examples (.c and .ll). Just
>>>>>>> try run .c file with -O2 -S, with clang and gcc, and compare
>>>>>>> outputs (O2 allows to strip out some unnecessary instructions here).
>>>>>>>
>>>>>>> Thanks for reviews!
>>>>>>>
>>>>>>> -Stepan.
>>>>>>>
>>>>>>> Manman Ren wrote:
>>>>>>>>
>>>>>>>> On May 13, 2013, at 1:28 PM, Evan Cheng wrote:
>>>>>>>>
>>>>>>>>> Hi Stepan,
>>>>>>>>>
>>>>>>>>> Your explanation below seems to indicate this is only an issue
>>>>>>>>> when stack alignment is 8 (or smaller?). But I don't see check
>>>>>>>>> for stack alignment anywhere in the code.
>>>>>>>>    unsigned Align =
>>>>>>>> MF.getTarget().getFrameLowering()->getStackAlignment();
>>>>>>>>    ArgRegsSize = NumGPRs * 4;
>>>>>>>>    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>>>>> The ArgRegsSaveSize is different from ArgRegsSize when stack
>>>>>>>> alignment is 8 or larger.
>>>>>>>> And padding is added when they differ.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Manman
>>>>>>>>>
>>>>>>>>> Other nitpicks:
>>>>>>>>> +    // If parameter was splitted between stack and GPRs...
>>>>>>>>> Typo: s/splitted/split
>>>>>>>>>
>>>>>>>>> -    return MFI->CreateFixedObject(4, ArgOffset, !ForceMutable);
>>>>>>>>> +    return MFI->CreateFixedObject(
>>>>>>>>> +        4, AFI->getStoredByValParamsPadding() + ArgOffset,
>>>>>>>>> !ForceMutable);
>>>>>>>>>
>>>>>>>>> You should just use Padding in place of
>>>>>>>>> AFI->getStoredByValParamsPadding() here.
>>>>>>>>>
>>>>>>>>> +  /// StByValParamsPadding - in when we store params in memory
>>>>>>>>> +  /// sometimes we need to insert gap before parameter start
>>>>>>>>> address.
>>>>>>>>> +  ///
>>>>>>>>>
>>>>>>>>> The comment doesn't parse. "in when"?
>>>>>>>>>
>>>>>>>>> Evan
>>>>>>>>>
>>>>>>>>> On May 13, 2013, at 6:26 AM, Stepan Dyatkovskiy
>>>>>>>>> <STPWORLD at narod.ru> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> In case when stack alignment is 8 and GPRs parameter part size
>>>>>>>>>> is not N*8: we add padding to GPRs part, so part's last byte
>>>>>>>>>> must be recovered at address K*8-1.
>>>>>>>>>> We need to do it, since remained (stack) part of parameter
>>>>>>>>>> starts from address K*8, and we need to "attach" "GPRs head"
>>>>>>>>>> without gaps to it:
>>>>>>>>>> Stack:
>>>>>>>>>> |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8
>>>>>>>>>> bytes...
>>>>>>>>>> [ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> Note, once we added padding we need to correct *all* Arg offsets
>>>>>>>>>> that are going after padded one.
>>>>>>>>>> That's why we need this fix: Arg offsets were never corrected
>>>>>>>>>> before this patch. See new test-case included in patch.
>>>>>>>>>>
>>>>>>>>>> We also don't need to insert padding for byval parameters that
>>>>>>>>>> are stored in GPRs only. We need pad only last byval parameter
>>>>>>>>>> and only in case it outsides GPRs and stack alignment = 8.
>>>>>>>>>>
>>>>>>>>>> Please find the patch in attachment.
>>>>>>>>>>
>>>>>>>>>> This patch reduces stack usage for some cases:
>>>>>>>>>> We can reduce ArgRegsSaveArea since inner 4 bytes sized byval
>>>>>>>>>> params my be "packed" with alignment 4.
>>>>>>>>>>
>>>>>>>>>> This patch also fixes PR15868 release-blocking issue.
>>>>>>>>>>
>>>>>>>>>> -Stepan.
>>>>>>>>>>
>>>>>>>>>> <pr15868-2013-05-13-2.patch>_______________________________________________
>>>>>>>>>>
>>>>>>>>>> llvm-commits mailing list
>>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> llvm-commits mailing list
>>>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>>
>>>>>>> <attri_16_simple3.2.c><attri_16_simple3.2.ll><pr15868-2013-05-14.patch>
>>>>>
>>>>> <pr15868-2013-05-15.patch>
>>
>> <pr15868-2013-05-17.patch>




More information about the llvm-commits mailing list