ARM, PR15868 Release-blocker fix. ByVal parameters padding fix.

Thu May 16 12:36:18 PDT 2013

Hi Manman,

 >> Then in emitPrologue/Epilogue we just replace current 
getArgRegsSaveSize with getArgRegsSaveSize(Align).
 > This may not work, for our example of a 4-byte byval followed by a 
20-byte byval, computeRegArea for the second byval will return 16 as 
ArgRegsSaveSize, and the padding will be 4.

Ah, yes, exactly. Need one more test-case then :-)

 > So in if statement, instead of aligning ArgRegsSize of the split 
byval parameter, we align the total ArgRegsSize,
 > Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) & 
~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
 > ArgRegsSaveSize = ArgRegsSize + Padding.
 >
 > For several 4 bytes byval params, since they are not split, the 
padding for each will be zero.

Then we can use this implementation, and add Align parameter to 
getArgRegsSaveSize. It solves problem you've mentioned for odd amount of 
4 bytes params.

-Stepan

Manman Ren wrote:
>
> On May 15, 2013, at 3:07 AM, Stepan Dyatkovskiy wrote:
>
>> Hi Manman,
>>
>>> For functions called from check227, SP will not be 8-byte aligned. Is this considered "public interfaces"?
>> Oh.. you right. If we have the only 4 bytes byval parameter, then in my case "sub sp, sp, #4" will be emitted. That's wrong.
>>
>>> I think the fix is to check AFI->getArgRegsSaveSize inside computeRegArea and add padding if (ArgRegsSize + AFI->getArgRegsSaveSize) is not correctly aligned.
>>
>> For several 4 bytes byval params we get several 8 bytes increments then. So for 3 x 4-byte-param we would get 24 bytes stack area.
> Hi Stepan,
>
> I meant to add padding only for the single byval parameter that is split, and the amount of padding depends on the total ArgRegsSize.
> Given in the patch:
> -  ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
> +
> +  // If parameter is split between stack and GPRs...
> +  if (NumGPRs && Align == 8 &&
> +      (ArgRegsSize < ArgSize ||
> +        InRegsParamRecordIdx >= CCInfo.getInRegsParamsCount())) {
> +    // Add padding for part of param recovered from GPRs, so
> +    // its last byte must be at address K*8 - 1.
> +    // We need to do it, since remained (stack) part of parameter has
> +    // stack alignment, and we need to "attach" "GPRs head" without gaps
> +    // to it:
> +    // Stack:
> +    // |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
> +    // [ [padding] [GPRs head] ] [        Tail passed via stack       ....
> +    //
> +    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
> +  } else
> +    // We don't need to extend regs save size for byval parameters if they
> +    // are passed via GPRs only.
> +    ArgRegsSaveSize = ArgRegsSize;
>   }
> So in if statement, instead of aligning ArgRegsSize of the split byval parameter, we align the total ArgRegsSize,
> Padding = (ArgRegsSize + AFI->getArgRegsSaveSize + Align - 1) & ~(Align-1) - (ArgRegsSize + AFI->getArgRegsSaveSize)
> ArgRegsSaveSize = ArgRegsSize + Padding.
>
> For several 4 bytes byval params, since they are not split, the padding for each will be zero.
>
>> May be correct ArgRegsSaveArea straight where "sub sp, sp, #N" is emitted? It is ARMFrameLowering::emitPrologue, emitEpilogue, and the same for Thumb1FrameLowering.
>>
>> It may be done simple, by adding Align parameter to getArgRegsSaveSize, setting it to 0 by default.
>>
>> Then in emitPrologue/Epilogue we just replace current getArgRegsSaveSize with getArgRegsSaveSize(Align).
> This may not work, for our example of a 4-byte byval followed by a 20-byte byval, computeRegArea for the second byval will return 16 as ArgRegsSaveSize, and the padding will be 4.
> AFI->getArgRegsSaveSize(8) will return the aligned result of 4+20 --> 24 bytes, which means the actual padding is 8 bytes.
>
> With the above proposed implementation, computeRegArea for the second byval will return 12 as ArgRegsSaveSize, and the padding is 0.
> AFI->getArgRegsSaveSize() will return 4+12 bytes, which means the actual padding is 0 bytes.
>
> Thanks,
> Manman
>>
>> New patch is attached.
>>
>> -Stepan.
>>
>> Manman Ren wrote:
>>>
>>> On May 14, 2013, at 7:30 AM, Stepan Dyatkovskiy wrote:
>>>
>>>> Hello,
>>>>
>>>> Evan wrote:
>>>>> The comment doesn't parse. "in when"?
>>>> Sorry :-)
>>>>
>>>>> But I don't see check for stack alignment anywhere in the code.
>>>> In reworked patch I added it. Though, I moved main changes from StoreByValRegs to computeRegArea, as Manman proposed.
>>>>
>>>> Manman wrote:
>>>>> This will cause the callee to update SP by 20 bytes (ArgRegsSaveSize will be 4 + 4 + 4*3), which means SP will not be 8-byte aligned.
>>>> Yes, SP will be 4 byte aligned. Though inside the routing we use it as
>>>> very simplified heap, and..
>>>> Stack is double-word aligned (SP mod 8 = 0) for *public interfaces* only, while for other cases it is word aligned (SP mod 4 = 0). Then, perhaps, there is nothing criminal to use offsets like #12 or #4 inside the routine? Though, of course, params with types like i64, or "double" will be 8 bytes aligned everywhere.
>>>>
>>>>> It is a little weird to have a hard-coded ArgSize 4 here for a variable argument.
>>>> Hm.. This parameter is rudimentary for VA functions... So, hard-code here 0 then?
>>>>
>>>> Just in case, a little more details...
>>>> "Byval" parameters are recovered with padding. This trick allows to concatenate GPRs part with stack part. But, even before byval patches (was posted earlier), there was an issue, that has been registered as PR15868.
>>>>
>>>> Consider next example:
>>>> %struct_t = type { i32, i32 }
>>>>
>>>> ; Note p3 and p4 are MemLocs.
>>>> define void @check227(
>>>>   i32 %p0,             ; R0
>>>>   i32 %p1,             ; R1
>>>>   i32 %p2,             ; R2
>>>>   %struct_t* byval p3, ; R3, [SP+0, SP+4), LocMemOffset = 0
>>>>   %struct_t* p4        ; SP+4,             LocMemOffset = 8
>>>> )
>>>>
>>>> Currently LowerFormalArguments does the next:
>>>>
>>>> p3: ArgRegsSize = 4, ArgRegsSaveSize = 8, parameter is split between GPRs and stack; stack part is associated with FrameIndex with offset = 4.
>>>> p4: is pointer, its value is stored in stack, FrameIndex's offset = 8 (Wrong, must be 12)
>>>>
>>>> For p4 we should take into account: p3 is padded with 4 bytes, so proper offset is 12, not 8.
>>>>
>>>> Relative to case, when byval parameter small enough to be in GPRs only.
>>>> We don't need to concatenate it with stack part. So, we don't need to pad it. Inside the routine we can store it only using constraint "SP mod 4 = 0". I can't see any reason to extend ArgRegsSaveArea in this case.
>>>> I may be wrong in my last conclusion, so here it would be good to see your confirmation/disagree.
>>>
>>> Hi Stepen,
>>>
>>> Yes, we do not need padding for byvals that are in GPRs only. I think the purpose of performing the alignment
>>> ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>> is to make sure the updated SP is 8-byte aligned. Correct me if I am wrong :]
>>>
>>> In the prologue of check227, we will update SP by a multiple of 8 bytes, if we call any function from check227, the SP will be 8-byte aligned for the callee.
>>> When we have a small byval of 4-byte followed by a large byval, since the patch will not use padding area for the first small byval, and will use 4-byte padding for the second byval,
>>> the ArgRegsSaveSize will be 4 (for first byval) + 4*4 (for second byval). For functions called from check227, SP will not be 8-byte aligned. Is this considered "public interfaces"?
>>>
>>> I think the fix is to check AFI->getArgRegsSaveSize inside computeRegArea and add padding if (ArgRegsSize + AFI->getArgRegsSaveSize) is not correctly aligned.
>>>
>>> Otherwise LGTM,
>>>
>>> Thanks,
>>> Manman
>>>>
>>>>
>>>> I have attached reworked patch + two examples (.c and .ll). Just try run .c file with -O2 -S, with clang and gcc, and compare outputs (O2 allows to strip out some unnecessary instructions here).
>>>>
>>>> Thanks for reviews!
>>>>
>>>> -Stepan.
>>>>
>>>> Manman Ren wrote:
>>>>>
>>>>> On May 13, 2013, at 1:28 PM, Evan Cheng wrote:
>>>>>
>>>>>> Hi Stepan,
>>>>>>
>>>>>> Your explanation below seems to indicate this is only an issue when stack alignment is 8 (or smaller?). But I don't see check for stack alignment anywhere in the code.
>>>>>    unsigned Align = MF.getTarget().getFrameLowering()->getStackAlignment();
>>>>>    ArgRegsSize = NumGPRs * 4;
>>>>>    ArgRegsSaveSize = (ArgRegsSize + Align - 1) & ~(Align - 1);
>>>>> The ArgRegsSaveSize is different from ArgRegsSize when stack alignment is 8 or larger.
>>>>> And padding is added when they differ.
>>>>>
>>>>> Thanks,
>>>>> Manman
>>>>>>
>>>>>> Other nitpicks:
>>>>>> +    // If parameter was splitted between stack and GPRs...
>>>>>> Typo: s/splitted/split
>>>>>>
>>>>>> -    return MFI->CreateFixedObject(4, ArgOffset, !ForceMutable);
>>>>>> +    return MFI->CreateFixedObject(
>>>>>> +        4, AFI->getStoredByValParamsPadding() + ArgOffset, !ForceMutable);
>>>>>>
>>>>>> You should just use Padding in place of AFI->getStoredByValParamsPadding() here.
>>>>>>
>>>>>> +  /// StByValParamsPadding - in when we store params in memory
>>>>>> +  /// sometimes we need to insert gap before parameter start address.
>>>>>> +  ///
>>>>>>
>>>>>> The comment doesn't parse. "in when"?
>>>>>>
>>>>>> Evan
>>>>>>
>>>>>> On May 13, 2013, at 6:26 AM, Stepan Dyatkovskiy <STPWORLD at narod.ru> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> In case when stack alignment is 8 and GPRs parameter part size is not N*8: we add padding to GPRs part, so part's last byte must be recovered at address K*8-1.
>>>>>>> We need to do it, since remained (stack) part of parameter starts from address K*8, and we need to "attach" "GPRs head" without gaps to it:
>>>>>>> Stack:
>>>>>>> |---- 8 bytes block ----| |---- 8 bytes block ----| |---- 8 bytes...
>>>>>>> [ [padding] [GPRs head] ] [ ------ Tail passed via stack  ------ ...
>>>>>>>
>>>>>>> Note, once we added padding we need to correct *all* Arg offsets that are going after padded one.
>>>>>>> That's why we need this fix: Arg offsets were never corrected before this patch. See new test-case included in patch.
>>>>>>>
>>>>>>> We also don't need to insert padding for byval parameters that are stored in GPRs only. We need pad only last byval parameter and only in case it outsides GPRs and stack alignment = 8.
>>>>>>>
>>>>>>> Please find the patch in attachment.
>>>>>>>
>>>>>>> This patch reduces stack usage for some cases:
>>>>>>> We can reduce ArgRegsSaveArea since inner 4 bytes sized byval params my be "packed" with alignment 4.
>>>>>>>
>>>>>>> This patch also fixes PR15868 release-blocking issue.
>>>>>>>
>>>>>>> -Stepan.
>>>>>>>
>>>>>>> <pr15868-2013-05-13-2.patch>_______________________________________________
>>>>>>> llvm-commits mailing list
>>>>>>> llvm-commits at cs.uiuc.edu
>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>>
>>>>>> _______________________________________________
>>>>>> llvm-commits mailing list
>>>>>> llvm-commits at cs.uiuc.edu
>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>>>>>
>>>>
>>>> <attri_16_simple3.2.c><attri_16_simple3.2.ll><pr15868-2013-05-14.patch>
>>>
>>
>> <pr15868-2013-05-15.patch>
>