[LLVMdev] ARM struct byval size > 64 triggers failure

Wed Jun 19 10:37:54 PDT 2013

I missed that the test case returns a struct.
You are right about VARegSaveSize.
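
For reference, the rounding done by the computeRegArea lines quoted further down can be reproduced on its own. This is only a sketch: va_reg_save_size is a made-up helper name, not an LLVM function, and the alignment of 8 is the value discussed in this thread.

```c
#include <assert.h>

/* Hypothetical helper, not an LLVM API. It mirrors the two lines quoted
 * from computeRegArea in this thread:
 *   VARegSize     = NumGPRs * 4
 *   VARegSaveSize = (VARegSize + Align - 1) & ~(Align - 1)
 */
static unsigned va_reg_save_size(unsigned num_gprs, unsigned align)
{
    /* The mask trick is only valid for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);
    unsigned va_reg_size = num_gprs * 4;
    return (va_reg_size + align - 1) & ~(align - 1);
}
```

With three GPRs (r1-r3) and Align = 8 this gives 16, i.e. 4 bytes of padding on top of the 12 bytes actually stored. With two GPRs it gives 8 and no padding.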

For callee:
	sub	sp, sp, #16
	push	{r11, lr}
	mov	r11, sp
	sub	sp, sp, #8
	str	r3, [r11, #20]
	str	r2, [r11, #16]
	str	r1, [r11, #12]
	ldr	r1, [r11, #76]

The input struct begins at sp_at_entry - 16 - 8 + 12 = sp_at_entry - 12.
Number of leftover bytes: 67 - 12 = 55.
r11 + 76 is at sp_at_entry - 24 + 76 = sp_at_entry + 52. This is incorrect; it should be at sp_at_entry + align(55, 4) = sp_at_entry + 56.
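
The callee-side arithmetic can be written out as a small sketch. The helper names are invented for illustration, and the 8 pushed bytes are assumed from the "push {r11, lr}" in the listing. All values are byte offsets relative to sp at function entry.

```c
#include <assert.h>

/* r11 after "sub sp, sp, #save; push {r11, lr}; mov r11, sp",
 * expressed relative to sp at function entry.
 * pushed_bytes is 8 for the two registers pushed in the listing. */
static int fp_offset_from_entry(int va_reg_save_size, int pushed_bytes)
{
    return -(va_reg_save_size + pushed_bytes);
}

/* Location addressed by [r11, #imm], again relative to sp at entry. */
static int fp_slot_from_entry(int fp_offset, int imm)
{
    return fp_offset + imm;
}

/* Round value up to a multiple of align (align must be a power of two). */
static int align_up(int value, int align)
{
    assert(align > 0 && (align & (align - 1)) == 0);
    return (value + align - 1) & ~(align - 1);
}
```

With va_reg_save_size = 16 and pushed_bytes = 8, r11 sits at -24, [r11, #12] is at -12 and [r11, #76] is at +52, while align_up(67 - 12, 4) is 56. Reaching +56 from r11 needs an immediate of 56 + 24 = 80.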

For caller:
	mov	r0, sp
	ldr	r1, .LCPI1_0
	str	r1, [r0, #56]

The 2nd argument is stored at sp_at_entry + 56, which is correct.
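
A sketch of the caller-side placement, assuming the part of the struct that does not fit in registers is copied to the stack first and the next argument is then 4-byte aligned. next_arg_offset is a hypothetical name, not something taken from LLVM.

```c
#include <assert.h>

/* Hypothetical helper: offset from sp (at the call) of the first stack
 * argument that follows a byval struct. Assumes the leftover bytes of the
 * struct are placed at sp and the next argument is 4-byte aligned. */
static unsigned next_arg_offset(unsigned struct_size, unsigned bytes_in_regs)
{
    assert(bytes_in_regs <= struct_size);
    unsigned leftover = struct_size - bytes_in_regs;
    return (leftover + 3u) & ~3u;  /* round up to a multiple of 4 */
}
```

next_arg_offset(67, 12) is 56, which matches the caller's "str r1, [r0, #56]". Passing 16 instead of 12 for bytes_in_regs gives 52, which coincides with the slot that "[r11, #76]" addresses in the callee.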

On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of 76.
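
The two offsets can also be checked against the stack dump quoted below. This sketch only copies the dump words into a table and looks them up. dump_word is a made-up helper, and r11 = 0xbefff7f0 is the value from the register dump in the original report.

```c
#include <assert.h>
#include <stdint.h>

/* Stack words copied from the dump in the original report,
 * starting at address 0xbefff7e4. */
static const uint32_t dump_base = 0xbefff7e4u;
static const uint32_t dump[] = {
    0x4001ed08u, 0x40024f90u, 0x4071706fu, 0xbefff8a0u,
    0x0000869cu, 0x00000000u, 0x3231302fu, 0x36353433u,
    0x3a393837u, 0x3e3d3c3bu, 0x4241403fu, 0x46454443u,
    0x4a494847u, 0x4e4d4c4bu, 0x5251504fu, 0x56555453u,
    0x5a595857u, 0x5e5d5c5bu, 0x6261605fu, 0x66656463u,
    0x6a696867u, 0x6e6d6c6bu, 0x4071706fu, 0x00010861u,
};

/* Return the 32-bit word the dump shows at addr.
 * addr must be word-aligned relative to dump_base and inside the dump. */
static uint32_t dump_word(uint32_t addr)
{
    assert(addr >= dump_base && (addr - dump_base) % 4 == 0);
    uint32_t index = (addr - dump_base) / 4;
    assert(index < sizeof dump / sizeof dump[0]);
    return dump[index];
}
```

dump_word(0xbefff7f0 + 76) returns 0x4071706f, the tail of the struct plus padding, while dump_word(0xbefff7f0 + 80) returns 0x00010861, the &a114[0] value printed by the test program.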

Thanks,
Manman

On Jun 18, 2013, at 11:31 PM, Rajesh Viswabramana wrote:

> Hi all,
>  
> Thanks for all comments,
> Filed bug report with details,
> http://llvm.org/bugs/show_bug.cgi?id=16368
>  
> @Manman, please find comments, queries inline.
>  
> Regards,
> Rajesh
>  
> ------- Original Message -------
> Sender : Stepan Dyatkovskiy<stpworld at narod.ru>
> Date : Jun 19, 2013 05:15 (GMT+09:00)
> Title : Re: [LLVMdev] ARM struct byval size > 64 triggers failure
>  
> Hi all,
> One more interesting job :-)
> I'll look at this case too tomorrow. Today my brain is about to explode.
>  
> -Stepan.
>  
> 18.06.2013, 22:56, "Manman Ren" <mren at apple.com>:
>>  
>> Hi Rajesh,
>>  
>> The callee code looks okay to me
>>> Assembly for check114 
>>> ---------------------------------------------------------------
>>>         sub     sp, sp, #16
>>>         push    {r11, lr}
>>>         mov     r11, sp
>>>         sub     sp, sp, #8
>>>         str     r3, [r11, #20]
>>>         str     r2, [r11, #16]
>>>         str     r1, [r11, #12]
>>>         ldr     r1, [r11, #76]
>> VARegSaveSize is 16 because we store the first 16 bytes of struct byval in r0 to r3.
>>  
>> For the test code/assembly I mentioned, only r1-r3 are used for the struct byval; r0 is used for the return value.
>> So NumGPRs = 3, VARegSize = 12, VARegSaveSize = 16 (4 bytes of padding), and the access to arg1 goes wrong.
>> For updated test code:
>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) {
>> .....
>> }
>> Only r1 and r2 are used for the struct byval: NumGPRs = 2, VARegSize = 8, VARegSaveSize = 8. This case works, and arg1 can be accessed.
>>  
>> Please correct me if my understanding of the NumGPRs and VARegSaveSize calculation above is wrong.
>>  
>> Align in computeRegArea is 8 since ABI says the stack pointer needs to be 8 byte aligned at function entry point.
>> But the second argument does not have to be 8 byte aligned, in fact it is 4 byte aligned for i32.
>>  
>> Ok.
>>  
>> r11, #76 is equivalent to sp_at_entry + 52 since r11 = sp_at_entry - 16 - 8, which is 4-byte aligned after
>> storing the leftover (67-16=51) bytes of struct byval.
>>  
>> For the test code, 3 registers are used for the struct byval, so the leftover (67 - 12 = 55 bytes) is copied to the stack by the caller; there are 55 ldrb/strb pairs in the assembly.
>>  
>> Pasting dump again for locating arg1 from sp_at_entry,
>> @entry of check114
>>   sp             0xbefff808 0xbefff808 --> sp_at_entry
>>   At arg1 accessing instruction [0xbefff808 + 52] -> [0xbefff83c] -> 0x4071706f 
>>   0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0
>>   0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433
>>   0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443
>>   0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453
>>   0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463
>>   0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861                 
>>  
>> Can you also paste the assembly for the caller side and check whether the second argument is stored
>> at sp_at_entry+52?
>>  
>> Please find the attached assembly file.
>>  
>> As Renato suggested, please file a bug report.
>>  
>> Filed bug.
>>  
>> Thanks,
>> Manman
>>  
>>  
>> On Jun 18, 2013, at 4:26 AM, Rajesh Viswabramana <rajesh.vis at samsung.com> wrote:
>> 
>>> Hi,
>>>  
>>> Handling of structs larger than 64 bytes passed by value seems to be wrong for ARM targets.
>>>  
>>> Summary:
>>> When a struct larger than 64 bytes is passed by value along with other function parameters, failures are seen in some corner cases: accesses to function parameters read the wrong stack location.
>>> The stack pointer adjustment done by the prologue emitter and the offset used to access function parameters are calculated with different logic.
>>>  
>>> Test code
>>> ---------------------------------------------------------------
>>> #include <stdio.h>
>>> #include <string.h>
>>> struct S114 {
>>>   char a[67];
>>> }a114[5];
>>> struct S114 check114 (struct S114 arg0, struct S114* arg1) { 
>>>   if(&a114[0] != arg1)                         // arg1 value is wrong
>>>     printf( "values %p, %p\n", &a114[0], arg1);
>>> }
>>> int main () {
>>>   int i= 0, j = 0;
>>>   for (;j<2; j++)                                    // just filling a114 with some values for identification
>>>     for(i=0; i<sizeof(struct S114); i++)
>>>       memset(&a114[j].a[i],(0x11+i+j*30), sizeof(int)); 
>>>   check114 (a114[1], &a114[0]);         //=> a114[0]  is accessed from wrong location inside check114 function
>>> }
>>> ---------------------------------------------------------------
>>> clang -v
>>> clang version 3.3 (tags/RELEASE_33/final)
>>> Target: i386-pc-linux-gnu
>>> Thread model: posix
>>>  
>>> Output on arm :
>>> # ./check114.exe 
>>> values 0x10861, 0x4071706f
>>> which is wrong.
>>>  
>>> Assembly for check114 
>>> ---------------------------------------------------------------
>>>         sub     sp, sp, #16
>>>         push    {r11, lr}
>>>         mov     r11, sp
>>>         sub     sp, sp, #8
>>>         str     r3, [r11, #20]
>>>         str     r2, [r11, #16]
>>>         str     r1, [r11, #12]
>>>         ldr     r1, [r11, #76]
>>>         str     r1, [sp, #4]
>>>         .loc    1 7 0 prologue_end
>>>         ldr     r2, .LCPI0_0
>>>         cmp     r2, r1
>>>         beq     .LBB0_2
>>>         b       .LBB0_1
>>> ---------------------------------------------------------------
>>>  
>>> From reg, stack dump:
>>> ------------------------------------------------------------------------------------------------------------------------------
>>> @entry of check114
>>>   => 0x8398 <check114>: sub sp, sp, #16
>>>         0x839c <check114+4>: push {r11, lr}
>>>   sp             0xbefff808 0xbefff808
>>> @if condition
>>>        0x83b4 <check114+28>: ldr r1, [r11, #76] ; 0x4c         <--- wrong value copied to r1, offset #76 should be #80
>>>        0x83b8 <check114+32>: str r1, [sp, #4]
>>>        0x83bc <check114+36>: ldr r2, [pc, #44] ; 0x83f0 <check114+88>
>>>  => 0x83c0 <check114+40>: cmp r2, r1
>>>   
>>>   r11            0xbefff7f0 -1090521104
>>>   sp             0xbefff7e8 0xbefff7e8
>>>   Stack dump:
>>>   0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0
>>>   0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433
>>>   0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443
>>>   0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453
>>>   0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463
>>>   0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861                 //[R11+4c] -> [0xbefff7f0+4c] -> [0xbefff83c] -> 0x4071706f
>>> The correct value, 0x00010861, is at [R11+0x4c+4]; the offset is off by 4 bytes.
>>> ------------------------------------------------------------------------------------------------------------------------------
>>> I looked at the ARM lowering code that generates
>>>   sub sp, sp, #16
>>> It is emitted by:
>>>   if (VARegSaveSize)
>>>     emitSPUpdate(isARM, MBB, MBBI, dl, TII, -VARegSaveSize,        // --> VARegSaveSize is calculated in computeRegArea
>>>                  MachineInstr::FrameSetup)
>>> 
>>> ARMTargetLowering::computeRegArea(..) {
>>>   ...
>>>   VARegSize = NumGPRs * 4;
>>>   VARegSaveSize = (VARegSize + Align - 1) & ~(Align - 1);                 // --> 8 byte alignment done here
>>> }
>>> The stack pointer is decremented by NumGPRs * 4, rounded up to the alignment:
>>> NumGPRs = 3 registers
>>> VARegSaveSize = 16 (after applying the 8-byte alignment)
>>> 
>>> When the offset (#76) for the instruction "ldr r1, [r11, #76] ; 0x4c" is calculated, 4-byte alignment is assumed.
>>> In the prologue's stack pointer calculation, 8-byte alignment is assumed.
>>> Because of this alignment mismatch, accessing any parameter after the byval struct yields the wrong value.
>>>  
>>> The issue (the 4-byte offset) won't occur if an even number of registers is used for byval spilling.
>>> For example:
>>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) { // accessing arg1 is fine in this case
>>> .....
>>> }
>>>  
>>> Could someone comment on the queries below about fixing the problem?
>>> 1) Is this 8-byte alignment mandatory? Is it due to "ARM AAPCS 5.2.1.2 Stack constraints at a public interface"? Can it be removed?
>>> 2) We could leave the alignment as it is and adjust SP once more in the prologue, but this seems a little pointless.
>>> 3) We could take the alignment into account when accessing arg1 and add the extra offset; this looks better.
>>>  The offset used to access arg1 is calculated by the selection DAG, which is target independent, but the alignment adjustment should be done by target lowering. Any suggestions on how to fix this?
>>>  
>>> Regards,
>>> Rajesh
>>>  
>>> <201306181656803_BEI0XT4N.gif>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>  
> <check114.s>
