[LLVMdev] ARM struct byval size > 64 triggers failure
Manman Ren
mren at apple.com
Wed Jun 19 10:37:54 PDT 2013
I missed that the testing case is returning a struct.
You are right in VARegSaveSize.
For callee:
sub sp, sp, #16
push {r11, lr}
mov r11, sp
sub sp, sp, #8
str r3, [r11, #20]
str r2, [r11, #16]
str r1, [r11, #12]
ldr r1, [r11, #76]
The beginning of the input struct @ sp_at_entry - 16 - 8 + 12 = sp_at_entry -12
# of leftover bytes 67-12 = 55
r11+76 is @ sp_at_entry - 24 + 76 = sp_at_entry + 52, this is incorrect, it should be at align(55, 4) = 56.
For caller:
mov r0, sp
ldr r1, .LCPI1_0
str r1, [r0, #56]
the 2nd argument is at sp_at_entry + 56, which is correct.
On my setup (built from TOT), I got "ldr r1, [r11, #80]" instead of 76.
Thanks,
Manman
On Jun 18, 2013, at 11:31 PM, Rajesh Viswabramana wrote:
> Hi all,
>
> Thanks for all comments,
> Filed bug report with details,
> http://llvm.org/bugs/show_bug.cgi?id=16368
>
> @Manman, please find comments, queries inline.
>
> Regards,
> Rajesh
>
> ------- Original Message -------
> Sender : Stepan Dyatkovskiy<stpworld at narod.ru>
> Date : Jun 19, 2013 05:15 (GMT+09:00)
> Title : Re: [LLVMdev] ARM struct byval size > 64 triggers failure
>
> Hi all,
> One more interesting job :-)
> I'll look too at this case tomorrow. Today my brain is about to be exploded..
>
> -Stepan.
>
> 18.06.2013, 22:56, "Manman Ren" <mren at apple.com>:
>>
>> Hi Rajesh,
>>
>> The callee code looks okay to me
>>> Assembly for check114
>>> ---------------------------------------------------------------
>>> sub sp, sp, #16
>>> push {r11, lr}
>>> mov r11, sp
>>> sub sp, sp, #8
>>> str r3, [r11, #20]
>>> str r2, [r11, #16]
>>> str r1, [r11, #12]
>>> ldr r1, [r11, #76]
>> VARegSaveSize is 16 because we store the first 16 bytes of struct byval in r0 to r3.
>>
>> For the test code/assembly I mentioned, only r1-r3 are used for struct byval. r0 used for return val.
>> So, NumGPRs = 3, VARegSize = 12, VARegSaveSize =16(4 byte offset), access of arg1 is going wrong.
>> For updated test code:
>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) {
>> .....
>> }
>> Only r1, r2 used for struct byval, NumGPRs = 2, VARegSize = 8, VARegSaveSize =8, this case works, able to access arg1.
>>
>> Please correct me, if my above understanding is wrong about NumGPRs, VARegSaveSize calculation.
>>
>> Align in computeRegArea is 8 since ABI says the stack pointer needs to be 8 byte aligned at function entry point.
>> But the second argument does not have to be 8 byte aligned, in fact it is 4 byte aligned for i32.
>>
>> Ok.
>>
>> r11, #76 is equivalent to sp_at_entry + 52 since r11 = spat_entry - 16 - 8, which is 4-byte aligned after
>> storing the leftover (67-16=51) bytes of struct byval.
>>
>> For test code, 3 reg used for struct byval, left over will be (67- 12 = 55) copied to stack by caller, 55 sets of ldrb,strb in assembly.
>>
>> Pasting dump again for locating arg1 from sp_at_entry,
>> @entry of check114
>> sp 0xbefff808 0xbefff808 --> sp_at_entry
>> At arg1 accessing instruction [0xbefff808 + 52] -> [0xbefff83c] -> 0x4071706f
>> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0
>> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433
>> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443
>> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453
>> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463
>> 0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861
>>
>> Can you also paste the assembly for the caller side and check whether the second argument is stored
>> at sp_at_entry+52?
>>
>> Please find the attached assembly file.
>>
>> As Renato suggested, please file a bug report.
>>
>> Filed bug.
>>
>> Thanks,
>> Manman
>>
>>
>> On Jun 18, 2013, at 4:26 AM, Rajesh Viswabramana <rajesh.vis at samsung.com> wrote:
>>
>>> Hi,
>>>
>>> Handling of pass by val of struct size >64 bytes case is seems wrong for arm targets.
>>>
>>> Summary:
>>> Incase of struct pass by value for size > 64 along with other function params, failure seen in some corner cases. Access to function params result in wrong stack location access.
>>> Stack pointer adjustment done by prologue emitter and offset used to access function params have different logics for calculaton.
>>>
>>> Test code
>>> ---------------------------------------------------------------
>>> #include <stdio.h>
>>> struct S114 {
>>> char a[67];
>>> }a114[5];
>>> struct S114 check114 (struct S114 arg0, struct S114* arg1) {
>>> if(&a114[0] != arg1) // arg1 value is wrong
>>> printf( "values %p, %p\n", &a114[0], arg1);
>>> }
>>> int main () {
>>> int i= 0, j = 0;
>>> for (;j<2; j++) // just filling a114 with some values for identification
>>> for(i=0; i<sizeof(struct S114); i++)
>>> memset(&a114[j].a[i],(0x11+i+j*30), sizeof(int));
>>> check114 (a114[1], &a114[0]); //=> a114[0] is accessed from wrong location inside check114 function
>>> }
>>> ---------------------------------------------------------------
>>> clang -v
>>> clang version 3.3 (tags/RELEASE_33/final)
>>> Target: i386-pc-linux-gnu
>>> Thread model: posix
>>>
>>> Output on arm :
>>> # ./check114.exe
>>> values 0x10861, 0x4071706f
>>> which is wrong.
>>>
>>> Assembly for check114
>>> ---------------------------------------------------------------
>>> sub sp, sp, #16
>>> push {r11, lr}
>>> mov r11, sp
>>> sub sp, sp, #8
>>> str r3, [r11, #20]
>>> str r2, [r11, #16]
>>> str r1, [r11, #12]
>>> ldr r1, [r11, #76]
>>> str r1, [sp, #4]
>>> .loc 1 7 0 prologue_end
>>> ldr r2, .LCPI0_0
>>> cmp r2, r1
>>> beq .LBB0_2
>>> b .LBB0_1
>>> ---------------------------------------------------------------
>>>
>>> From reg, stack dump:
>>> ------------------------------------------------------------------------------------------------------------------------------
>>> @entry of check114
>>> => 0x8398 <check114>: sub sp, sp, #16
>>> 0x839c <check114+4>: push {r11, lr}
>>> sp 0xbefff808 0xbefff808
>>> @if condition
>>> 0x83b4 <check114+28>: ldr r1, [r11, #76] ; 0x4c <--- wrong value copied to r1, offset #76 should be #80
>>> 0x83b8 <check114+32>: str r1, [sp, #4]
>>> 0x83bc <check114+36>: ldr r2, [pc, #44] ; 0x83f0 <check114+88>
>>> => 0x83c0 <check114+40>: cmp r2, r1
>>>
>>> r11 0xbefff7f0 -1090521104
>>> sp 0xbefff7e8 0xbefff7e8
>>> Stack dump:
>>> 0xbefff7e4: 0x4001ed08 0x40024f90 0x4071706f 0xbefff8a0
>>> 0xbefff7f4: 0x0000869c 0x00000000 0x3231302f 0x36353433
>>> 0xbefff804: 0x3a393837 0x3e3d3c3b 0x4241403f 0x46454443
>>> 0xbefff814: 0x4a494847 0x4e4d4c4b 0x5251504f 0x56555453
>>> 0xbefff824: 0x5a595857 0x5e5d5c5b 0x6261605f 0x66656463
>>> 0xbefff834: 0x6a696867 0x6e6d6c6b 0x4071706f 0x00010861 //[R11+4c] -> [0xbefff7f0+4c] -> [0xbefff83c] -> 0x4071706f
>>> Correct value is at location {[R11+4c]+4} --> 0x00010861, 4 bytes offset going wrong.
>>> ------------------------------------------------------------------------------------------------------------------------------
>>> When i checked from the ARM Lowering part for generation of
>>> sub sp, sp, #16
>>> Emitted by,
>>> if (VARegSaveSize)
>>> emitSPUpdate(isARM, MBB, MBBI, dl, TII, -VARegSaveSize, // --> VARegSaveSize is calculated in computeRegArea
>>> MachineInstr::FrameSetup)
>>>
>>> ARMTargetLowering::computeRegArea(..) {
>>> ...
>>> VARegSize = NumGPRs * 4;
>>> VARegSaveSize = (VARegSize + Align - 1) & ~(Align - 1); // --> 8 byte alignment done here
>>> }
>>> Stack pointer decremented to NumGPRs*4 + alignment
>>> NumGPRs = 3 registers
>>> VARegSaveSize = 16 (after considering 8 byte alignment )
>>>
>>> When the offset(#76) for the instruction, "ldr r1, [r11, #76] ; 0x4c" is calculated, 4 bytes alignment is considered.
>>> In prologue stackpointer calculation 8 byte alignment is considered.
>>> Due to this mimatch of alignment, If try to access any parameter after byval which results wrong value.
>>>
>>> Issue(or offset of 4 bytes) wont occur if even number of register used for byval spilling.
>>> ex:
>>> struct S114 check114 (int a, struct S114 arg0, struct S114* arg1) { // accessing arg1 is fine in this case
>>> .....
>>> }
>>>
>>> Could someone comment on below queries about fixing the problem,
>>> 1) Is this 8 byte alignment mandatory ? Is this due to " ARM AAPCS 5.2.1.2 Stack constraints at a public interface" ? Can this be removed?
>>> 2) We will leave alignment as it is but in prologue we will adjust SP once again, this is little meaningless.
>>> 3) While accessing arg1 we will consider alignment and add extra offset -> looks better.
>>> Offset to access arg1 is calculated by selection DAG that will be target independent. But Alignment adjustment should be done by target lowering. Any suggestions on how to fix this ?
>>>
>>> Regards,
>>> Rajesh
>>>
>>> <201306181656803_BEI0XT4N.gif>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>> ,
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
>
>
>
> <201306191201558_BEI0XT4N.gif>
> <check114.s>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130619/5449b9a6/attachment.html>
More information about the llvm-dev
mailing list