[LLVMdev] question about alignment of structures on the stack (arm 32)

Tue Apr 21 08:54:03 PDT 2015

Hello Tim, thanks for the response.

----------------------------------------
> Date: Mon, 20 Apr 2015 11:45:03 -0700
> Subject: Re: [LLVMdev] question about alignment of structures on the stack (arm 32)
> From: t.p.northover at gmail.com
> To: alexey.perevalov at hotmail.com
> CC: llvmdev at cs.uiuc.edu
>
> On 20 April 2015 at 11:09, Alexey Perevalov
> <alexey.perevalov at hotmail.com> wrote:
>> And before printf call I see an argument preparation, and one of the most interesting instruction
>>
>> orr r3, r2, #4 ;for address of range.length
>
> This is certainly odd, and I can't reproduce the behaviour here. Even
> if the stack itself is 8-byte aligned (it's not on iOS), that struct
> would usually only be 4-byte aligned. LLVM shouldn't be using "orr"
> there.

Yes, you're right, it's odd ).

Sorry, I didn't describe my environment clearly.
I'm using a Mach-O loader (https://github.com/LubosD/darling/) and trying to make it work on ARM.
The scenario is to load a Mach-O binary (e.g. compiled in Xcode) that calls functions from
an ELF library which implements libobjc2 and CoreFoundation.

In Mach-O on ARM the stack is 4-byte aligned. Code produced for ELF expects 8-byte alignment.
So in about 50% of cases, when a call is made from Mach-O into ELF, the stack pointer register holds an address that is not 8-byte aligned.
Even a trivial call such as
NSLog(@"Test string") from Mach-O
leads to -[NSString getCharacters:]
------
-(void)getCharacters:(unichar *)unicode {
   NSRange range={0,[self length]};
   [self getCharacters:unicode range:range];
}

------
Here "range" is copied by value, and the address of its second field is computed incorrectly:
it comes out equal to the address of the structure itself,
because of orr r3, r2, #4.

The minimal example, I think, is:
#include <stdio.h>

typedef struct
{
    int a;
    char b;
} MyStruct;

int main(void) {
    MyStruct mStruct = {11, 100};
    printf("%p, %p\n", &mStruct.a, &mStruct.b);
    return 0;
}
Compile it with clang:
----
clang version 3.3 (tags/RELEASE_33/final)
Target: armv7l-unknown-linux-gnueabi
Thread model: posix
-----
And we get the following assembly:
main:
    push    {r11, lr}
    mov    r11, sp
    sub    sp, sp, #24
    mov    r0, #0
    str    r0, [r11, #-4]
    add    r1, sp, #8
    movw    r2, :lower16:.Lmain.mStruct
    movt    r2, :upper16:.Lmain.mStruct
    vldr    d16, [r2]
    vstr    d16, [sp, #8]
    orr    r2, r1, #4
    movw    r3, :lower16:.L.str
    movt    r3, :upper16:.L.str
    str    r0, [sp, #4]
    mov    r0, r3
    bl    printf
    ldr    r1, [sp, #4]
    str    r0, [sp]
    mov    r0, r1
    mov    sp, r11
    pop    {r11, pc}

r2 is populated with r1 plus 4 (but the add is optimized into an orr here). I think you know it better than me ;)
And if the address of mStruct is a multiple of 4 but not of 8, bit 2 is already set, so I get r2 equal to r1.
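
To make the arithmetic concrete, here is a small sketch of the two ways to compute the address of a field at offset 4. The function names are mine, purely for illustration; they are not anything from LLVM.

```c
#include <assert.h>
#include <stdint.h>

/* Field at offset 4, computed the way the generated code does it.
 * Equal to base + 4 only when bit 2 of base is clear, i.e. when
 * base is 8-byte aligned. */
static uintptr_t field_addr_orr(uintptr_t base)
{
    return base | (uintptr_t)4;
}

/* Field at offset 4, computed with a plain add.
 * Correct for any base that is at least 4-byte aligned. */
static uintptr_t field_addr_add(uintptr_t base)
{
    assert(base % 4 == 0); /* the struct itself needs 4-byte alignment */
    return base + 4;
}
```

For base 0x1000 (8-byte aligned) both functions return 0x1004. For base 0x1004 (only 4-byte aligned) the add version returns 0x1008, while the orr version returns 0x1004, which is the address of the structure itself.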

Since I can't modify the Mach-O binaries, I'm looking for a way to avoid orr and have an add instruction used here.
Maybe it will not solve all of my problems, given the differences in ABI, but I think it's the easiest way.
I found the -mstack-alignment= option and tried the value 4 for the ELF build, but orr is still used. BTW for x86_64 it worked, both on Linux and Mac.

Another way, I think, is to realign the stack inside every ELF function, though that could carry a performance penalty. I tried -mstackrealign, but it did not lead to an 8-byte aligned stack, I mean sp wasn't aligned to 0x.....0/8,
and neither was the address of the structure on the stack. I also tried -mstrict-align.
So I assume there should be patches for LLVM somewhere that could do this )
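
Just to show what I mean by realignment, here is the arithmetic only: rounding an address down to a multiple of 8, so that a 4-byte aligned sp becomes 8-byte aligned. align_down is a hypothetical helper of mine, and I'm not claiming this is what -mstackrealign generates.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the nearest multiple of alignment.
 * alignment must be a power of two. The stack grows down on ARM,
 * so rounding down never moves sp back into the caller's frame. */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

With alignment 8, an address like 0x7ffc becomes 0x7ff8, and an address that is already 8-byte aligned is left unchanged. A real prologue would also have to save the original sp so that it can be restored on return.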

I have not yet tested __attribute__((pcs("aapcs")))/-target-abi. Maybe there is a magic pcs attribute that I could apply to the dangerous functions, but I would prefer to solve the problem in general.

>
> Do you have a self-contained example (code, compiler version & command
> line flags)?
>
> Cheers.
>
> Tim.

Best regards,

Alexey