[LLVMdev] Unaligned vector memory access for ARM/NEON.

Peter Couperus peter.couperus at st.com
Wed Sep 5 16:25:18 PDT 2012


Hello Jim,

Thank you for the response. I may be confused about the alignment rules here.
I had been looking at the ARM RVCT Assembler Guide, which seems to
indicate that vld1.16 operates on 16-bit-aligned data, unless I am
misinterpreting its table (Table 5-11 in ARM DUI 0204H, pp. 5-70 and 5-71).
Just before the table, it does say that the accesses need to be "element"
aligned, and I took "element" in this case to mean i16.
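Under that reading, the two requirements differ for an i16 vector: element alignment for vld1.16 means a 2-byte-aligned address, while the 64-bit alignment in Jim's reply means an 8-byte-aligned one. The C11 sketch here is illustrative only. The helper `is_aligned` and the buffer `buf` are hypothetical names, not part of any ARM or LLVM API. It shows a `short*` that satisfies the first requirement but not the second:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if p is a multiple of `align` bytes.
   `align` must be a nonzero power of two. */
static int is_aligned(const void *p, uintptr_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* C11 _Alignas places the array on an 8-byte (64-bit) boundary, so
   buf + 1 sits exactly one 2-byte element past that boundary. */
static _Alignas(8) short buf[16];
```

With this, `is_aligned(buf + 1, sizeof(short))` is true while `is_aligned(buf + 1, 8)` is false. That is, the pointer is element-aligned for i16 but not 64-bit aligned, which is the kind of pointer `extend()` may receive.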

Anyhow, to make this a little more concrete:

void extend(short* a, int* b) {
   for(int i = 0; i < 8; i++)
     b[i] = (int)a[i];
}

When I compile this program with clang -O3 -ccc-host-triple
armv7-none-linux-gnueabi -mfpu=neon -mllvm -vectorize, the intermediate
LLVM assembly looks OK (it has an align 2 vector load), but the generated
ARM assembly uses scalar loads instead.
When I compile with gcc 4.6 (-std=c99 -ftree-vectorize -marm -mfpu=neon
-O3), it uses vld1.16 and vst1.32 regardless of the parameter alignment.
This is on armv7a.

The gcc version (and the clang version with our modified backend) runs 
fine, even on 2-byte aligned data.  Is this not a guarantee across 
armv7/armv7a generally?
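One way to reproduce the "2-byte aligned data" case is a small check that deliberately passes `extend()` a source pointer offset by one element from an 8-byte boundary. This is a sketch: `check_extend_misaligned` is a hypothetical name. It only exercises the vld1.16 path when built for an ARMv7 NEON target with flags like those above. On other hosts, it just checks the scalar semantics.

```c
#include <assert.h>
#include <stdint.h>

/* The test case from the message above. */
void extend(short *a, int *b) {
    for (int i = 0; i < 8; i++)
        b[i] = (int)a[i];
}

/* Hypothetical check: call extend() on a source that is 2-byte aligned but
   deliberately NOT 8-byte aligned, then verify each element was widened
   with its sign preserved. Returns 1 on success, 0 on a mismatch. */
int check_extend_misaligned(void) {
    _Alignas(8) short src[9];
    int dst[8];

    /* src[i] runs from -4000 to 4000 in steps of 1000, covering negatives. */
    for (int i = 0; i < 9; i++)
        src[i] = (short)(i * 1000 - 4000);

    short *a = src + 1;              /* one element past an 8-byte boundary */
    assert(((uintptr_t)a & 7) == 2); /* 2-byte aligned, not 8-byte aligned */

    extend(a, dst);

    /* a[i] == src[i + 1] == i * 1000 - 3000 */
    for (int i = 0; i < 8; i++)
        if (dst[i] != i * 1000 - 3000)
            return 0;
    return 1;
}
```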

Pete




On 09/05/2012 03:15 PM, Jim Grosbach wrote:
> VLD1 expects a 64-bit aligned address unless the target explicitly says that unaligned loads are OK.
>
> For your situation, either the subtarget should set AllowsUnalignedMem to true (if that's accurate), or the load address should be made 64-bit aligned.
>
> -Jim
>
> On Sep 5, 2012, at 2:42 PM, Peter Couperus <peter.couperus at st.com> wrote:
>
>> Hello all,
>>
>> I am a first time writer here, but am a happy LLVM tinkerer.  It is a pleasure to use :).
>> We have come across some sub-optimal behavior when LLVM lowers loads for vectors with small integers, e.g. load <4 x i16>* %a, align 2,
>> using a sequence of scalar loads rather than a single vld1 on armv7 linux with NEON.
>> Looking at the code in svn, it appears the ARM backend is capable of lowering these loads as desired, and will if we use an appropriate darwin triple.
>> It appears this was actually enabled relatively recently.
>> Seemingly, the case where the Subtarget has NEON available should be handled the same on Darwin and Linux.
>> Is this true, or am I missing something?
>> Do the regulars have an opinion on the best way to handle this?
>> Thanks!
>>
>> Pete
>>
>> _______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
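Jim's second option, making the load address 64-bit aligned, can be handled on the caller's side with C11 `aligned_alloc`. This is a minimal sketch that assumes a C11 library. `alloc_shorts_aligned8` is a hypothetical helper, not an LLVM or libc API:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: allocate room for n shorts starting on an 8-byte
   (64-bit) boundary. C11 aligned_alloc expects the size to be a multiple
   of the alignment, so the byte count is rounded up to a multiple of 8.
   Release the result with free(). */
short *alloc_shorts_aligned8(size_t n) {
    size_t bytes = (n * sizeof(short) + 7) & ~(size_t)7;
    short *p = aligned_alloc(8, bytes);
    /* Sanity check: any non-NULL result should sit on an 8-byte boundary. */
    assert(p == NULL || ((uintptr_t)p & 7) == 0);
    return p;
}
```

A caller would fill `a = alloc_shorts_aligned8(8)` and pass it to `extend(a, b)`. Aligning the buffer at the call site does not, by itself, change the `align 2` on the load inside a separately compiled `extend()`. The compiler can generally use the stronger alignment only where it can see it, for example after `extend()` is inlined into such a caller.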

-------------- next part --------------
A non-text attachment was scrubbed...
Name: extend.c
Type: text/x-csrc
Size: 92 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20120905/3e81319f/attachment.c>

