[LLVMdev] Unaligned vector memory access for ARM/NEON.

Wed Sep 5 17:03:56 PDT 2012

On Sep 5, 2012, at 4:58 PM, Jim Grosbach <grosbach at apple.com> wrote:

> Hmmm. Well, it's entirely possible that it's LLVM that's confused about the alignment requirements here. :)
> 
> I think I see, in general, where. I twiddled the IR to give it higher alignment (16 bytes) and got:
> extend:                                 @ @extend
> @ BB#0:
> 	vldr	d16, [r0]
> 	vmovl.s16	q8, d16
> 	vstmia	r1, {d16, d17}
> 	vldr	d16, [r0, #8]
> 	add	r0, r1, #16
> 	vmovl.s16	q8, d16
> 	vstmia	r0, {d16, d17}
> 	bx	lr
> 
> Note that we're using a plain vldr instruction here to load the d register, not a vld1 instruction. The stores likewise use vstmia rather than vst1. According to the ARM ARM (DDI 0406C), you're correct about the element-size alignment requirement for VLD1, but our isel isn't attempting to use that instruction. It uses VLDR instead, which requires word alignment, so it falls over.
> 
> Given that, the answer to your original question seems to be that, to improve codegen for this case, the place to look is instruction selection for loads and stores to the VFP/NEON registers. That code can be made smarter to better use the NEON instructions. I know Jakob has done some work related to better utilization of those for other constructs.

I don't think isel ever uses vld1.16, but I don't see anything wrong with using it for 2-byte aligned vectors.

There is an issue with big-endian semantics, but I don't think we're seriously trying to support big-endian ARM?

/jakob