[LLVMdev] NEON intrinsics preventing redundant load optimization?

James Molloy james at jamesmolloy.co.uk
Mon Jan 5 04:13:00 PST 2015


Hi all,

Sorry for arriving late to the party. First, some context:

vld1 is not the same as a pointer dereference. The alignment requirements
are different (which I saw you hacked around in your testcase using
__attribute__((aligned(4)))), and in big-endian environments they do totally
different things (VLD1 does element-wise byteswapping, whereas a pointer
dereference byteswaps the entire 128-bit number).
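
To make the big-endian difference concrete, here is a minimal sketch in
portable C. It only models the two byte reorderings on a 16-byte buffer; it
is not what the hardware or the compiler literally does. The helper names
swap_per_element and swap_whole are hypothetical, and the 4-byte element
size is an assumption standing in for a vector of four 32-bit lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: reverse the bytes inside each elem_size-byte
   element of buf, leaving the order of the elements themselves alone.
   This models the element-wise byteswap described for VLD1. */
static void swap_per_element(uint8_t *buf, size_t total, size_t elem_size) {
    for (size_t base = 0; base + elem_size <= total; base += elem_size) {
        for (size_t i = 0; i < elem_size / 2; ++i) {
            uint8_t tmp = buf[base + i];
            buf[base + i] = buf[base + elem_size - 1 - i];
            buf[base + elem_size - 1 - i] = tmp;
        }
    }
}

/* Hypothetical helper: reverse all bytes of buf as one big number.
   This models the whole-vector byteswap described for a pointer
   dereference of a 128-bit type. */
static void swap_whole(uint8_t *buf, size_t total) {
    swap_per_element(buf, total, total);
}
```

Applied to the bytes 0..15, the element-wise swap with 4-byte elements keeps
each element in place and reverses the bytes within it, while the whole swap
also reverses the order of the elements, so the two results differ.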

While a pointer dereference does work just as well as VLD1 (and better,
given this defect), it is explicitly *not supported*. The ACLE mandates that
there are only certain ways to legitimately "create" a vector object:
vcreate, vcombine, vreinterpret and vld1. NEON intrinsic types don't exist
in memory (memory is modelled as a sequence of scalars, as in the C model).
For this reason, Renato, I don't think we should advise people to work
around the API, as who knows what problems that will cause later.
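
As a rough illustration of staying within the API, the sketch below loads,
multiplies and stores four floats using only the load/store intrinsics
rather than casting the float pointers to a vector type. The original
testcase is not reproduced in this message, so the function name mul4 and
its signature are hypothetical. The scalar fallback is only there so the
sketch also builds on targets without NEON.

```c
#include <stddef.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] for i in 0..3. */
void mul4(const float *a, const float *b, float *out) {
#if defined(__ARM_NEON)
    /* Vector objects are created with vld1q_f32 and written back with
       vst1q_f32; memory is only ever treated as an array of floats. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vmulq_f32(va, vb));
#else
    /* Scalar fallback for non-NEON targets (an assumption of this sketch). */
    for (size_t i = 0; i < 4; ++i) {
        out[i] = a[i] * b[i];
    }
#endif
}
```

The unsupported alternative would be something like
*(const float32x4_t *)a, which happens to work on little-endian targets but
steps outside what the ACLE defines.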

For the reasons above, we map vld1q_f32() to a NEON intrinsic instead
of a generic IR load. Looking at your testcase, even with tip-of-trunk
clang we generate redundant loads and stores:

vld1.32 {d16, d17}, [r1]
vld1.32 {d18, d19}, [r0]
mov r0, sp
vmul.f32 q8, q9, q8
vst1.32 {d16, d17}, [r0]
vld1.64 {d16, d17}, [r0:128]
vst1.32 {d16, d17}, [r2]

Whereas for AArch64, we don't (and neither do we for the chained multiply
case):

ldr q0, [x0]
ldr q1, [x1]
fmul v0.4s, v0.4s, v1.4s
str q0, [x2]
ret

So AArch64 handles this, and I think there's something wrong/missing in the
optimizer for AArch32. This is a legitimate bug and should be fixed (even
if a workaround is required in the interim!).

Cheers,

James

On Mon Jan 05 2015 at 10:46:10 AM Renato Golin <renato.golin at linaro.org>
wrote:

> On 5 January 2015 at 10:14, Simon Taylor <simontaylor1 at ntlworld.com>
> wrote:
> > I don’t recall seeing anything about pointer dereferencing, but it may
> > have the same issues. I’m a bit hazy on endianness issues with NEON
> > anyway (in terms of element numbering, casts between types, etc) but it
> > seems like all the smartphone platform ABIs are defined to be
> > little-endian so I haven’t spent too much time worrying about it.
>
> Tim is right, this can be a potential danger, but not more than other
> endian or type size issues. If you're writing portable code, I assume
> you'll already be mindful of those issues.
>
> This is why I said it's still a problem, but not a critical one. Maybe
> adding a comment to your code explaining the issue will help you in
> the future to move it back to NEON loads/stores once this is fixed.
>
> cheers,
> --renato