[PATCH][AArch64]RE: patches with initial implementation of Neon scalar instructions

Wed Sep 11 23:32:49 PDT 2013

Hi Ana,

> I was just pointing out that defining the ARMv8 intrinsic using the legacy ARMv7 intrinsic produces code like this:
>
>                (int64_t) vadd_s64((int64x1_t)(a), (int64x1_t)(b))
>
> which results in "add x0, x1, x0".

Yep, though not in all cases.
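
For concreteness, the kind of definition under discussion might look roughly like the sketch below. The name scalar_add_s64 is hypothetical (it stands in for the ARMv8 scalar intrinsic), and the non-NEON branch is an assumed fallback that only exists so the snippet builds on other hosts. Whether the compiler ends up with a vector add or a plain "add x0, x1, x0" is left to the optimiser.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical stand-in for an ARMv8 scalar intrinsic, written in terms
 * of the legacy ARMv7 intrinsic vadd_s64. */
static inline int64_t scalar_add_s64(int64_t a, int64_t b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Wrap each scalar in a one-lane vector, add, then read lane 0 back. */
    int64x1_t va = vcreate_s64((uint64_t)a);
    int64x1_t vb = vcreate_s64((uint64_t)b);
    return vget_lane_s64(vadd_s64(va, vb), 0);
#else
    /* Assumed fallback for non-NEON hosts: the same wrapping add, done
     * in unsigned arithmetic to avoid signed-overflow undefined behaviour. */
    return (int64_t)((uint64_t)a + (uint64_t)b);
#endif
}
```

Nothing in the source obliges the compiler to keep the value in a vector register, which is why the general-purpose add is a legitimate result.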

> Now we need to confirm what is the expected implementation for a Neon
> intrinsic - to produce only Neon code or produce best code possible?

That's definitely the question (or something very close). I think it
should be the latter. Otherwise people might just as well be using
inline assembly.

Think about some more extreme cases: if someone writes vadd_f32(a,
vmul_f32(b, c)) should we be forced to emit two instructions rather
than a (non-fused) vmla.f32?
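
A minimal sketch of that case, assuming a hypothetical helper named mul_add_f32x2 and an assumed scalar fallback so it also builds without NEON. The source spells out a separate multiply and add, but nothing about the result changes if the compiler selects a single multiply-accumulate instead.

```c
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] * c[i] for two float lanes. */
static inline void mul_add_f32x2(const float a[2], const float b[2],
                                 const float c[2], float out[2])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    float32x2_t va = vld1_f32(a);
    float32x2_t vb = vld1_f32(b);
    float32x2_t vc = vld1_f32(c);
    /* Written as two intrinsics; the compiler is free to combine them
     * into one multiply-accumulate if it judges that to be better. */
    vst1_f32(out, vadd_f32(va, vmul_f32(vb, vc)));
#else
    /* Assumed fallback for non-NEON hosts with the same per-lane semantics. */
    for (int i = 0; i < 2; ++i)
        out[i] = a[i] + b[i] * c[i];
#endif
}
```

Comparing the generated assembly at -O0 and -O2 is one way to see whether the two operations were merged.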

Or what if someone writes a loop that we can remove completely? Should
we blindly emit it because they asked for a bunch of NEON
instructions?

And if you allow LLVM to optimise those examples, the question becomes
where to draw the line. The only sensible answer (I think) is "when
LLVM thinks it'll make the code better".

> The spreadsheet I have with AArch64 intrinsics definitions shows
> Neon instruction is expected:

I view the spreadsheet as providing semantics.

Yes, these are NEON intrinsics, so they're going to provide at least
one way of producing the actions of every instruction. And of course
they're going to tell you what the effect should be in terms of those
NEON instructions. It's the easiest way.

Anyway, those are just my views on the topic.

Tim.