patches with initial implementation of Neon scalar instructions

Wed Sep 11 07:35:16 PDT 2013

Hi Ana,

>       Because the Clang builtins use scalar types, if in CGBuiltin I
> transform the code below into an IR operation using CreateAdd, the code is
> optimized and results in the machine instruction ‘add X0, X1, X0’, which is
> not what we want.

I wonder if that's actually true. Realistically, the function you
wrote *is* better implemented with "add x0, x1, x0", instead of three
fmovs and an "add d0, d1, d0".
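For comparison, here is a sketch of the purely scalar case. The helper name `scalar_add` is hypothetical, and this is not the code from the patch. The intrinsic's semantics are just a 64-bit integer add, so nothing obliges the backend to move the operands into d registers. The `#else` branch is an assumed portable fallback so the sketch also builds off AArch64.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical scalar helper: operands and result are plain int64_t in
   general-purpose registers, so "add x0, x1, x0" is the natural lowering. */
int64_t scalar_add(int64_t a, int64_t b) {
#if defined(__aarch64__)
  return vaddd_s64(a, b);            /* ACLE scalar intrinsic */
#else
  /* Assumed fallback: wrapping 64-bit add done in unsigned arithmetic. */
  return (int64_t)((uint64_t)a + (uint64_t)b);
#endif
}
```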

If you put a call to your vaddd_s64 back into a "vector" context,
where it *does* make sense to use the "add d0, d1, d0" version, then I
think LLVM will get it right again:

#include <arm_neon.h>

int64x1_t my_own_little_function(int64x1_t a, int64x1_t b) {
  return (int64x1_t)vaddd_s64((int64_t)a, (int64_t)b);
}

After inlining, I'd expect the optimised IR to still contain an
"add <1 x i64>" here and the assembly to use the "add d0, d1, d0" form
(in this case faster than three fmovs and an "add x0, x1, x0").
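To make the "vector context" concrete, here is a sketch of the same add kept in a one-lane vector throughout. The name `add_one_lane` is hypothetical. On AArch64 it uses `vadd_s64`, the vector counterpart of `vaddd_s64`. Elsewhere it assumes the GCC/Clang `vector_size` extension as a stand-in for `int64x1_t`. In both cases Clang would typically produce an add on `<1 x i64>` in IR, which is the form that can map to "add d0, d1, d0".

```c
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef int64x1_t v1i64;            /* the real NEON one-lane type */
#else
/* Assumption: off AArch64, a GCC/Clang vector_size type stands in for int64x1_t. */
typedef int64_t v1i64 __attribute__((vector_size(8)));
#endif

/* Keeps the value in a one-lane vector from start to finish, so the add
   stays a vector operation rather than being rewritten as a GPR add. */
v1i64 add_one_lane(v1i64 a, v1i64 b) {
#if defined(__aarch64__)
  return vadd_s64(a, b);
#else
  return a + b;                     /* element-wise add on the single lane */
#endif
}
```

Whether the final assembly actually uses the d-register form still depends on what surrounds the call.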

Obviously LLVM isn't perfect at spotting these contexts yet, but I
don't think we should be hobbling it by insisting on a SISD add just
because that's what the intrinsic notionally maps to.

Cheers.

Tim.