patches with initial implementation of Neon scalar instructions
Tim Northover
t.p.northover at gmail.com
Wed Sep 11 07:35:16 PDT 2013
Hi Ana,
> Because the Clang builtins use scalar types, if in CGBuiltin I
> transform the code below into an IR operation using CreateAdd, the code is
> optimized and results in the machine instruction ‘add X0, X1, X0’, which is
> not what we want.
I wonder if that's actually true. Realistically, the function you
wrote *is* better implemented with "add x0, x1, x0", instead of three
fmovs and an "add d0, d1, d0".
If you put a call to your vaddd_s64 back into a "vector" context,
where it *does* make sense to use the "add d0, d1, d0" version then I
think LLVM will get it right again:
#include <arm_neon.h>

int64x1_t my_own_little_function(int64x1_t a, int64x1_t b) {
  return vaddd_s64((int64_t)a, (int64_t)b);
}
After inlining I'd expect the optimised IR to still contain an "add <1
x i64>" here and the assembly to use the "add d0, d1, d0" form (in
this case faster than 3 fmovs and an "add x0, x1, x0").
Obviously LLVM isn't perfect at spotting these contexts yet, but I
don't think we should be hobbling it by insisting on a SISD add just
because that's what the intrinsic notionally maps to.
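The argument rests on both forms computing the same value. A scalar add in general-purpose registers and a one-lane vector add give the same 64-bit result, so the choice between "add x0, x1, x0" and "add d0, d1, d0" is a register-file decision the backend is free to make per context. Below is a minimal portable sketch of that equivalence. It assumes the GCC/Clang vector_size extension as a stand-in for int64x1_t so that it builds off AArch64. The type v1u64 and the functions add_scalar, add_one_lane and check_forms_agree are hypothetical names for illustration. The sketch says nothing about which instructions a particular compiler will actually emit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for int64x1_t: a one-lane, 64-bit vector built with
   the GCC/Clang vector_size extension. Unsigned lanes keep overflow well
   defined (arithmetic wraps modulo 2^64). */
typedef uint64_t v1u64 __attribute__((vector_size(8)));

/* Scalar form: the value "add x0, x1, x0" computes in GPRs. */
static int64_t add_scalar(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a + (uint64_t)b);
}

/* One-lane vector form: the value "add d0, d1, d0" computes,
   i.e. what an IR "add <1 x i64>" describes. */
static int64_t add_one_lane(int64_t a, int64_t b) {
    v1u64 va = {(uint64_t)a};
    v1u64 vb = {(uint64_t)b};
    v1u64 sum = va + vb;          /* lane-wise add over a single lane */
    return (int64_t)sum[0];
}

/* Both forms agree for every input, so nothing semantic forces the
   SISD version; only the surrounding context decides which is cheaper. */
static void check_forms_agree(int64_t a, int64_t b) {
    assert(add_scalar(a, b) == add_one_lane(a, b));
}
```

For example, check_forms_agree(INT64_MAX, 1) holds as well, because both forms wrap modulo 2^64 in the same way.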
Cheers.
Tim.