patches with initial implementation of Neon scalar instructions

Thu Sep 12 14:53:31 PDT 2013

On Sep 11, 2013, at 7:35 AM, Tim Northover <t.p.northover at gmail.com> wrote:

> Hi Ana,
> 
>>      Because the Clang builtins use scalar types, if in CGBuiltin I
>> transform the code below into an IR operation using CreateAdd, the code is
>> optimized and results in the machine instruction ‘add X0, X1, X0’, which is
>> not what we want.
> 
> I wonder if that's actually true. Realistically, the function you
> wrote *is* better implemented with "add x0, x1, x0", instead of three
> fmovs and an  "add d0, d1, d0".
> 
> If you put a call to your vaddd_s64 back into a "vector" context,
> where it *does* make sense to use the "add d0, d1, d0" version then I
> think LLVM will get it right again:
> 
> int64x1_t my_own_little_function(int64x1_t a, int64x1_t b) {
>  return vaddd_s64((int64_t)a, (int64_t)b);
> }
> 
> After inlining I'd expect the optimised IR to still contain an "add <1
> x i64>" here and the assembly to use the "add d0, d1, d0" form (in
> this case faster than 3 fmovs and an "add x0, x1, x0").
> 
> Obviously LLVM isn't perfect at spotting these contexts yet, but I
> don't think we should be hobbling it by insisting on a SISD add just
> because that's what the intrinsic notionally maps to.
> 

This is very important. LLVM expresses the semantic intent of the intrinsic and allows the backend to select the appropriate instruction to carry out that intent. There is no guarantee that an intrinsic will map to a specific instruction, nor should there be.
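
To make that concrete, here is a rough sketch of the two contexts Tim describes. It is not the real arm_neon.h: sketch_int64x1_t and sketch_vaddd_s64 are hypothetical stand-ins (built on the GCC/Clang vector_size extension) so the example compiles on any target. Which add instruction comes out of each function depends on the target and optimizer, so treat the comments as expectations rather than guarantees.

```c
#include <stdint.h>

/* Hypothetical stand-in for arm_neon.h's int64x1_t (assumption: the
   GCC/Clang vector_size extension is available). */
typedef int64_t sketch_int64x1_t __attribute__((vector_size(8)));

/* Hypothetical stand-in for vaddd_s64: just a scalar add. It states
   *what* to compute, not which instruction must compute it. */
static inline int64_t sketch_vaddd_s64(int64_t a, int64_t b) {
    return a + b;
}

/* Scalar context: the operands are ordinary integers, so an integer
   register add is the natural choice for the backend. */
static int64_t add_in_scalar_context(int64_t a, int64_t b) {
    return sketch_vaddd_s64(a, b);
}

/* Vector context: the operands already live in one-element vectors,
   so after inlining the backend may prefer a vector-register add. */
static sketch_int64x1_t add_in_vector_context(sketch_int64x1_t a,
                                              sketch_int64x1_t b) {
    sketch_int64x1_t r = { sketch_vaddd_s64(a[0], b[0]) };
    return r;
}

/* Returns 1 when both contexts produce the same value, which is the
   only thing the source-level semantics promise. */
static int contexts_agree(int64_t a, int64_t b) {
    sketch_int64x1_t va = { a };
    sketch_int64x1_t vb = { b };
    sketch_int64x1_t vr = add_in_vector_context(va, vb);
    return add_in_scalar_context(a, b) == vr[0];
}
```

Both functions are the same addition as far as the source is concerned. Comparing the assembly for the two on an AArch64 build is a quick way to see what the backend actually picked.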

Right now we get this right much of the time in similar situations, and we are steadily getting better. It takes a fair amount of custom lowering, however, because SelectionDAG (sdag) insists on using the node type to choose the register class. The global isel project should eventually solve this problem at its root.

-Jim