[PATCH][AArch64] updated patches with initial implementation of Neon scalar instructions

Wed Sep 18 07:12:39 PDT 2013

Hi Ana,

>   Tim, can you talk more about this upcoming LLVM change?

The most detailed information is in Jakob's RFC thread a little while
back: http://llvm.1065342.n5.nabble.com/global-isel-Proposal-for-a-global-instruction-selector-td60331.html

>         a) Will it still be SelectionDAG based?

No. One of the main goals is to get rid of SelectionDAG because of its
various limitations and its complexity. No code has been written yet,
so it's all very nebulous, but it may well still use most of the patterns
in the .td files (as FastISel does, or in some improved fashion).

>         b) How having whole function knowledge will help me distinguish when
> to create Integer and scalar Neon operations without adding the v1x and v1f
> types?

The idea is that LLVM will have two (add i64:$Rn, i64:$Rm) patterns,
but distinguish them by the register bank they operate on.

It'll then look at the entire function and decide which register bank
each operation is best placed in (based on register pressure, the
available instructions, the surrounding instructions, etc.). This would
let it pick the GPR64 or FPR64 "add" as appropriate.

>    Example:
> __ai int64_t vaddd_s64(int64_t __a, int64_t __b) {
>   return (int64_t)vadd_s64((int64x1_t)__a, (int64x1_t)__b); }
>
> Note that even with this change, the AArch64 intrinsic vaddd_s64 will NOT
> generate "add d0, d1, d0" but the optimized code "add x0, x1, x0" because of
> the casts to int64_t.

I see what you mean. @vaddd_s64 gets optimised to a simple "add i64"
and LLVM doesn't decide to undo that after it's been inlined into a
caller. I was sure I had tested that worked, but apparently not
properly.

The final IR is:

define <1 x i64> @my_own_little_function(<1 x i64> %a, <1 x i64> %b) #0 {
  %0 = extractelement <1 x i64> %a, i32 0
  %1 = extractelement <1 x i64> %b, i32 0
  %2 = add i64 %1, %0
  %3 = insertelement <1 x i64> undef, i64 %2, i32 0
  ret <1 x i64> %3
}

which is about as vectory as you can get except for that "add" in the
middle there.

I think I was wrong about the intrinsics here, and your first solution
was the best available. How easy would it be to add them back in?

> 4) Used FMOV instead of UMOV to move registers from Neon/integer units when
> possible

That sounds sensible.

Tim.