[PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (shift)

Fri Aug 30 02:31:53 PDT 2013

Hi Hao,

Thanks for working on these.

> Attached are patches to implement aarch64 neon instruction class AdvSIMD
> (shift), which patches implemented 21 shift instructions and 4 convert
> instructions. Most of them are implemented like ARMv7, except :

I'm not convinced ARM does this in a good way. It looks like Clang sees

    int32x2_t vrshr_n_s32(int32x2_t a, int32_t amt)

and converts it to:

    <2 x i32> @llvm.arm.neon.vrshifts.v2i32(<2 x i32> %a,
                                            <2 x i32> <i32 %amt, i32 %amt>)

The backend then pattern-matches the second operand and maps the whole
thing back to a specialised AArch64ISD node very much like what Clang saw,
which gets selected. This makes sense to me if there's also a register
form and the instruction with an immediate is just an optimisation for
when that shift amount is a known constant, but isn't that only the
case for UQSHL and SQSHL?

The others seem to be immediate-only instructions, so wouldn't it make
sense for Clang to produce:

    <2 x i32> @llvm.aarch64.neon.srshr.v2i32(<2 x i32> %a, i32 12)

Then there would be no need for any special handling in
AArch64ISelLowering.cpp and the intrinsic could be matched directly in
TableGen:

    def : Pat<(int_aarch64_neon_srshr v2i32:$Rn, imm1_32:$imm),
              (SRSHRvvi_2s v2i32:$Rn, imm1_32:$imm)>;

It looks like the only shifts this doesn't apply to are UQSHL/SQSHL
(which do have a register form as well).
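
To make concrete what the immediate form computes, here is a rough
scalar C model of a single SRSHR lane. It is only a sketch: the name
srshr_lane_s32 is made up for illustration (it is not an LLVM or ACLE
function), and it assumes >> on a negative int64_t is an arithmetic
shift, which holds for GCC and Clang but is implementation-defined in C.
The 1..32 range mirrors imm1_32 in the pattern above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of SRSHR (signed rounding
 * shift right by immediate). Illustrative only. */
static int32_t srshr_lane_s32(int32_t a, unsigned amt) {
    /* Immediate-only: the amount is a constant in 1..32, like imm1_32. */
    assert(amt >= 1 && amt <= 32);
    /* Widen to 64 bits so adding the rounding constant cannot overflow. */
    int64_t rounded = (int64_t)a + ((int64_t)1 << (amt - 1));
    /* Assumes an arithmetic right shift for negative values. */
    return (int32_t)(rounded >> amt);
}
```

For example, srshr_lane_s32(5, 1) gives 3 and srshr_lane_s32(-5, 1)
gives -2.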

> 1)  SHRN is implemented by IR (lshr/ashr, truncate) instead of IR intrinsics.

> 2)  There are some special instructions added in AArch64: shift narrow high,
> which is implemented by combining shuffle vector and normal shift narrow
> instructions.

That's good. I like those implementations.
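
As I understand the two lowerings, they amount to something like the
scalar C model below for the 4 x i32 -> 4 x i16 case. This is a sketch
under my own assumptions, not code from the patch: shrn_4s and shrn2_4s
are made-up names, and only the logical (lshr) variant is shown.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SHRN: per lane, a logical shift right (lshr)
 * followed by a truncate to the narrower element type. */
static void shrn_4s(const uint32_t in[4], unsigned amt, uint16_t out[4]) {
    assert(amt >= 1 && amt <= 16);  /* shift range for a 16-bit result */
    for (int i = 0; i < 4; ++i)
        out[i] = (uint16_t)(in[i] >> amt);
}

/* Hypothetical model of "shift narrow high": a normal shift narrow whose
 * result is placed in the high half of the destination, with the existing
 * low half kept. The copy loop plays the role of the shuffle vector. */
static void shrn2_4s(const uint16_t low[4], const uint32_t in[4],
                     unsigned amt, uint16_t out[8]) {
    uint16_t narrowed[4];
    shrn_4s(in, amt, narrowed);
    for (int i = 0; i < 4; ++i) {
        out[i] = low[i];           /* low half is passed through */
        out[i + 4] = narrowed[i];  /* high half gets the narrowed lanes */
    }
}
```

Expressing both as plain IR means the generic optimisers can see through
them, which an opaque intrinsic would prevent.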

Cheers.

Tim.