[PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (shift)

Mon Sep 2 07:03:30 PDT 2013

Hi Tim,

Sorry, I sent the wrong patches and cc addresses.
I'm resending with the new patches.

Thanks,
-Hao

-----Original Message-----
From: Hao Liu [mailto:Hao.Liu at arm.com] 
Sent: Monday, September 02, 2013 2:43 PM
To: 'Tim Northover'
Subject: RE: [PATCH][AArch64] implement aarch64 neon instruction class
AdvSIMD (shift)

Hi Tim,

That's reasonable. 

I've refactored my patches; the new versions are attached.

Thanks,
-Hao

-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com]
Sent: Friday, August 30, 2013 10:32 AM
To: Hao Liu
Cc: llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64] implement aarch64 neon instruction class
AdvSIMD (shift)

Hi Hao,

Thanks for working on these.

> Attached are patches to implement the AArch64 NEON instruction class
> AdvSIMD (shift). The patches implement 21 shift instructions and 4
> convert instructions. Most of them are implemented as in ARMv7, except:

I'm not convinced the ARM approach is a good one. It looks like Clang sees

    int32x2_t vrshr_n_s32(int32x2_t a, int32_t amt)

and converts it to:

    <2 x i32> @llvm.arm.neon.vrshifts.v2i32(<2 x i32> %a, <2 x i32> <i32 %amt, i32 %amt>)

The backend then pattern-matches the second operand and maps the whole thing
back to a specialised AArch64ISD node (very much like what Clang saw), which
is then selected. This makes sense to me if there's also a register form and the
instruction with an immediate is just an optimisation for when that shift
amount is a known constant, but isn't that only the case for UQSHL and
SQSHL?
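To make the round trip concrete, here is a minimal scalar C model (not code from either patch). `srshr_lane`, `rshl_lane`, `vec2_i32`, `vrshifts_model` and `srshr_model` are hypothetical names. The model assumes, as with the ARMv7 VRSHL instruction, that the register form treats a negative per-lane amount as a rounding shift right, so the register form reproduces the immediate form only when every lane holds the same (negated) constant.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for <2 x i32>. */
typedef struct { int32_t lane[2]; } vec2_i32;

/* Rounding shift right of one 32-bit lane by n in [1, 32]: add 2^(n-1),
 * then shift. Computed in 64 bits so the add cannot overflow; assumes >>
 * on a negative int64_t is arithmetic, as it is on GCC and Clang. */
static int32_t srshr_lane(int32_t x, int n)
{
    assert(n >= 1 && n <= 32);
    int64_t wide = (int64_t)x + ((int64_t)1 << (n - 1));
    return (int32_t)(wide >> n);
}

/* Register-form lane (VRSHL-style, an assumption of this model):
 * amt >= 0 shifts left without saturation; amt < 0 is a rounding
 * shift right by -amt. */
static int32_t rshl_lane(int32_t x, int32_t amt)
{
    assert(amt >= -32 && amt <= 31);
    if (amt >= 0)
        return (int32_t)((uint32_t)x << amt); /* unsigned: avoids signed-overflow UB */
    return srshr_lane(x, -amt);
}

/* Register form: each lane carries its own amount, like the vector
 * second operand of the vrshifts call. */
static vec2_i32 vrshifts_model(vec2_i32 a, vec2_i32 amt)
{
    vec2_i32 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = rshl_lane(a.lane[i], amt.lane[i]);
    return r;
}

/* Immediate form: one constant n for all lanes, which is all an
 * immediate-only instruction such as SRSHR can express. */
static vec2_i32 srshr_model(vec2_i32 a, int n)
{
    vec2_i32 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = srshr_lane(a.lane[i], n);
    return r;
}
```

For the input {7, -7}, `srshr_model(a, 2)` and `vrshifts_model(a, {-2, -2})` both give {2, -2}. The backend has to rediscover that the vector operand is such a constant splat before it can pick the immediate instruction; an intrinsic that takes the immediate directly would skip that step.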

The others seem to be immediate-only instructions, so wouldn't it make sense
for Clang to produce:

   <2 x i32> @llvm.aarch64.neon.srshr.v2i32(<2 x i32> %a, i32 12)

Then there would be no need for any special handling in
AArch64ISelLowering.cpp and the intrinsic could be matched directly in
TableGen:

    def : Pat<(int_aarch64_neon_srshr v2i32:$Rn, imm1_32:$imm),
              (SRSHRvvi_2s v2i32:$Rn, imm1_32:$imm)>;
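Because these instructions only take an immediate, the shift amount is part of the instruction word rather than an operand. The C sketch below (hypothetical helpers, not LLVM code) shows the range an operand class like `imm1_32` has to enforce and how A64 right shifts by immediate pack the amount: for SSHR/SRSHR and related instructions the 7-bit `immh:immb` field holds `2*esize - n`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical check mirroring an operand class such as imm1_32: a
 * right shift by immediate on esize-bit lanes accepts n in 1..esize. */
static bool valid_shr_imm(int esize, int n)
{
    assert(esize == 8 || esize == 16 || esize == 32 || esize == 64);
    return n >= 1 && n <= esize;
}

/* Pack the amount into the 7-bit immh:immb field as 2*esize - n.
 * The lane size is recovered from the highest set bit of immh, so the
 * same field encodes both the element size and the shift amount. */
static unsigned encode_shr_immhb(int esize, int n)
{
    assert(valid_shr_imm(esize, n));
    return (unsigned)(2 * esize - n);
}
```

For example, a 32-bit-lane shift by 12 encodes as 64 - 12 = 52 (0b0110100; `immh` = 0b0110 selects 32-bit lanes). Since the constant lives in the encoding, there is nothing for a register form to contribute.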

It looks like the only shifts this doesn't apply to are UQSHL/SQSHL (which
also have a register form).
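For contrast, SQSHL/UQSHL have a register form whose per-lane amounts need not be constant, so keeping a vector-operand intrinsic and selecting the immediate instruction when the amount is a known splat is reasonable there. Below is a C model of signed SQSHL on 32-bit lanes (hypothetical names; the semantics assumed are a saturating shift left for non-negative amounts and a truncating shift right for negative ones in the register form).

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t lane[2]; } vec2_i32;  /* hypothetical <2 x i32> */

/* Clamp a 64-bit value to the int32_t range. */
static int32_t sat32(int64_t v)
{
    if (v > INT32_MAX) return INT32_MAX;
    if (v < INT32_MIN) return INT32_MIN;
    return (int32_t)v;
}

/* One lane of register-form SQSHL: amt in 0..31 is a saturating left
 * shift; amt in -31..-1 shifts right without rounding. */
static int32_t sqshl_reg_lane(int32_t x, int32_t amt)
{
    assert(amt >= -31 && amt <= 31);
    if (amt >= 0)
        return sat32((int64_t)x * ((int64_t)1 << amt)); /* |product| <= 2^62 */
    return (int32_t)((int64_t)x >> -amt); /* assumes arithmetic >> */
}

/* Register form: amounts may differ per lane and be runtime values. */
static vec2_i32 sqshl_reg(vec2_i32 a, vec2_i32 amt)
{
    vec2_i32 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = sqshl_reg_lane(a.lane[i], amt.lane[i]);
    return r;
}

/* Immediate form (SQSHL #n, n in 0..31): the special case that can be
 * selected when every lane's amount is the same known constant. */
static vec2_i32 sqshl_imm(vec2_i32 a, int n)
{
    assert(n >= 0 && n <= 31);
    vec2_i32 r;
    for (int i = 0; i < 2; i++)
        r.lane[i] = sqshl_reg_lane(a.lane[i], n);
    return r;
}
```

Here the immediate form really is an optimisation of the register form: `sqshl_reg(a, {5, 5})` and `sqshl_imm(a, 5)` agree, while `sqshl_reg` can also take amounts like {3, -2} that no single immediate expresses.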

> 1)  SHRN is implemented with plain IR (lshr/ashr, trunc) instead of IR
> intrinsics.

> 2)  There are some special instructions added in AArch64: shift narrow
> high, which is implemented by combining shufflevector and normal
> shift narrow instructions.

That's good. I like those implementations.
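Both items can be modelled in scalar C (hypothetical names, four 32-bit source lanes narrowed to 16 bits; not code from the patch). Item 1 is a shift followed by truncation, and either lshr or ashr works: the SHRN immediate is at most the destination lane width (16 here), so the kept low 16 bits never include the top bits where the two shifts differ. Item 2 is an ordinary SHRN whose result is appended to an existing low half, which is what a `shufflevector` with mask `<0, 1, ..., 7>` over two `<4 x i16>` operands does.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { int32_t lane[4]; } vec4_i32;   /* stand-in for <4 x i32> */
typedef struct { uint16_t lane[4]; } vec4_u16;  /* stand-in for <4 x i16> */
typedef struct { uint16_t lane[8]; } vec8_u16;  /* stand-in for <8 x i16> */

/* Item 1, one lane: lshr by n (1..16), then trunc to 16 bits. */
static uint16_t shrn_lane_lshr(int32_t x, int n)
{
    assert(n >= 1 && n <= 16);
    return (uint16_t)((uint32_t)x >> n);
}

/* Same with ashr; assumes >> on a negative int32_t is arithmetic
 * (true on GCC and Clang). Conversion to uint16_t keeps the low bits. */
static uint16_t shrn_lane_ashr(int32_t x, int n)
{
    assert(n >= 1 && n <= 16);
    return (uint16_t)(x >> n);
}

/* SHRN on four lanes, as a plain vector shift plus truncation. */
static vec4_u16 shrn_v4(vec4_i32 a, int n)
{
    vec4_u16 r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = shrn_lane_lshr(a.lane[i], n);
    return r;
}

/* Item 2: model of shufflevector(lo, hi, <0, 1, ..., 7>). */
static vec8_u16 concat_v4(vec4_u16 lo, vec4_u16 hi)
{
    vec8_u16 r;
    for (int i = 0; i < 4; i++) {
        r.lane[i] = lo.lane[i];
        r.lane[i + 4] = hi.lane[i];
    }
    return r;
}

/* Shift narrow high: keep the existing low half and place the
 * narrowed result in the high half. */
static vec8_u16 shrn_high(vec4_u16 lo, vec4_i32 a, int n)
{
    return concat_v4(lo, shrn_v4(a, n));
}
```

For example, narrowing {0x50000, 0x60000, 0x70000, 0x80000} by 16 onto a low half of {1, 2, 3, 4} yields {1, 2, 3, 4, 5, 6, 7, 8}.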

Cheers.

Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang-simd-shift-v1.patch
Type: application/octet-stream
Size: 44878 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130902/2c86ca2c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-simd-shift-v1.patch
Type: application/octet-stream
Size: 264201 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130902/2c86ca2c/attachment-0001.obj>