[PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (shift)
Hao Liu
Hao.Liu at arm.com
Mon Sep 2 07:03:30 PDT 2013
Hi Tim,
Sorry, I sent the wrong patches and cc addresses.
Sending again with the new patches.
Thanks,
-Hao
-----Original Message-----
From: Hao Liu [mailto:Hao.Liu at arm.com]
Sent: Monday, September 02, 2013 2:43 PM
To: 'Tim Northover'
Subject: RE: [PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (shift)
Hi Tim,
That's reasonable.
I've refactored my patches; the new versions are attached.
Thanks,
-Hao
-----Original Message-----
From: Tim Northover [mailto:t.p.northover at gmail.com]
Sent: Friday, August 30, 2013 10:32 AM
To: Hao Liu
Cc: llvm-commits; cfe-commits at cs.uiuc.edu
Subject: Re: [PATCH][AArch64] implement aarch64 neon instruction class AdvSIMD (shift)
Hi Hao,
Thanks for working on these.
> Attached are patches to implement aarch64 neon instruction class
> AdvSIMD (shift). The patches implement 21 shift instructions and 4
> convert instructions. Most of them are implemented as in ARMv7, except:
I'm not convinced ARM does this in a good way. It looks like Clang sees
  int32x2_t vrshr_n_s32(int32x2_t a, int32_t amt)
and converts it to:
  <2 x i32> @llvm.arm.neon.vrshifts.v2i32(<2 x i32> %a, <2 x i32> <i32 %amt, i32 %amt>)
The backend then pattern-matches the second operand and maps the whole thing
back to a specialised AArch64ISD node, very much like what Clang saw, which
then gets selected. This makes sense to me if there's also a register form and
the instruction with an immediate is just an optimisation for when the shift
amount is a known constant, but isn't that only the case for UQSHL and
SQSHL?
The others seem to be immediate-only instructions, so wouldn't it make sense
for Clang to produce:
<2 x i32> @llvm.aarch64.neon.srshr.v2i32(<2 x i32> %a, i32 12)
Then there would be no need for any special handling in
AArch64ISelLowering.cpp and the intrinsic could be matched directly in
TableGen:
  def : Pat<(int_aarch64_neon_srshr v2i32:$Rn, imm1_32:$imm),
            (SRSHRvvi_2s v2i32:$Rn, imm1_32:$imm)>;
It looks like the only shifts this doesn't apply to are UQSHL/SQSHL (which
also have a register form).
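For illustration, the distinction drawn here (immediate-only SRSHR versus SQSHL with a register form) can be sketched as a scalar C model of one 32-bit lane. The helper names `valid_srshr_imm`, `srshr_lane` and `sqshl_lane_reg` are hypothetical and do not come from either patch; the semantics are based on the AArch64 SRSHR and SQSHL (register) descriptions, so treat this as a reference sketch, not the backend's lowering. SRSHR only ever needs a compile-time count in the 1..32 range that `imm1_32` expresses, while SQSHL's register form reads a signed count from each lane at run time, which cannot be folded into an immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar models of a single 32-bit lane (illustrative names,
   not taken from the patches). */

/* Immediate range an operand class like imm1_32 accepts for SRSHR on 32-bit lanes. */
int valid_srshr_imm(int n) { return n >= 1 && n <= 32; }

/* SRSHR: rounding arithmetic shift right by a compile-time count n. */
int32_t srshr_lane(int32_t x, int n) {
    assert(valid_srshr_imm(n));
    /* Add the rounding constant 2^(n-1) in 64 bits so it cannot overflow. */
    int64_t v = (int64_t)x + ((int64_t)1 << (n - 1));
    /* Floor-divide by 2^n; ~(~v >> n) avoids right-shifting a negative value. */
    int64_t r = v >= 0 ? v >> n : ~(~v >> n);
    return (int32_t)r;
}

/* SQSHL (register form): the count comes from each lane of a second vector at
   run time; only its low byte is used, read as a signed value. */
int32_t sqshl_lane_reg(int32_t x, int32_t amt) {
    int s = amt & 0xFF;
    if (s >= 128) s -= 256;                 /* sign-extend the low byte */
    if (s >= 0) {                           /* saturating left shift */
        if (x == 0) return 0;
        if (s >= 32) return x > 0 ? INT32_MAX : INT32_MIN;
        int64_t v = (int64_t)x * ((int64_t)1 << s);  /* exact: |v| <= 2^62 */
        if (v > INT32_MAX) return INT32_MAX;
        if (v < INT32_MIN) return INT32_MIN;
        return (int32_t)v;
    }
    int r = -s;                             /* truncating right shift, 1..128 */
    if (r >= 32) return x < 0 ? -1 : 0;
    return x >= 0 ? x >> r : ~(~x >> r);
}
```

For example, `srshr_lane(-5, 1)` rounds -2.5 to -2, and `sqshl_lane_reg(-8, -1)` performs an arithmetic right shift by one because the per-lane count is negative.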
> 1) SHRN is implemented with ordinary IR (lshr/ashr, trunc) instead of IR
> intrinsics.
> 2) Some new instructions were added in AArch64: shift narrow high,
> which is implemented by combining a shufflevector with the normal
> shift narrow instructions.
That's good. I like those implementations.
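The two implementations quoted above can be modelled per lane in plain C. This sketch assumes the 4 x i32 to 4 x i16 case and uses hypothetical function names (`shrn_4s`, `shrn2_8h`); it mirrors the IR shapes (a shift followed by trunc, then a concatenating shufflevector) rather than reproducing code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of SHRN on 4 x i32 -> 4 x i16: each lane is shifted
   right and then truncated, i.e. "lshr" followed by "trunc" in IR.
   For 16-bit destination lanes the immediate n must be in 1..16. */
void shrn_4s(const uint32_t a[4], int n, uint16_t out[4]) {
    assert(n >= 1 && n <= 16);
    for (int i = 0; i < 4; i++)
        out[i] = (uint16_t)(a[i] >> n);     /* lshr, then trunc */
}

/* Hypothetical model of SHRN2: keep the existing low half (lo) and place the
   narrowed lanes of b in the high half, i.e. a normal shift narrow followed by
   a shufflevector that concatenates the two 4 x i16 halves. */
void shrn2_8h(const uint16_t lo[4], const uint32_t b[4], int n, uint16_t out[8]) {
    uint16_t hi[4];
    shrn_4s(b, n, hi);
    for (int i = 0; i < 4; i++) {
        out[i] = lo[i];                     /* lanes 0..3: untouched low half */
        out[4 + i] = hi[i];                 /* lanes 4..7: narrowed result */
    }
}
```

Because the truncation keeps only bits n..n+15 of each source lane and n is at most 16, those bits never include the sign-extension bits an ashr would shift in above bit 31, so either lshr or ashr produces the same SHRN result.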
Cheers.
Tim.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: clang-simd-shift-v1.patch
Type: application/octet-stream
Size: 44878 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130902/2c86ca2c/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: llvm-simd-shift-v1.patch
Type: application/octet-stream
Size: 264201 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130902/2c86ca2c/attachment-0001.obj>