[PATCH][AArch64] Implement 3 AArch64 NEON instructions (umov, smov, ins) in LLVM

Tue Sep 10 03:34:39 PDT 2013

Hi Tim,

Thanks for your feedback. Although there is still more work to do, I need
to post this patch now so that others can build their SISD implementation on it.

I have improved the implementation as follows:
* Changed the umov and smov combinations into a common pattern.
* Used getMatchingSuperReg() instead of enum calculation.
* Changed neon_uimm0_chomp_hash and its friends to be like neon_uimm0_bare.
* Slimmed down the MC code by using templates.

Regarding your questions:

>+      if(getSubTarget().hasNEON())
>+        [...]
>+      else
>
>What's the advantage of emitting  a different instruction if the core
>has NEON here? The normal FP one should work regardless (and may
>actually be faster since the lane doesn't have to be decoded)

We think that if someone moves data between a GPR and an FPR on a device
with NEON, they are most likely to use some SISD instructions before or after
the move. Since the FPU and NEON are different hardware units, mixing FPU
instructions with NEON instructions may give worse performance. If you have
solid evidence to the contrary, please let us know.

>+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT", SDT_Neon_mov_lane>;
>+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT", SDTypeProfile<1, 3,
>+                           [SDTCisVec<0>, SDTCisSameAs<0, 1>,
>+                           SDTCisInt<2>, SDTCisVT<3, i32>]>>;
>
>What's wrong with the existing extractelt and insertelt?

extractelt is defined so that its result has the same value type as the
vector element, but smov and umov extract the element and extend it at the
same time.
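To make the "extract and extend" point concrete, here is a small C++ model of what the instructions compute (illustrative only, not LLVM code; the type and function names are hypothetical). An smov to a W or X register is an element extract followed by a sign extension, and a umov is an extract followed by a zero extension, so the result type is wider than the element type that extractelt produces.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model: a 128-bit NEON register viewed as 16 byte lanes.
using V16B = std::array<uint8_t, 16>;

// UMOV Wd, Vn.B[lane]: extract one byte lane and zero-extend it to 32 bits.
uint32_t umov_b(const V16B& vn, std::size_t lane) {
  return static_cast<uint32_t>(vn.at(lane));  // uint8_t -> uint32_t zero-extends
}

// SMOV Wd, Vn.B[lane]: extract one byte lane and sign-extend it to 32 bits.
uint32_t smov_b_w(const V16B& vn, std::size_t lane) {
  // Reinterpret the lane as signed (two's-complement wraparound).
  int8_t elem = static_cast<int8_t>(vn.at(lane));
  // Sign-extend to 32 bits, then return the raw register bits.
  return static_cast<uint32_t>(static_cast<int32_t>(elem));
}

// SMOV Xd, Vn.B[lane]: same extract, but sign-extends to 64 bits.
uint64_t smov_b_x(const V16B& vn, std::size_t lane) {
  int8_t elem = static_cast<int8_t>(vn.at(lane));
  return static_cast<uint64_t>(static_cast<int64_t>(elem));
}
```

For a lane holding 0xF0, umov yields 0x000000F0 while smov yields 0xFFFFFFF0 (or 0xFFFFFFFFFFFFFFF0 for the X form); a single extractelt node with an i8 result cannot express either fused extension directly.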
I also tried to use vector_extract, but its last operand (the index) is
defined as a pointer-sized value, which is represented as an i64 constant on
AArch64. That makes the pattern match fail, because all our immediates are
defined as i32. So I had to create new operators with the right operand
value types.

>+               let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};
>
>What's bit 11? Same for later ones. If it's set elsewhere, a comment
>would be useful. If it's not then setting it is probably essential.

Bit 11 is unspecified in the encoding (it is a don't-care bit for this
element size), so any value is OK.
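As an illustration of why bit 11 does not matter for the halfword form, here is a minimal C++ sketch of the INS (element) encoding, assuming the A64 field layout written in the comment (the function names are hypothetical and this is not the patch's TableGen). For H lanes the source index occupies imm4<3:1>, i.e. bits 14-12, and imm4<0> (bit 11) is not used to select the lane.

```cpp
#include <cassert>
#include <cstdint>

// Assumed A64 INS (element) layout, "ins Vd.H[dst], Vn.H[src]":
//   31..21 = 0b01101110000, 20..16 = imm5, 15 = 0, 14..11 = imm4,
//   10 = 1, 9..5 = Rn, 4..0 = Rd
// For halfword lanes: imm5 = dst<<2 | 0b10, imm4 = src<<1 | x,
// where x (bit 11) is the unspecified bit discussed above.
uint32_t encode_ins_h(unsigned rd, unsigned dst, unsigned rn, unsigned src,
                      unsigned bit11 = 0) {
  assert(rd < 32 && rn < 32 && dst < 8 && src < 8 && bit11 < 2);
  uint32_t imm5 = (dst << 2) | 0b10;   // low bits 0b10 select H lanes
  uint32_t imm4 = (src << 1) | bit11;  // src index lands in bits 14..12
  return 0x6E000400u | (imm5 << 16) | (imm4 << 11) | (rn << 5) | rd;
}

// Reading the source lane back only looks at bits 14..12, so bit 11 is ignored.
unsigned decode_ins_h_src(uint32_t insn) {
  return (insn >> 12) & 0x7;
}
```

Encoding `ins v0.h[1], v1.h[3]` with bit 11 cleared or set gives two words that differ only in bit 11 and decode to the same source lane.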

>+    def _16B16 : NeonI_insert<0b1, 0b1,
>+    [...]
>+    def _16B8 : NeonI_insert<0b1, 0b1,
>
>Why are these separate instructions? It seems to be some attempt to
>model the VPR64/VPR128 distinction, but I don't think it actually
>gains us anything. It's not like the high part of VPR128 can be used
>independently really.

I am working on removing all the VPR64 instruction formats and
custom-promoting all 64-bit vectors to 128-bit. I will include this change in
the next revision.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copy2th.patch
Type: application/octet-stream
Size: 35894 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130910/05ce366c/attachment.obj>