[PATCH][AArch64] implement 3 aarch64 neon instructions (umov smov ins) in llvm

Tim Northover t.p.northover at gmail.com
Tue Sep 10 06:07:06 PDT 2013


Hi Kevin,

Thanks very much for continuing to work on this; I quite understand
the need to share progress so that others don't get blocked.

>>+      if(getSubTarget().hasNEON())
>>+        [...]
>>+      else
>>
>>What's the advantage of emitting  a different instruction if the core
>>has NEON here? The normal FP one should work regardless (and may
>>actually be faster since the lane doesn't have to be decoded)
>
> We think that if someone moves data between GPRs and FPRs on a device with
> NEON, they are most likely to use SISD instructions before or after the
> move. Since the FPU and NEON are different hardware units, mixing FPU
> instructions with NEON instructions may give worse performance.

That's happened historically (Cortex-A9 springs to mind?) but I don't
believe it's a significant issue on either Cortex-A15 or Swift, let
alone any AArch64 chips that may be coming.

Quite apart from that, your predicate isn't based on whether NEON or
FP is actually being used, but on whether the CPU supports it, which
is a completely different question.

>>+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT",
>> SDT_Neon_mov_lane>;
>>+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT", SDTypeProfile<1,
>> 3,
>>+                           [SDTCisVec<0>, SDTCisSameAs<0, 1>,
>>+                           SDTCisInt<2>, SDTCisVT<3, i32>]>>;
>>
>>What's wrong with the existing extractelt and insertelt?
>
> extractelt is defined so that the output has the same value type as the
> vector element, but smov and umov can extract the element and extend it
> simultaneously.

Quite possibly; in that case your patterns would match things like
"(and (extractelt ...), 0xff)" and "(sext_inreg (extractelt ...),
i8)". At least, that's what I see in my DAGs for code I've looked at.

> Also, I tried to use vector_extract, but its last parameter is defined as
> a pointer, which is represented as an i64 constant on AArch64.

But that's how the nodes are created too. The DAG *does* have an i64
type in the lane slot when I view it. If your patterns work with those
nodes, it's because you're lying to TableGen with the
Neon_vector_extract definition (& insert).

> This will cause the pattern match to fail because all immediate numbers
> are defined as i32.

They don't have to be. You can add i64 immediates to your instructions
if it's more convenient. They get stored as int64_t anyway, and all
type information is lost by the time we get to MachineInstrs.
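As a sketch of that alternative (the operand name is illustrative, not from the patch), the lane index can simply be declared over i64 so it matches the index type the DAG actually carries:

```tablegen
// Hypothetical sketch: a 4-bit lane-index operand declared over i64.
// The value is stored as int64_t in the MachineInstr regardless of the
// declared type, so nothing is lost by using i64 here.
def neon_uimm4 : Operand<i64>,
                 ImmLeaf<i64, [{ return Imm >= 0 && Imm < 16; }]>;
```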

>>+               let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};
>>
>>What's bit 11? Same for later ones. If it's set elsewhere, a comment
>>would be useful. If it's not then setting it is probably essential.
>
> Because bit 11 is unspecified, any value is ok.

Ah, didn't notice that bit. I thought ARM were trying to do away with
such things (Grr!). Oh well. A comment would definitely be helpful
here then.
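Something as simple as this (reproducing the encoding line quoted above) would do:

```tablegen
// Bit 11 of the encoding is unspecified ("don't care" in the ARM ARM),
// so it is deliberately left unset here.
let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};
```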

>>+    def _16B16 : NeonI_insert<0b1, 0b1,
>>+    [...]
>>+    def _16B8 : NeonI_insert<0b1, 0b1,
>>
>>Why are these separate instructions? It seems to be some attempt to
>>model the VPR64/VPR128 distinction, but I don't think it actually
>>gains us anything. It's not like the high part of VPR128 can be used
>>independently really.
>
> I am working on removing all the VPR64 instruction formats and custom
> promoting all 64-bit vectors to 128-bit. I will add this modification in
> the next revision.

Thanks.

Tim.
