[PATCH][AArch64] implement 3 aarch64 neon instructions (umov smov ins) in llvm

Wed Sep 11 01:16:24 PDT 2013

Hi Tim,

I am working on removing all the VPR64 instruction formats, but I have run
into a problem. I want to extend a 64-bit vector (e.g. v8i8) to 128 bits
(v16i8) before an insert/extract, so that I can use the 128-bit
insert/extract to process it. I tried both SUBREG_TO_REG and INSERT_SUBREG
to get this promotion, but both hit an assertion failure:

llc:
/home/kevin/llvm_trunk/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp:534:
void llvm::InstrEmitter::EmitSubregNode(llvm::SDNode*,
llvm::DenseMap<llvm::SDValue, unsigned int>&, bool, bool): Assertion `SRC
&& "No register class supports VT and SubIdx for INSERT_SUBREG"' failed.

I see that in AArch64RegisterInfo.td Vx is defined as a subreg of Qx, but
no such relation is described between VPR64 and VPR128. How can I define
VPR64 as a subreg of VPR128 when both of them use the same register name,
Vx?

Or do you know of any DAG node that can do such a promotion without
introducing an instruction?
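
To make the promotion I am asking about concrete, here is a small
stand-alone C++ model of the intended semantics. It is only a sketch and
not LLVM code: the type aliases and the helpers widen, low_half,
insert_lane_128 and insert_lane_64 are names made up for illustration, and
zero-filling the upper lanes is an assumption (in the real lowering they
would simply be don't-care).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for the v8i8 and v16i8 vector types.
using V8i8 = std::array<std::uint8_t, 8>;
using V16i8 = std::array<std::uint8_t, 16>;

// Promotion: the 64-bit value becomes the low half of a 128-bit value.
// Upper lanes are zero-filled here only to keep the model deterministic.
V16i8 widen(const V8i8 &v) {
  V16i8 out{};
  for (std::size_t i = 0; i < v.size(); ++i)
    out[i] = v[i];
  return out;
}

// Inverse of the promotion: keep only lanes 0..7.
V8i8 low_half(const V16i8 &v) {
  V8i8 out{};
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = v[i];
  return out;
}

// The only "real" operation: a 128-bit lane insert.
V16i8 insert_lane_128(V16i8 v, std::size_t lane, std::uint8_t value) {
  assert(lane < v.size());
  v[lane] = value;
  return v;
}

// A 64-bit insert expressed purely through the 128-bit one.
V8i8 insert_lane_64(const V8i8 &v, std::size_t lane, std::uint8_t value) {
  assert(lane < v.size());
  return low_half(insert_lane_128(widen(v), lane, value));
}
```

In this model widen and low_half cost nothing, which is exactly the
property I would like the DAG-level promotion to have.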

Thanks.

2013/9/10 Tim Northover <t.p.northover at gmail.com>

> Hi Kevin,
>
> Thanks very much for keeping on working on this, I quite understand
> the need to share progress so others don't get blocked.
>
> >>+      if(getSubTarget().hasNEON())
> >>+        [...]
> >>+      else
> >>
> >>What's the advantage of emitting  a different instruction if the core
> >>has NEON here? The normal FP one should work regardless (and may
> >>actually be faster since the lane doesn't have to be decoded)
> >
> > We think that if someone moves data between GPR and FPR on a device
> > with NEON, they will most likely use some SISD instructions before or
> > after the move. Considering that FPU and NEON are different hard cores,
> > mixing FPU instructions with NEON instructions may give worse
> > performance.
>
> That's happened historically (Cortex-A9 springs to mind?) but I don't
> believe it's a significant issue on either Cortex-A15 or Swift, let
> alone any AArch64 chips that may be coming.
>
> Quite apart from that, your predicate isn't based on whether NEON or
> FP is actually being used, but on whether the CPU supports it, which
> is a completely different question.
>
> >>+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_ELT", SDT_Neon_mov_lane>;
> >>+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_ELT", SDTypeProfile<1, 3,
> >>+                           [SDTCisVec<0>, SDTCisSameAs<0, 1>,
> >>+                           SDTCisInt<2>, SDTCisVT<3, i32>]>>;
> >>
> >>What's wrong with the existing extractelt and insertelt?
> >
> > extractelt is defined so that its output has the same value type as the
> > vector element, but smov and umov can extract the element and extend it
> > simultaneously.
>
> Quite possibly; in that case your patterns would match things like
> "(and (extractelt ...), 0xff)" and "(sext_inreg (extractelt ...),
> i8)". At least, that's what I see in my DAGs for code I've looked at.
>
> > Also, I tried to use vector_extract, but its last parameter is defined
> > as a pointer, which is represented as an i64 constant on AArch64.
>
> But that's how the nodes are created too. The DAG *does* have an i64
> type in the lane slot when I view it. If your patterns work with those
> nodes, it's because you're lying to TableGen with the
> Neon_vector_extract definition (& insert).
>
> > This will cause pattern matching to fail, because all immediate numbers
> > are defined as i32.
>
> They don't have to be. You can add i64 immediates to your instructions
> if it's more convenient. They get stored as int64_t anyway, and all
> type information is lost by the time we get to MachineInstrs.
>
> >>+               let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};
> >>
> >>What's bit 11? Same for later ones. If it's set elsewhere, a comment
> >>would be useful. If it's not then setting it is probably essential.
> >
> > Because bit 11 is unspecified, any value is ok.
>
> Ah, didn't notice that bit. I thought ARM were trying to do away with
> such things (Grr!). Oh well. A comment would definitely be helpful
> here then.
>
> >>+    def _16B16 : NeonI_insert<0b1, 0b1,
> >>+    [...]
> >>+    def _16B8 : NeonI_insert<0b1, 0b1,
> >>
> >>Why are these separate instructions? It seems to be some attempt to
> >>model the VPR64/VPR128 distinction, but I don't think it actually
> >>gains us anything. It's not like the high part of VPR128 can be used
> >>independently really.
> >
> > I am working on removing all VPR64 instruction formats and
> > custom-promoting all 64-bit vectors to 128 bits. I will add this
> > modification in the next revision.
>
> Thanks.
>
> Tim.
>
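
On the extract-and-extend point in the quoted discussion: the sketch below
is a plain C++ model, not LLVM code, of why the byte-lane forms of umov and
smov line up with the "(and (extractelt ...), 0xff)" and
"(sext_inreg (extractelt ...), i8)" shapes mentioned above. The function
names umov_b and smov_b and the V16i8 alias are made up for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a v16i8 vector register.
using V16i8 = std::array<std::uint8_t, 16>;

// Model of a byte-lane umov: extract the lane and zero-extend to 32 bits,
// i.e. the same result as (and (extractelt v, lane), 0xff).
std::uint32_t umov_b(const V16i8 &v, std::size_t lane) {
  assert(lane < v.size());
  return static_cast<std::uint32_t>(v[lane]) & 0xffu;
}

// Model of a byte-lane smov: extract the lane and sign-extend to 32 bits,
// i.e. the same result as (sext_inreg (extractelt v, lane), i8).
// The uint8_t -> int8_t conversion is two's complement on all mainstream
// compilers and is guaranteed to be so since C++20.
std::int32_t smov_b(const V16i8 &v, std::size_t lane) {
  assert(lane < v.size());
  return static_cast<std::int32_t>(static_cast<std::int8_t>(v[lane]));
}
```

The two functions differ only for lanes whose top bit is set, which is the
case the separate patterns need to distinguish.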

-- 
Best Regards,

Kevin Qin