[PATCH][AArch64] implement 3 aarch64 neon instructions (umov smov ins) in llvm

Thu Sep 12 02:17:25 PDT 2013

Hi Tim,

I've just finished the third revision of this patch. The changes are:

* Custom-promote all 64-bit vectors to 128-bit for EXTRACT_VECTOR_ELT and
  INSERT_VECTOR_ELT. (The implementation may be ugly, but I currently have
  no better idea.)
* Remove all 64-bit instruction formats.
* Use the common nodes vector_extract and vector_insert instead of defining
  new DAG nodes.
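
To make the first point concrete, here is a rough standalone model of what
the promotion means for lane semantics. It is only a sketch and not code
from the patch: the types V4i16/V8i16 and the helper names are made up for
illustration, and it assumes the 64-bit value ends up in the low lanes of
the 128-bit vector with the upper lanes being don't-care.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Toy model, not LLVM code: a 64-bit vector of four i16 lanes and a
// 128-bit vector of eight i16 lanes (hypothetical names).
using V4i16 = std::array<std::int16_t, 4>;
using V8i16 = std::array<std::int16_t, 8>;

// Promote: place the 64-bit value in the low lanes of a 128-bit vector.
// The upper lanes are don't-care; they are zeroed here for determinism.
V8i16 widenTo128(const V4i16 &v) {
  V8i16 wide{};
  for (std::size_t i = 0; i < v.size(); ++i)
    wide[i] = v[i];
  return wide;
}

// Take the low 64 bits back out of a 128-bit vector.
V4i16 narrowTo64(const V8i16 &wide) {
  V4i16 v{};
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = wide[i];
  return v;
}

// smov-like: read a lane of the widened vector and sign-extend to 32 bits.
std::int32_t extractSigned(const V4i16 &v, std::size_t lane) {
  assert(lane < v.size());
  return widenTo128(v)[lane];
}

// umov-like: read a lane of the widened vector and zero-extend to 32 bits.
std::uint32_t extractUnsigned(const V4i16 &v, std::size_t lane) {
  assert(lane < v.size());
  return static_cast<std::uint16_t>(widenTo128(v)[lane]);
}

// ins-like: widen, overwrite one lane, then narrow back to 64 bits.
V4i16 insertLane(const V4i16 &v, std::size_t lane, std::int16_t value) {
  assert(lane < v.size());
  V8i16 wide = widenTo128(v);
  wide[lane] = value;
  return narrowTo64(wide);
}
```

In this model a lane holding -2 reads back as -2 through extractSigned and
as 0xFFFE through extractUnsigned, and the lane index is unchanged by the
widening, which is why only 128-bit instruction formats are needed.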

Most of your suggestions are implemented in this patch, leaving only one
question:

>>>+      if(getSubTarget().hasNEON())
>>>+        [...]
>>>+      else
>>>
>>>What's the advantage of emitting  a different instruction if the core
>>>has NEON here? The normal FP one should work regardless (and may
>>>actually be faster since the lane doesn't have to be decoded)
>>
>> We think that if someone moves data between GPR and FPR on a device with
>> NEON, they will most likely use some SISD instructions before or after
>> the move. Since the FPU and NEON are different hardware units, mixing FPU
>> instructions with NEON instructions may give worse performance.
>
>That's happened historically (Cortex-A9 springs to mind?) but I don't
>believe it's a significant issue on either Cortex-A15 or Swift, let
>alone any AArch64 chips that may be coming.
>
>Quite apart from that, your predicate isn't based on whether NEON or
>FP is actually being used, but on whether the CPU supports it, which
>is a completely different question.

I agree that we could make a better choice based on the context, but that is
hard to implement at the moment because many NEON instructions have not been
implemented yet. We also need more performance-tuning results to tell us how
to choose. So maybe we can improve this in the future.

2013/9/12 Tim Northover <t.p.northover at gmail.com>

> Hi Jiangning,
>
> > I think you mean VPR128->FPR128->VPR64 is ugly.
>
> Oh yes, I'd forgotten about that. In fact now that you've reminded me
> it probably wouldn't work anyway.
>
> > So do you mean we should completely remove register class VPR64 in
> > AArch64RegisterInfo.td? And then we should define a new operand type
> > derived from RegisterOperand to describe 64-bit vector register operand
> > in patterns instead?
>
> Pretty much. A new RegisterOperand for both VPR128 and VPR64, aliasing
> them to FPR128 and FPR64 respectively.
>
> I've attached a (completely broken, but compiles) patch showing what I
> mean. AsmParser fought back so I couldn't get it anywhere near working
> this morning. I'll see what I can do to improve it until it actually
> works.
>
> Cheers.
>
> Tim.
>

-- 
Best Regards,

Kevin Qin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: copy3rd.patch
Type: application/octet-stream
Size: 34286 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130912/bcf7825f/attachment.obj>