<div dir="ltr">Hi Tim,<div><br></div><div>I just finished the third revision of this patch. The changes are:</div><div><br></div><div>* Custom-promote all 64-bit vectors to 128-bit for EXTRACT_VECTOR_ELT and INSERT_VECTOR_ELT. (The implementation may be ugly, but I currently have no idea how to do it better.)</div>

<div>* Remove all 64-bit instruction formats.</div><div>* Use the common nodes vector_extract and vector_insert instead of defining new DAG nodes.</div><div><br></div><div>Most of your suggestions are implemented in this patch, leaving only one question:</div>

<div><br></div>>>>+      if(getSubTarget().hasNEON())<br>>>>+        [...]<br>>>>+      else<br>>>><br>>>>What's the advantage of emitting  a different instruction if the core<br>

>>>has NEON here? The normal FP one should work regardless (and may<br>>>>actually be faster since the lane doesn't have to be decoded)<br>>><br>>> We think if someone moves data between GPR and FPR on a device with NEON,<br>

>> he is most likely to use some SISD instructions after or before the move.<br>>> Considering the FPU and NEON are different hard cores, mixing FPU instructions<br>>> with NEON instructions may give worse performance.<br>

><br>>That's happened historically (Cortex-A9 springs to mind?) but I don't<br>>believe it's a significant issue on either Cortex-A15 or Swift, let<br>>alone any AArch64 chips that may be coming.<br>

><br>>Quite apart from that, your predicate isn't based on whether NEON or<br>>FP is actually being used, but on whether the CPU supports it, which<br>>is a completely different question.<div><br></div><div>

I agree that we could make a better choice based on the context, but that is hard to implement at the moment because many NEON instructions have not been implemented yet. We also need more performance-tuning results to tell us how to choose, so perhaps we can improve this in the future.</div>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/9/12 Tim Northover <span dir="ltr"><<a href="mailto:t.p.northover@gmail.com" target="_blank">t.p.northover@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Hi Jiangning,<br>

<div class="im"><br>

> I think you are meaning VPR128->FPR128->VPR64 is ugly.<br>

<br>

</div>Oh yes, I'd forgotten about that. In fact now that you've reminded me<br>

it probably wouldn't work anyway.<br>

<div class="im"><br>

> So do you mean we should completely remove register class VPR64 in<br>

> AArch64RegisterInfo.td? And then we should define a new operand type derived<br>

> from RegisterOperand to describe 64-bit vector register operand in patterns<br>

> instead?<br>

<br>

</div>Pretty much. A new RegisterOperand for both VPR128 and VPR64, aliasing<br>

them to FPR128 and FPR64 respectively.<br>

<br>

I've attached a (completely broken, but compiles) patch showing what I<br>

mean. AsmParser fought back so I couldn't get it anywhere near working<br>

this morning. I'll see what I can do to improve it until it actually<br>

works.<br>

<br>

Cheers.<br>

<span class="HOEnZb"><font color="#888888"><br>

Tim.<br>

</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr">Best Regards,<div><br></div><div>Kevin Qin</div></div>

</div>