<div dir="ltr">Hi Tim,<div class="gmail_extra">
</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"></blockquote><div>Thanks for your feedback. Although there is still more work to do, I need to post this patch now to support the SISD implementation work being done by others.</div>
<div><br></div><div>I have improved my implementation as follows:</div><div>* Changed the umov and smov combination to a common pattern.</div><div>* Used <font face="arial, sans-serif">getMatchingSuperReg() instead of enum calculation.</font></div>
<div><font face="arial, sans-serif">* Changed </font><span style="font-family:arial,sans-serif">neon_uimm0_chomp_hash </span><span style="font-family:arial,sans-serif">and its friends</span><span style="font-family:arial,sans-serif"> to </span><span style="font-family:arial,sans-serif">neon_uimm0_bare and the like.</span></div>
<div><span style="font-family:arial,sans-serif">* Slimmed down the MC code by using templates.</span></div><div><br></div><div><font face="arial, sans-serif">To answer your questions:</font></div><div><font face="arial, sans-serif"><br></font></div>
<div><span style="font-family:arial,sans-serif">>+      if(getSubTarget().hasNEON())</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>+        [...]</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>+      else</span><br style="font-family:arial,sans-serif">><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>What's the advantage of emitting  a different instruction if the core</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>has NEON here? The normal FP one should work regardless (and may</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>actually be faster since the lane doesn't have to be decoded)</span><font face="arial, sans-serif"><br>
</font></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">We think that someone who moves data between GPRs and FPRs on a device with NEON is most likely to use some SISD instructions before or after the move. Considering that the FPU and NEON are different hardware units, mixing FPU instructions with NEON instructions may give worse performance. If you have solid evidence to the contrary, please share it with us.</span></div>
<div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">>+def Neon_vector_extract : SDNode<"ISD::EXTRACT_VECTOR_</span><span style="font-family:arial,sans-serif">ELT", SDT_Neon_mov_lane>;</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>+def Neon_vector_insert : SDNode<"ISD::INSERT_VECTOR_</span><span style="font-family:arial,sans-serif">ELT", SDTypeProfile<1, 3,</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>+                           [SDTCisVec<0>, SDTCisSameAs<0, 1>,</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>+                           SDTCisInt<2>, SDTCisVT<3, i32>]>>;</span><br style="font-family:arial,sans-serif">
><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>What's wrong with the existing extractelt and insertelt?</span><span style="font-family:arial,sans-serif"><br></span></div><div>
<span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">extractelt is</span><span style="font-family:arial,sans-serif"> defined so that its output has the same value type as the vector element, but smov and umov can extract the element and extend it at the same time.</span></div>
<div><span style="font-family:arial,sans-serif">Also, I tried to use</span><font face="arial, sans-serif"> vector_extract, but its last parameter is defined as a pointer type, which is represented as an i64 constant on AArch64. This makes pattern matching fail, because all immediates are defined as i32. So I had to create a new operator with the right parameter value type.</font></div>
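<div><span style="font-family:arial,sans-serif">To make the smov/umov point concrete, here is a rough Python model of the semantics. It is only a sketch and not code from the patch: the function names simply mirror the mnemonics, and representing a vector as a list of raw lane values is an assumption made for illustration.</span></div>

```python
def umov(lanes, index, lane_bits):
    """Model of a zero-extending lane extract: return the raw lane bits."""
    mask = (1 << lane_bits) - 1
    return lanes[index] & mask


def smov(lanes, index, lane_bits):
    """Model of a sign-extending lane extract: widen the lane as a signed value."""
    raw = umov(lanes, index, lane_bits)
    sign_bit = 1 << (lane_bits - 1)
    # If the lane's top bit is set, the widened value is negative.
    return raw - (1 << lane_bits) if raw & sign_bit else raw


# Four 8-bit lanes, stored as raw (unsigned) bit patterns.
lanes = [0x7F, 0x80, 0xFF, 0x01]

zero_extended = umov(lanes, 1, 8)  # 128
sign_extended = smov(lanes, 1, 8)  # -128
```

<div><span style="font-family:arial,sans-serif">The same lane yields two different wider results, which is why a node whose result type must equal the element type cannot describe both instructions on its own.</span></div>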
<div><font face="arial, sans-serif"><br></font></div><div><span style="font-family:arial,sans-serif">>+               let Inst{14-12} = {Immn{2}, Immn{1}, Immn{0}};</span><br style="font-family:arial,sans-serif">><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>What's bit 11? Same for later ones. If it's set elsewhere, a comment</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>would be useful. If it's not then setting it is probably essential.</span><br style="font-family:arial,sans-serif">
</div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Because bit 11 is unspecified, any value is ok.</font></div><div><font face="arial, sans-serif"><br></font></div><div><span style="font-family:arial,sans-serif">>+    def _16B16 : NeonI_insert<0b1, 0b1,</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>+    [...]</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>+    def _16B8 : NeonI_insert<0b1, 0b1,</span><br style="font-family:arial,sans-serif">
><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>Why are these separate instructions? It seems to be some attempt to</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>model the VPR64/VPR128 distinction, but I don't think it actually</span><br style="font-family:arial,sans-serif">
<span style="font-family:arial,sans-serif">>gains us anything. It's not like the high part of VPR128 can be used</span><br style="font-family:arial,sans-serif"><span style="font-family:arial,sans-serif">>independently really.</span><font face="arial, sans-serif"><br>
</font></div><div><span style="font-family:arial,sans-serif"><br></span></div><div><span style="font-family:arial,sans-serif">I am working on removing all VPR64 instruction formats and custom-promoting all 64-bit vectors to 128 bits. I will include this change in the next revision of the patch.</span></div>
</div>