<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Ana,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
LGTM now!</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">Thanks,</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small">
-Jiangning</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif;font-size:small"><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">2013/12/5 Ana Pazos <span dir="ltr"><<a href="mailto:apazos@codeaurora.org" target="_blank">apazos@codeaurora.org</a>></span><br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="EN-US" link="blue" vlink="purple"><div><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Hi Jiangning,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">In the example you mentioned, the dup instruction is the best pattern because the test case simply returns the result of the vget intrinsic as a float.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">But we cannot guarantee that vset/vget will use the preferred mapping from the ARM manual, nor that the preferred mapping always gives the best code, because that depends on how the half-precision FP value is used before or after the vget/vset intrinsic calls (which may cause other patterns, such as UMOV or INS, to be applied).<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">What I decided to do: <u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">I added a few more patterns that include the fp16_to_f32 operation in the vector extract/insert and vector copy expressions. This way I can optimize the test case you mentioned and a couple more cases I could think of.<u></u><u></u></span></p>
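<p class="MsoNormal">As a rough illustration of what "extract a lane, then apply fp16_to_f32" means at the value level, here is a minimal portable C sketch. It is only a model, not the patch: <code>f16x4_model</code>, <code>fp16_bits_to_f32</code> and <code>lane_to_f32</code> are hypothetical names, and the lanes are assumed to hold IEEE 754 binary16 bit patterns.</p>

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 4-lane half-precision vector: each lane is kept
   as raw 16-bit storage, not as an arithmetic type. */
typedef struct { uint16_t lane[4]; } f16x4_model;

/* Widen an IEEE 754 binary16 bit pattern to a binary32 float. */
static float fp16_bits_to_f32(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;
    float f;

    if (exp == 0) {
        if (man == 0) {
            bits = sign;                      /* signed zero */
        } else {
            /* Subnormal half: shift until the implicit bit appears. */
            int shift = 0;
            while ((man & 0x400u) == 0) { man <<= 1; shift++; }
            man &= 0x3FFu;
            bits = sign | ((uint32_t)(127 - 15 + 1 - shift) << 23) | (man << 13);
        }
    } else if (exp == 0x1Fu) {
        bits = sign | 0x7F800000u | (man << 13);  /* infinity or NaN */
    } else {
        bits = sign | ((exp + (127 - 15)) << 23) | (man << 13);  /* normal */
    }
    memcpy(&f, &bits, sizeof f);              /* reinterpret bits as float */
    return f;
}

/* Model of "vector extract followed by fp16_to_f32" as one operation. */
static float lane_to_f32(const f16x4_model *v, int lane) {
    return fp16_bits_to_f32(v->lane[lane]);
}
```

<p class="MsoNormal">For example, with lanes holding the bit patterns 0x3C00 and 0xC000, <code>lane_to_f32</code> would be expected to return 1.0 and -2.0 respectively. The sketch says nothing about which instruction the backend selects.</p>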
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">See the updated patches attached.<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Thanks,<u></u><u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d">Ana.<u></u><u></u></span></p><p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1f497d"><u></u> <u></u></span></p><p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Jiangning Liu [mailto:<a href="mailto:liujiangning1@gmail.com" target="_blank">liujiangning1@gmail.com</a>] <br>
<b>Sent:</b> Wednesday, December 04, 2013 1:21 AM<br><b>To:</b> Ana Pazos<br><b>Cc:</b> llvm-commits; <a href="mailto:cfe-commits@cs.uiuc.edu" target="_blank">cfe-commits@cs.uiuc.edu</a>; Tim Northover<br><b>Subject:</b> Re: [PATCH][AArch64] Implemented half-precision vget/vset_lane_f16 intrinsics<u></u><u></u></span></p>
<div><div class="h5"><p class="MsoNormal"><u></u> <u></u></p><div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">Hi Ana,<u></u><u></u></span></p></div><div><p class="MsoNormal">
<span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">Can we generate more optimized code by using the instruction </span><span style="font-size:13.5pt">DUP Hd,Vn.H[lane]</span><span style="font-family:"Arial","sans-serif""> directly?<u></u><u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">With your patch we are generating two instructions like,<u></u><u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p></div><div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""> umov w0, v0.h[3]</span><u></u><u></u></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""> fmov s0, w0</span><u></u><u></u></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">Actually, they can be combined into a single "dup h0, v0.h[3]". This is what we want for this intrinsic function.<u></u><u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">What you need to do is add one more pattern to match this case.<u></u><u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">Thanks,<u></u><u></u></span></p>
</div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif"">-Jiangning<u></u><u></u></span></p></div><div><p class="MsoNormal"><span style="font-family:"Arial","sans-serif""><u></u> <u></u></span></p>
</div></div></div><div><p class="MsoNormal" style="margin-bottom:12.0pt"><u></u> <u></u></p><div><p class="MsoNormal">2013/12/4 Ana Pazos <<a href="mailto:apazos@codeaurora.org" target="_blank">apazos@codeaurora.org</a>><u></u><u></u></p>
<div><div><p class="MsoNormal">Hi Tim and reviewers,<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">I followed Tim’s suggestion to use macros to handle the float16_t storage type.<u></u><u></u></p>
<p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Notes:<u></u><u></u></p><p>1)<span style="font-size:7.0pt"> </span>Since the vget/vset_lane_f16 intrinsics only read and write 16-bit data (no FP arithmetic is performed), I simply reinterpreted float16_t and the vector of float16_t as i16 data.<u></u><u></u></p>
<p>See the operators defined in NeonEmitter.<u></u><u></u></p><p>2)<span style="font-size:7.0pt"> </span>With this change, vget/vset_lane_f16 use the vget/vset_lane_i16 implementation.<u></u><u></u></p><p>3)<span style="font-size:7.0pt"> </span>I added the f16_to_f32 pattern because in the vset_lane case the i16 data is moved from a GPR to an FP register, hence the need for this pattern.<u></u><u></u></p>
<p>4)<span style="font-size:7.0pt"> </span>I added test cases that define a float16_t variable in the function body but do not return a value of that type, as that is not allowed. Let me know if these tests are satisfactory.<u></u><u></u></p>
<p>5)<span style="font-size:7.0pt"> </span>I did not try to enforce the recommended intrinsic->instruction map from the ARM document. <u></u><u></u></p><p>To force those instructions, I would have to use builtins and v1i16 type casts so I can create the pattern.<u></u><u></u></p>
<p>Even then, in some cases the UMOV and INS patterns defined earlier can still take precedence.<u></u><u></u></p><p class="MsoNormal"> <u></u><u></u></p><p class="MsoNormal">Thanks,<u></u><u></u></p><p class="MsoNormal">Ana.<u></u><u></u></p>
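<p class="MsoNormal">The reinterpretation described in note 1 above can be pictured with a small portable C model. This is a sketch under stated assumptions, not the NeonEmitter code: <code>half_bits</code>, <code>half4_bits</code>, <code>model_get_lane_f16</code> and <code>model_set_lane_f16</code> are hypothetical names, and half-precision values are represented only by their 16-bit storage.</p>

```c
#include <stdint.h>

/* Hypothetical stand-ins: a half value and a 4-lane half vector are
   modelled purely as 16-bit storage, since no FP arithmetic is needed
   to read or write a lane. */
typedef uint16_t half_bits;
typedef struct { uint16_t lane[4]; } half4_bits;

/* Reading a lane is just a 16-bit integer lane read. */
static half_bits model_get_lane_f16(half4_bits v, int lane) {
    return v.lane[lane];
}

/* Writing a lane is just a 16-bit integer lane write; the vector is
   passed and returned by value, so the caller's copy is untouched. */
static half4_bits model_set_lane_f16(half_bits value, half4_bits v, int lane) {
    v.lane[lane] = value;
    return v;
}
```

<p class="MsoNormal">In this model the f16 lane accessors behave exactly like their i16 counterparts, which is the property the notes rely on. How the value then moves between a GPR and an FP register is a separate code generation question that the model does not cover.</p>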
</div></div><p class="MsoNormal" style="margin-bottom:12.0pt"><u></u><u></u></p></div><p class="MsoNormal"><br><br clear="all"><u></u><u></u></p><div>
<p class="MsoNormal"><u></u> <u></u></p></div><p class="MsoNormal">-- <u></u><u></u></p><div><p class="MsoNormal"><span style="font-family:"Courier New"">Thanks,</span><u></u><u></u></p><div><p class="MsoNormal">
<span style="font-family:"Courier New"">-Jiangning</span><u></u><u></u></p></div></div></div></div></div></div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div dir="ltr"><font face="courier new, monospace">Thanks,</font><div>
<font face="courier new, monospace">-Jiangning</font></div></div>
</div>