<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 5, 2015 at 7:36 AM, <a href="mailto:hfinkel@anl.gov">hfinkel@anl.gov</a> <span dir="ltr"><<a href="mailto:hfinkel@anl.gov" target="_blank">hfinkel@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">In <a href="http://reviews.llvm.org/D7790#134758" target="_blank">http://reviews.llvm.org/D7790#134758</a>, @mkuper wrote:<br>

<br>

> Hi James,<br>

><br>

> Just so we have a record of what we talked about on IRC (and can give Hal a chance to disagree :-)<br>

<br>

<br>

</span>Good; I disagree :-)<br>

<br>

The first question to answer is: What is the most useful and reasonable canonical form? I support running this pass early in the pipeline because I believe that demoting these int -> fp -> int sequences to int sequences, when semantically equivalent, is the most useful canonical form.<br>

<br>

If it is useful, because of microarchitectural features, to use FP vector ops instead of integer vector ops, then that should be 'actively' handled later (rather than merely exploiting it when it happens by accident).<br></blockquote><div><br></div><div>Actually, the opposite transformation might be useful in any backend, and not only for vector ops. Currently, extremely integer-heavy workloads (many applications fall into this category, e.g. LLVM itself) leave all the floating-point units idle, regardless of architecture. So it's a trade-off between the domain-crossing costs and the extra ILP gained from having more execution resources. On architectures like x86 that have a memory-to-register int-to-FP conversion instruction, some of the domain-crossing cost can be avoided. A cursory look at the Wikipedia page for POWER8 indicates that the core has 2x integer units, but 7 other units that can do basic arithmetic (4x FPU, 2x VMX, 1x Decimal FP).</div><div><br></div><div>-- Sean Silva</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

So I think that this should run early by default, x86 included. We should also reverse the transformation later, perhaps within the vectorizer, using an actual cost model, if that proves useful.<br>

<span class="im HOEnZb"><br>


<br>

> On x86, vector i64 muls can be much worse than vector double muls. Since this pass runs before the loop vectorizer, and we don't know whether we'll end up with vector or scalar code, I think the safe thing to do on x86 would be to disable this for cases where we'd do a double -> i64 transformation.<br>

<br>

><br>

<br>

> This means we should probably have a target hook for this, which x86 can override.<br>

<br>

<br>

<br>

</span><span class="im HOEnZb">REPOSITORY<br>

  rL LLVM<br>

<br>

<a href="http://reviews.llvm.org/D7790" target="_blank">http://reviews.llvm.org/D7790</a><br>

<br>

</span><div class="HOEnZb"><div class="h5">

_______________________________________________<br>

llvm-commits mailing list<br>

<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>

</div></div></blockquote></div><br></div></div>