<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">Hi Olivier,<div class=""><br class=""></div><div class="">I’m not sure I was clear enough this morning when replying from my phone, so here it is again:<div class=""><br class=""></div><div class="">I’d rather see the duplicated code (the code that a correct canonicalization makes obsolete) removed from your patch (i.e. let’s not build up technical debt), and a separate commit that implements the canonicalization. </div><div class="">I can help with the second commit if you’d like.</div><div class=""><br class=""></div><div class="">Hope this makes sense.</div><div class=""><br class=""></div><div class="">Thanks,</div><div class=""><br class=""></div><div class="">Mehdi</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""><div class=""><br class=""></div><div class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Apr 1, 2015, at 11:23 AM, Olivier Sallenave <<a href="mailto:ol.sall@gmail.com" class="">ol.sall@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class="">If the canonicalization should be fixed elsewhere, I guess we agree that this patch could be applied?<div class=""><br class=""></div><div class="">Thanks for your help,</div><div class=""><br class=""></div><div class="">Olivier</div></div><div class="gmail_extra"><br class=""><div class="gmail_quote">2015-03-26 12:36 GMT-04:00 Mehdi Amini <span dir="ltr" class=""><<a href="mailto:mehdi.amini@apple.com" target="_blank" class="">mehdi.amini@apple.com</a>></span>:<br class=""><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word" class=""><div class=""><div 
class="h5"><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 25, 2015, at 9:41 PM, Owen Anderson <<a href="mailto:resistor@mac.com" target="_blank" class="">resistor@mac.com</a>> wrote:</div><br class=""><div class=""><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><blockquote type="cite" class=""><div class=""><br class="">On Mar 24, 2015, at 1:39 PM, Mehdi Amini <<a href="mailto:mehdi.amini@apple.com" target="_blank" class="">mehdi.amini@apple.com</a>> wrote:</div><br class=""><div class=""><div style="word-wrap:break-word" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 23, 2015, at 9:31 PM, Owen Anderson <<a href="mailto:resistor@mac.com" target="_blank" class="">resistor@mac.com</a>> wrote:</div><br class=""><div class=""><div style="word-wrap:break-word" class=""><br class=""><div class=""><blockquote type="cite" class=""><div class="">On Mar 23, 2015, at 1:48 PM, Mehdi AMINI <<a href="mailto:mehdi.amini@apple.com" target="_blank" class="">mehdi.amini@apple.com</a>> wrote:</div><br class=""><div class=""><blockquote type="cite" style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">In principle you're right: this might not *always* be beneficial. But in general it should be, because even when "high precision" operations are twice as expensive as "low precision" ones, the transformation does not make things worse. Right now this is only enabled for PPC, where low- and high-precision operations have the same cost. 
Tell me if this is not acceptable.<br class=""></blockquote><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">Well, you can imagine some targets having more than twice the throughput in f16 as in f32, and you can also imagine that two f16 operations consume less power than one f32 operation.</span><br style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class=""><span style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;float:none;display:inline!important" class="">I'd rather have Owen's opinion on this.</span></div></blockquote></div><br class=""><div class="">It’s pretty standard for GPUs to have higher throughput on narrower datatypes.  For instance, if double precision is half the throughput of single precision, then the proposed optimization turns a three-cycle sequence into a four-cycle sequence.</div></div></div></blockquote><div class=""><br class=""></div><div class="">Not exactly: I believe the proposed optimization turns two “low” operations and one “high” operation into two “high” operations.</div><div class="">Note that it seems to me this optimization can apply when the two low-precision operations are f16 and the high-precision one is double precision. 
In LLVM IR:</div><div class=""><br class=""></div><div class="">%mul = fmul half %u, %v</div><div class="">%fma = call half @llvm.fma.f16(half %x, half %y, half %mul)</div><div class="">%fmaext = fpext half %fma to double</div><div class="">%fadd = fadd double %fmaext, %z</div><div class=""><br class=""></div><div class="">becomes:</div><div class=""><br class=""></div><div class=""><div class="">%xext = fpext half %x to double</div><div class=""><div class="">%yext = fpext half %y to double</div></div><div class=""><div class="">%uext = fpext half %u to double</div></div><div class=""><div class="">%vext = fpext half %v to double</div></div><div class=""><br class=""></div></div><div class=""><div class="">%fma1 = call double @llvm.fma.f64(double %uext, double %vext, double %z)</div><div class="">%fma = call double @llvm.fma.f64(double %xext, double %yext, double %fma1)</div><div class=""><br class=""></div></div><div class="">(assuming that both half and double are legal on the target)</div></div></div></div></blockquote><br class=""></div><div style="font-family:Helvetica;font-size:12px;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:start;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px" class="">In that case, this looks more reasonable.  
The profitability would depend on the throughput ratio of the processor in question, but 2x seems like a pretty common design point.</div></div></blockquote></div><br class=""></div></div><div class="">NVIDIA GT200 has a 1:8 fp64:fp32 throughput ratio, and the brand-new GM200 has a 1:32 ratio :)</div><div class="">But I agree that it is probably not the common case, and I’m OK with the added comment so that anyone who needs to fix this can easily spot it.</div><div class=""><br class=""></div><div class="">What remains in this revision is the canonicalization, which I think should be done in a dedicated combine rather than here.</div><div class=""><br class=""></div><div class="">— </div><span class="HOEnZb"><font color="#888888" class=""><div class="">Mehdi</div><div class=""><br class=""></div></font></span></div><br class="">_______________________________________________<br class="">

llvm-commits mailing list<br class="">

<a href="mailto:llvm-commits@cs.uiuc.edu" class="">llvm-commits@cs.uiuc.edu</a><br class="">

<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br class="">

<br class=""></blockquote></div><br class=""></div>
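<div class="">The rewrite quoted above can also be read as a pattern match on an expression tree. Below is a minimal, hypothetical Python model, not the patch's DAGCombiner code. The Var/Op classes and the fold/show helpers are made up for illustration. The model also ignores the commuted form (fadd z, fpext(...)), one-use checks, target hooks, and whatever fusion or fast-math conditions the real combine requires.</div>
<pre class="">
```python
"""Hypothetical expression-tree model of the fpext/FMA fold (illustration only)."""
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str
    ty: str  # "half" or "double"


@dataclass(frozen=True)
class Op:
    opcode: str  # "fmul", "fadd", "fma" or "fpext"
    ty: str      # result type
    args: tuple


Expr = Union[Var, Op]


def ext(e: Expr, ty: str) -> Op:
    """Wrap e in an fpext to the wider type ty."""
    return Op("fpext", ty, (e,))


def is_op(e: Expr, opcode: str) -> bool:
    return isinstance(e, Op) and e.opcode == opcode


def fold(e: Expr) -> Expr:
    """fadd(fpext(fma(x, y, fmul(u, v))), z)
    -> fma(fpext x, fpext y, fma(fpext u, fpext v, z)); anything else is returned unchanged."""
    if not is_op(e, "fadd"):
        return e
    lhs, z = e.args
    if not is_op(lhs, "fpext"):
        return e
    inner = lhs.args[0]
    if not is_op(inner, "fma") or not is_op(inner.args[2], "fmul"):
        return e
    x, y, mul = inner.args
    u, v = mul.args
    t = e.ty  # the wide type, e.g. "double"
    return Op("fma", t, (ext(x, t), ext(y, t), Op("fma", t, (ext(u, t), ext(v, t), z))))


def show(e: Expr) -> str:
    """Render an expression as opcode.type(args...) for quick inspection."""
    if isinstance(e, Var):
        return f"%{e.name}"
    return f"{e.opcode}.{e.ty}(" + ", ".join(show(a) for a in e.args) + ")"


if __name__ == "__main__":
    x, y, u, v = (Var(n, "half") for n in "xyuv")
    z = Var("z", "double")
    before = Op("fadd", "double",
                (ext(Op("fma", "half", (x, y, Op("fmul", "half", (u, v)))), "double"), z))
    print(show(fold(before)))
    # fma.double(fpext.double(%x), fpext.double(%y), fma.double(fpext.double(%u), fpext.double(%v), %z))
```
</pre>
<div class="">The output has the same shape as the quoted "becomes:" sequence. Every half operand is extended to double, and %z feeds the inner fma directly without any conversion.</div>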

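<div class="">The cycle-count argument in the thread can be made concrete with a toy cost model. This is a sketch under stated assumptions, not a measurement. A low-precision operation costs 1 unit, and a high-precision one costs ratio units, where ratio is the inverse of the fp64:fp32-style throughput ratio. The fpext conversions are treated as free. Under the reading both sides settled on, two low ops plus one high op become two high ops, so the fold is no slower exactly when 2 · ratio ≤ 2 + ratio, i.e. ratio ≤ 2. The function names are hypothetical.</div>
<pre class="">
```python
"""Toy cost model for the fpext/FMA fold discussed in the thread.

Hypothetical assumptions (not taken from the patch):
  * a low-precision op (half or float) costs 1 unit,
  * a high-precision op (double) costs `ratio` units,
  * fpext conversions are free.
"""


def cost_before(ratio: float) -> float:
    """Original sequence: fmul (low) + fma (low) + fadd (high)."""
    return 1 + 1 + ratio


def cost_after(ratio: float) -> float:
    """Folded sequence: two fma ops in the high-precision type."""
    return 2 * ratio


def not_slower(ratio: float) -> bool:
    """True when 2 + ratio >= 2 * ratio, i.e. when ratio is at most 2."""
    return cost_before(ratio) >= cost_after(ratio)


if __name__ == "__main__":
    # Ratios mentioned in the thread: PPC (same cost), the 2x design point,
    # NVIDIA GT200 (fp64:fp32 = 1:8) and GM200 (1:32).
    for label, ratio in [("PPC", 1), ("2x", 2), ("GT200", 8), ("GM200", 32)]:
        print(f"{label:6} before={cost_before(ratio):3} after={cost_after(ratio):3} "
              f"not_slower={not_slower(ratio)}")
```
</pre>
<div class="">With the ratios mentioned in the thread, this gives 3 vs 2 for PPC (ratio 1), 4 vs 4 at the 2x design point, 10 vs 16 for GT200 (1:8), and 34 vs 64 for GM200 (1:32). Owen's first reading, in which three low-precision ops become two high-precision ones, gives 3 vs 4 at ratio 2. That matches his three-cycle/four-cycle figure.</div>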
</div></blockquote></div><br class=""></div></div></div></body></html>