<br><br><div class="gmail_quote">On Thu, Aug 8, 2013 at 5:23 PM, Chandler Carruth <span dir="ltr"><<a href="mailto:chandlerc@google.com" target="_blank">chandlerc@google.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div dir="ltr"><div class="gmail_extra"><div class="im"><br><div class="gmail_quote">On Thu, Aug 8, 2013 at 2:07 PM, Chad Rosier <span dir="ltr"><<a href="mailto:chad.rosier@gmail.com" target="_blank">chad.rosier@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Thu, Aug 8, 2013 at 1:56 PM, Mark Lacey <span dir="ltr"><<a href="mailto:mark.lacey@apple.com" target="_blank">mark.lacey@apple.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><br><div><div><div>On Aug 8, 2013, at 9:56 AM, Jim Grosbach <<a href="mailto:grosbach@apple.com" target="_blank">grosbach@apple.com</a>> wrote:</div><br><blockquote type="cite">
<div style="word-wrap:break-word">Hi Chad,<div><br></div><div>This is a great transform to do, but you’re right that it’s only safe under fast-math. This is particularly interesting when the original divisor is a constant so you can materialize the reciprocal at compile-time. You’re right that in either case, this optimization should only kick in when there is more than one divide instruction that will be changed to a mul.</div>
</div></blockquote><div><br></div></div><div>It can be worthwhile to do this even in the case where there is only a single divide since 1/Y might be loop invariant, and could then be hoisted out later by LICM. You just need to be able to fold it back together when there is only a single use, and that use is not inside a more deeply nested loop.<br>
</div></div></div></blockquote></div><div><br>Ben's patch does exactly this, so perhaps that is the right approach.</div></blockquote></div><br></div>Just to be clear of what is being proposed (which I rather like):</div>
<div class="gmail_extra">
<br></div><div class="gmail_extra">1) Canonical form is to use the reciprocal when allowed (by the fast math flags, whichever we decide are appropriate).</div><div class="gmail_extra">2) The backend folds a single-use reciprocal into a direct divide.</div>
<div class="gmail_extra"><br></div><div class="gmail_extra">Did I get it right? If so, I think this is a really nice way to capture all of the potential benefits of forming reciprocals without pessimizing code where it isn't helpful.</div>
</div>
</blockquote></div><br>I believe you're describing Ben's patch perfectly. A few transformations are pessimize, however.<br><br>From test/Transforms/InstCombine/fast-math.ll<br>1. Previously x/y + x/z was not transformed. Not it becomes x*(1/y+1/x).<br>
<br>define float @fact_div1(float %x, float %y, float %z) {<br> %t1 = fdiv fast float %x, %y<br> %t2 = fdiv fast float %x, %z<br> %t3 = fadd fast float %t1, %t2<br> ret float %t3<br>}<br><br>combines to:<br><br>define float @fact_div1(float %x, float %y, float %z) {<br>
%reciprocal = fdiv fast float 1.000000e+00, %y<br> %reciprocal1 = fdiv fast float 1.000000e+00, %z<br> %1 = fadd fast float %reciprocal, %reciprocal1<br> %2 = fmul fast float %1, %x<br> ret float %t3<br> }<br><br>I don't believe the fixup in CodeGenPrepare will undo such a transformation.<br>
<br>2. Similarly, x/y + z/x was not previously changed, but now we generate x*(1/y) + z*(1/x).<br>I believe we can undo this transformation.<br><br>3. Previously we would transform y/x + z/x => (y+z)/x. Now y/x + z/x is transformed to y*(1/x)+z*(1/x).<br>
This might be an ordering problem or perhaps we could just transform y*(1/x)+z*(1/x) => (y+z)/x. The same<br>holds true for y/x - z/x.<br><br> Chad<br><br><br>