<html><head><meta http-equiv="Content-Type" content="text/html charset=iso-8859-1"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div>(x + x) + (x + x) --> 4 * x is always exact and is safe (this follows from simply applying the (x + x) --> 2*x rule twice, then using the fact that 2*2*x = 4*x).</div><div><br></div><div>((x + x) + x) + x --> 4 * x is exact <b>assuming that the default rounding mode</b> <b>is in effect</b>. I don't believe that we model FENV_ACCESS at present, but fast-math should certainly imply assume-default-rounding. (((x + x) + x) + x) + x --> 5*x is also exact assuming default rounding. This property breaks down for adding x to itself six times (there's no deep theorem here, it just works out that way, sorry).</div><br><div><div>On Dec 11, 2012, at 4:54 PM, Shuxin Yang <<a href="mailto:shuxin.llvm@gmail.com">shuxin.llvm@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite">Hi, Steve:<br><br> Thank you for your feedback. Is it also true for (x+x) + (x+x) => 4.0 * x?<br><br> I forgot to mention one thing. My coworkers told me that CodeGen is smart enough to<br>expand C*X into the right instruction sequence, considering the cost of fmul and fadd on the underlying architecture.<br><br> The X+...+X = N*X form is just there to make the representation easier for the optimizer.<br><br> Thanks<br>Shuxin<br><br>On 12/11/12 1:39 PM, Stephen Canon wrote:<br><blockquote type="cite">(x + x) + x --> x*3 is always exact and does not require relaxed / fast-math.<br><br>- Steve<br><br>On Dec 11, 2012, at 4:32 PM, Shuxin Yang <<a href="mailto:shuxin.llvm@gmail.com">shuxin.llvm@gmail.com</a>> wrote:<br><br><blockquote type="cite">Hi, Dear All:<br><br> The attached patch implements the following rules for floating-point add/sub in relaxed mode.<br>(The n-th rule is not yet implemented. 
I only thought of it while writing this mail.<br>It is easy to implement this rule, but I would rather not go through stress testing one more time).<br><br>----------------------------------------------------<br>1. (x + c1) + c2 -> x + (c1 + c2)<br>2. (c * x) + x -> (c+1) * x<br>3. (x + x) + x -> x * 3<br>4. c * x + (x + x) -> (c + 2)*x<br>5. (x + x) + (x+x) -> 4*x<br>6. x - (x + y) -> 0 - y<br> ...<br> ...<br> ...<br>n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)<br>-------------------------------------------------------<br><br> Up to three neighboring instructions are involved in the optimization. The number<br>of combinations is daunting! So I have to resort to a general approach (instead of<br>pattern matching) to tackle these optimizations.<br><br> The idea is simple: decompose instructions into uniformly represented<br>addends. Take the following instruction sequence as an example:<br><br> t1 = 1.8 * x;<br> t2 = y - x;<br> t3 = t1 - t2;<br><br>t3 has two addends, A1=<1, t1> (denoting the value 1*t1) and A2=<-1, t2>. If we "zoom in" on<br>A1 and A2 by one step, we reveal more addends: A1 can be zoomed into another<br>addend A1_0 = <1.8, x>, and A2 can be zoomed into A2_0 = <1,y> and A2_1 = <-1,x>.<br><br>When these addends are available, the optimizer tries to simplify one of the following N-ary additions<br>using symbolic evaluation:<br> A1_0 + A2_0 + A2_1, or<br> A1 + A2_0 + A2_1, or<br> A1_0 + A2<br><br>This patch was stress-tested with SingleSource and MultiSource by treating all fadd/fsub<br>instructions as being in relaxed mode.<br><br>Thank you for the code review!<br><br>Shuxin<br><br><fast_math.add_sub.v1.patch>_______________________________________________<br>llvm-commits mailing list<br><a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits<br></blockquote></blockquote><br></blockquote></div><br></body></html>