[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations
Stephen Canon
scanon at apple.com
Tue Dec 11 14:00:32 PST 2012
(x + x) + (x + x) --> 4 * x is always exact and is safe (this follows from applying the (x + x) --> 2*x rule to each inner sum and then once more to the outer sum, and noting that 2*(2*x) = 4*x exactly).
((x + x) + x) + x --> 4 * x is exact assuming the default rounding mode is in effect. I don't believe we model FENV_ACCESS at present, but fast-math should certainly imply assume-default-rounding.

(((x + x) + x) + x) + x --> 5*x is also exact assuming default rounding. This property breaks down when adding x to itself six times (there's no deep theorem here, it just works out that way, sorry).
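For anyone who wants to poke at this empirically, here is a small standalone check (illustration only, not part of the patch): it compares x added to itself n times against n*x under the default round-to-nearest mode, assuming IEEE double arithmetic. Per the above, mismatches should not appear for n <= 5; whether a random sample actually hits a counterexample at n = 6 is not guaranteed, so treat this as a search, not a proof.

  #include <cstdio>
  #include <random>

  int main() {
    std::mt19937_64 Gen(42);
    std::uniform_real_distribution<double> Dist(-1e6, 1e6);

    for (int N = 2; N <= 6; ++N) {
      long Mismatches = 0;
      for (int I = 0; I < 1000000; ++I) {
        double X = Dist(Gen);
        double Sum = X;                 // ((X + X) + X) + ... , N terms total
        for (int K = 1; K < N; ++K)
          Sum += X;
        if (Sum != (double)N * X)       // compare against a single multiply
          ++Mismatches;
      }
      std::printf("N = %d: %ld mismatches in 1000000 trials\n", N, Mismatches);
    }
    return 0;
  }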
On Dec 11, 2012, at 4:54 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Steve:
>
> Thank you for your feedback. Is it also true for (x+x) + (x+x) => 4.0 * x?
>
> Forgot to mention one thing: my coworkers told me that CodeGen is smart enough to
> expand C*X into the right instruction sequence, taking into account the cost of fmul and fadd on the underlying architecture.
>
> The X+...+X => N*X rewrite is just to make the representation easier for the optimizer.
>
> Thanks
> Shuxin
>
> On 12/11/12 1:39 PM, Stephen Canon wrote:
>> (x + x) + x --> x*3 is always exact and does not require relaxed / fast-math.
>>
>> - Steve
>>
>> On Dec 11, 2012, at 4:32 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>>> Hi, Dear All:
>>>
>>> The attached patch implements the following rules for floating-point add/sub in relaxed mode;
>>> a small illustration of why relaxed mode is needed follows the list below.
>>> (The n-th rule is not yet implemented; I only realized it while writing this mail.
>>> It is easy to implement, but I don't want to go through the stress test one more time.)
>>>
>>> -------------------------------------------------------
>>> 1. (x + c1) + c2 -> x + (c1 + c2)
>>> 2. (c * x) + x -> (c + 1) * x
>>> 3. (x + x) + x -> x * 3
>>> 4. (c * x) + (x + x) -> (c + 2) * x
>>> 5. (x + x) + (x + x) -> 4 * x
>>> 6. x - (x + y) -> 0 - y
>>> ...
>>> n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)
>>> -------------------------------------------------------
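>>>
>>> As a quick illustration of why these rewrites need relaxed mode (this snippet is
>>> only an example, not code from the patch): rule 6 changes which intermediate
>>> result gets rounded, so the two sides can disagree under strict IEEE semantics
>>> whenever x + y rounds away the contribution of y:
>>>
>>>     #include <cstdio>
>>>
>>>     int main() {
>>>       double x = 1.0, y = 1e-20;  // y is far below half an ulp of 1.0
>>>       double lhs = x - (x + y);   // x + y rounds to 1.0, so lhs == 0.0
>>>       double rhs = 0.0 - y;       // rhs == -1e-20
>>>       std::printf("lhs = %g, rhs = %g\n", lhs, rhs);
>>>       return 0;
>>>     }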
>>>
>>> Up to three neighboring instructions are involved in the optimization. The number
>>> of combinations is daunting! So I had to resort to a general approach (instead of
>>> pattern matching) to tackle these optimizations.
>>>
>>> The idea is simple: try to decompose instructions into uniformly represented
>>> Addends. Take the following instruction sequence as an example:
>>>
>>> t1 = 1.8 * x;
>>> t2 = y - x;
>>> t3 = t1 - t2;
>>>
>>> t3 has two addends, A1 = <1, t1> (denoting the value 1*t1) and A2 = <-1, t2>. If we "zoom in" on
>>> A1 and A2 one step, we reveal more addends: A1 can be zoomed into another
>>> addend A1_0 = <1.8, x>, and A2 can be zoomed into A2_0 = <-1, y> and A2_1 = <1, x>.
>>>
>>> Once these addends are available, the optimizer tries to simplify the following N-ary
>>> additions using symbolic evaluation (a rough sketch follows the list):
>>> A1_0 + A2_0 + A2_1, or
>>> A1 + A2_0 + A2_1, or
>>> A1_0 + A2
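>>>
>>> To make this concrete, here is a rough standalone sketch of the addend
>>> representation and the merge step (the names are illustrative only, not the
>>> classes used in the attached patch):
>>>
>>>     // Illustrative sketch only -- not the actual code in the patch.
>>>     // An Addend is a <coefficient, value> pair; a list of addends stands for
>>>     // coef1*val1 + coef2*val2 + ...
>>>     #include <cstdio>
>>>     #include <string>
>>>     #include <vector>
>>>
>>>     struct Addend {
>>>       double Coef;      // constant factor
>>>       std::string Val;  // stand-in for the SSA value it multiplies
>>>     };
>>>
>>>     // Symbolic evaluation of an N-ary sum: merge addends that refer to the
>>>     // same value by adding their coefficients, then drop zero terms.
>>>     static std::vector<Addend> simplify(const std::vector<Addend> &In) {
>>>       std::vector<Addend> Out;
>>>       for (const Addend &A : In) {
>>>         bool Merged = false;
>>>         for (Addend &O : Out)
>>>           if (O.Val == A.Val) { O.Coef += A.Coef; Merged = true; break; }
>>>         if (!Merged)
>>>           Out.push_back(A);
>>>       }
>>>       std::vector<Addend> NonZero;
>>>       for (const Addend &A : Out)
>>>         if (A.Coef != 0.0)
>>>           NonZero.push_back(A);
>>>       return NonZero;
>>>     }
>>>
>>>     int main() {
>>>       // t1 = 1.8 * x; t2 = y - x; t3 = t1 - t2.  Zooming into A1 and A2 gives
>>>       // the addends <1.8, x>, <-1, y>, <1, x>; merging yields 2.8*x - 1*y.
>>>       std::vector<Addend> T3 = {{1.8, "x"}, {-1.0, "y"}, {1.0, "x"}};
>>>       for (const Addend &A : simplify(T3))
>>>         std::printf("%+g * %s\n", A.Coef, A.Val.c_str());
>>>       return 0;
>>>     }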
>>>
>>> This patch was stress-tested on SingleSource and MultiSource by treating all fadd/fsub
>>> instructions as being in relaxed mode.
>>>
>>> Thank you for code review!
>>>
>>> Shuxin
>>>
>>> <fast_math.add_sub.v1.patch>
>