[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations
Stephen Canon
scanon at apple.com
Tue Dec 11 14:00:32 PST 2012
(x + x) + (x + x) --> 4 * x is always exact and is safe (this follows from applying the (x + x) --> 2*x rule to each inner sum and then once more to the outer sum, and noting that 2*(2*x) = 4*x exactly).
((x + x) + x) + x --> 4 * x is exact assuming the default rounding mode is in effect. I don't believe we model FENV_ACCESS at present, but fast-math should certainly imply assume-default-rounding.

(((x + x) + x) + x) + x --> 5*x is also exact assuming default rounding. This property breaks down when adding x to itself six times (there's no deep theorem here, it just works out that way, sorry).
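For anyone who wants to poke at this empirically, here is a small standalone check (illustration only, not part of the patch): it compares x added to itself n times against n*x under the default round-to-nearest mode, assuming IEEE double arithmetic. Per the above, mismatches should not appear for n <= 5; whether a random sample actually hits a counterexample at n = 6 is not guaranteed, so treat this as a search, not a proof.

  #include <cstdio>
  #include <random>

  int main() {
    std::mt19937_64 Gen(42);
    std::uniform_real_distribution<double> Dist(-1e6, 1e6);

    for (int N = 2; N <= 6; ++N) {
      long Mismatches = 0;
      for (int I = 0; I < 1000000; ++I) {
        double X = Dist(Gen);
        double Sum = X;                 // ((X + X) + X) + ... , N terms total
        for (int K = 1; K < N; ++K)
          Sum += X;
        if (Sum != (double)N * X)       // compare against a single multiply
          ++Mismatches;
      }
      std::printf("N = %d: %ld mismatches in 1000000 trials\n", N, Mismatches);
    }
    return 0;
  }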
On Dec 11, 2012, at 4:54 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Steve:
>
> Thank you for your feedback. Is it also true for (x+x) + (x+x) => 4.0 * x?
>
> Forgot to mention one thing: my coworkers told me that CodeGen is smart enough to
> expand C*X into the right instruction sequence, taking into account the cost of fmul and fadd on the underlying architecture.
>
> The X+...+X => N*X rewrite is just to make the representation easier for the optimizer.
>
> Thanks
> Shuxin
>
> On 12/11/12 1:39 PM, Stephen Canon wrote:
>> (x + x) + x --> x*3 is always exact and does not require relaxed / fast-math.
>>
>> - Steve
>>
>> On Dec 11, 2012, at 4:32 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>
>>> Hi, Dear All:
>>>
>>> The attached patch implements the following rules for floating-point add/sub in relaxed mode;
>>> a small illustration of why relaxed mode is needed follows the list below.
>>> (The n-th rule is not yet implemented; I only realized it while writing this mail.
>>> It is easy to implement, but I don't want to go through the stress test one more time.)
>>>
>>> -------------------------------------------------------
>>> 1. (x + c1) + c2 -> x + (c1 + c2)
>>> 2. (c * x) + x -> (c + 1) * x
>>> 3. (x + x) + x -> x * 3
>>> 4. (c * x) + (x + x) -> (c + 2) * x
>>> 5. (x + x) + (x + x) -> 4 * x
>>> 6. x - (x + y) -> 0 - y
>>> ...
>>> n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)
>>> -------------------------------------------------------
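>>>
>>> As a quick illustration of why these rewrites need relaxed mode (this snippet is
>>> only an example, not code from the patch): rule 6 changes which intermediate
>>> result gets rounded, so the two sides can disagree under strict IEEE semantics
>>> whenever x + y rounds away the contribution of y:
>>>
>>>     #include <cstdio>
>>>
>>>     int main() {
>>>       double x = 1.0, y = 1e-20;  // y is far below half an ulp of 1.0
>>>       double lhs = x - (x + y);   // x + y rounds to 1.0, so lhs == 0.0
>>>       double rhs = 0.0 - y;       // rhs == -1e-20
>>>       std::printf("lhs = %g, rhs = %g\n", lhs, rhs);
>>>       return 0;
>>>     }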
>>>
>>> Up to three neighboring instructions are involved in the optimization. The number
>>> of combinations is daunting! So I had to resort to a general approach (instead of
>>> pattern matching) to tackle these optimizations.
>>>
>>> The idea is simple: try to decompose instructions into uniformly represented
>>> Addends. Take the following instruction sequence as an example:
>>>
>>> t1 = 1.8 * x;
>>> t2 = y - x;
>>> t3 = t1 - t2;
>>>
>>> t3 has two addends, A1 = <1, t1> (denoting the value 1*t1) and A2 = <-1, t2>. If we "zoom in" on
>>> A1 and A2 one step, we reveal more addends: A1 can be zoomed into another
>>> addend A1_0 = <1.8, x>, and A2 can be zoomed into A2_0 = <-1, y> and A2_1 = <1, x>.
>>>
>>> Once these addends are available, the optimizer tries to simplify the following N-ary
>>> additions using symbolic evaluation (a rough sketch follows the list):
>>> A1_0 + A2_0 + A2_1, or
>>> A1 + A2_0 + A2_1, or
>>> A1_0 + A2
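>>>
>>> To make this concrete, here is a rough standalone sketch of the addend
>>> representation and the merge step (the names are illustrative only, not the
>>> classes used in the attached patch):
>>>
>>>     // Illustrative sketch only -- not the actual code in the patch.
>>>     // An Addend is a <coefficient, value> pair; a list of addends stands for
>>>     // coef1*val1 + coef2*val2 + ...
>>>     #include <cstdio>
>>>     #include <string>
>>>     #include <vector>
>>>
>>>     struct Addend {
>>>       double Coef;      // constant factor
>>>       std::string Val;  // stand-in for the SSA value it multiplies
>>>     };
>>>
>>>     // Symbolic evaluation of an N-ary sum: merge addends that refer to the
>>>     // same value by adding their coefficients, then drop zero terms.
>>>     static std::vector<Addend> simplify(const std::vector<Addend> &In) {
>>>       std::vector<Addend> Out;
>>>       for (const Addend &A : In) {
>>>         bool Merged = false;
>>>         for (Addend &O : Out)
>>>           if (O.Val == A.Val) { O.Coef += A.Coef; Merged = true; break; }
>>>         if (!Merged)
>>>           Out.push_back(A);
>>>       }
>>>       std::vector<Addend> NonZero;
>>>       for (const Addend &A : Out)
>>>         if (A.Coef != 0.0)
>>>           NonZero.push_back(A);
>>>       return NonZero;
>>>     }
>>>
>>>     int main() {
>>>       // t1 = 1.8 * x; t2 = y - x; t3 = t1 - t2.  Zooming into A1 and A2 gives
>>>       // the addends <1.8, x>, <-1, y>, <1, x>; merging yields 2.8*x - 1*y.
>>>       std::vector<Addend> T3 = {{1.8, "x"}, {-1.0, "y"}, {1.0, "x"}};
>>>       for (const Addend &A : simplify(T3))
>>>         std::printf("%+g * %s\n", A.Coef, A.Val.c_str());
>>>       return 0;
>>>     }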
>>>
>>> This patch was stress-tested on SingleSource and MultiSource by treating all fadd/fsub
>>> instructions as being in relaxed mode.
>>>
>>> Thank you for code review!
>>>
>>> Shuxin
>>>
>>> <fast_math.add_sub.v1.patch>
>