[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Shuxin Yang shuxin.llvm at gmail.com
Tue Dec 11 14:06:02 PST 2012


Hi, Steve:

   Thanks a lot for the clarification and your expertise!
   For the sake of simplicity, I'd like to implement these rules 
separately for safe-mode FP arithmetic.

Best Regards
Shuxin

On 12/11/12 2:00 PM, Stephen Canon wrote:
> (x + x) + (x + x) --> 4 * x is always exact and is safe (this follows 
> from simply applying the (x + x) --> 2*x rule twice, then using the 
> fact that 2*2*x = 4*x).
>
> ((x + x) + x) + x --> 4 * x is exact *assuming that the default 
> rounding mode is in effect*.  I don't believe that we model 
> FENV_ACCESS at present, but fast-math should certainly imply 
> assume-default-rounding.  (((x + x) + x) + x) + x --> 5*x is also 
> exact assuming default rounding.  This property breaks down for adding 
> x to itself six times (there's no deep theorem here, it just works out 
> that way, sorry).
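>
> A small exhaustive probe along these lines (just a sketch, not part of 
> the patch; it assumes IEEE-754 binary32 with the default round-to-nearest 
> mode and must be built without fast-math) can show where the repeated-add 
> rewrite stays exact:
>
>  // repeated_add_check.cpp -- for n = 2..6, compare the left-to-right sum
>  //   x + x + ... + x   (n copies)
>  // against n * x.  Build without fast-math, e.g.  g++ -O2 repeated_add_check.cpp
>  #include <cstdio>
>
>  int main() {
>    // Integers 1..2^24 are exactly representable in binary32 and cover
>    // every possible 24-bit significand pattern.
>    for (int n = 2; n <= 6; ++n) {
>      long mismatches = 0;
>      for (unsigned m = 1; m <= (1u << 24); ++m) {
>        float x = (float)m;
>        volatile float sum = x;        // volatile keeps the adds in order
>        for (int k = 1; k < n; ++k)
>          sum = sum + x;
>        if (sum != (float)n * x)
>          ++mismatches;
>      }
>      std::printf("n = %d: %ld mismatches\n", n, mismatches);
>    }
>    return 0;
>  }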
>
> On Dec 11, 2012, at 4:54 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> Hi, Steve:
>>
>>  Thank you for your feedback. Is it also true for (x+x) + (x+x) => 4.0 
>> * x?
>>
>>   Forgot to mention one thing. My coworkers told me that the CodeGen 
>> is smart enough to expand C*X into the right instruction sequence, 
>> considering the cost of fmul and fadd on the underlying architecture.
>>
>>  Rewriting X+....+X as N*X is just to make the representation easier 
>> for the optimizer.
>>
>> Thanks
>> Shuxin
>>
>> On 12/11/12 1:39 PM, Stephen Canon wrote:
>>> (x + x) + x --> x*3 is always exact and does not require relaxed / 
>>> fast-math.
>>>
>>> - Steve
>>>
>>> On Dec 11, 2012, at 4:32 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>>>
>>>> Hi, Dear All:
>>>>
>>>>  The attached patch implements the following rules for 
>>>> floating-point add/sub in relaxed mode.
>>>> (The n-th rule is not yet implemented; I only realized it while 
>>>> writing this mail. It is easy to implement, but I don't want to go 
>>>> through the stress test one more time.)
>>>>
>>>> ----------------------------------------------------
>>>> 1. (x + c1) + c2 -> x + (c1 + c2)
>>>> 2. (c * x) + x -> (c + 1) * x
>>>> 3. (x + x) + x -> x * 3
>>>> 4. c * x + (x + x) -> (c + 2) * x
>>>> 5. (x + x) + (x + x) -> 4 * x
>>>> 6. x - (x + y) -> 0 - y
>>>>  ...
>>>>  ...
>>>>  ...
>>>> n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)
>>>> -------------------------------------------------------
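>>>>
>>>> As a rough source-level illustration (a sketch, not taken from the 
>>>> patch; the function names are made up, and the folded forms are only 
>>>> what relaxed mode would permit), rules 1, 3 and 6 correspond to 
>>>> rewrites like:
>>>>
>>>>  // With relaxed FP math (e.g. clang -ffast-math) the optimizer may
>>>>  // perform rewrites of this shape; rule 3 happens to be exact even
>>>>  // without relaxed mode, as discussed elsewhere in this thread.
>>>>  double rule1(double x) {           // (x + c1) + c2 -> x + (c1 + c2)
>>>>    return (x + 1.5) + 2.5;          //   may become  x + 4.0
>>>>  }
>>>>  double rule3(double x) {           // (x + x) + x -> x * 3
>>>>    return (x + x) + x;              //   may become  x * 3.0
>>>>  }
>>>>  double rule6(double x, double y) { // x - (x + y) -> 0 - y
>>>>    return x - (x + y);              //   may become  0.0 - y
>>>>  }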
>>>>
>>>>  Up to three neighboring instructions are involved in the 
>>>> optimization, so the number of combinations is daunting! I therefore 
>>>> resort to a general approach (instead of pattern matching) to tackle 
>>>> these optimizations.
>>>>
>>>>  The idea is simple: just try to decompose instructions into 
>>>> uniformly represented Addends. Take the following instruction 
>>>> sequence as an example:
>>>>
>>>>  t1 = 1.8 * x;
>>>>  t2 = y - x;
>>>>  t3 = t1 - t2;
>>>>
>>>> t3 has two addends, A1 = <1, t1> (denoting the value 1*t1) and 
>>>> A2 = <-1, t2>. If we "zoom in" on A1 and A2 one step, we reveal more 
>>>> addends: A1 can be zoomed into another addend A1_0 = <1.8, x>, and A2 
>>>> can be zoomed into A2_0 = <1, y> and A2_1 = <-1, x>.
>>>>
>>>> Once these addends are available, the optimizer tries to optimize the 
>>>> following N-ary additions using symbolic evaluation:
>>>>   A1_0 + A2_0 + A2_1, or
>>>>   A1 + A2_0 + A2_1, or
>>>>   A1_0 + A2
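>>>>
>>>> A rough sketch of that symbolic-evaluation step (not the patch's actual 
>>>> classes; Addend and foldAddends are made-up names here, and a real 
>>>> implementation would key on llvm::Value* rather than strings) could 
>>>> look like this: addends sharing the same value are merged by summing 
>>>> their coefficients.
>>>>
>>>>  #include <cstdio>
>>>>  #include <map>
>>>>  #include <string>
>>>>  #include <vector>
>>>>
>>>>  struct Addend {
>>>>    double Coeff;       // constant coefficient
>>>>    std::string Value;  // stand-in for the symbolic value
>>>>  };
>>>>
>>>>  // Merge addends with the same symbolic value; drop terms that cancel.
>>>>  static std::vector<Addend> foldAddends(const std::vector<Addend> &In) {
>>>>    std::map<std::string, double> Sum;
>>>>    for (const Addend &A : In)
>>>>      Sum[A.Value] += A.Coeff;
>>>>    std::vector<Addend> Out;
>>>>    for (const auto &P : Sum)
>>>>      if (P.second != 0.0)
>>>>        Out.push_back({P.second, P.first});
>>>>    return Out;
>>>>  }
>>>>
>>>>  int main() {
>>>>    // Addends from the example above: A1_0 = <1.8, x>, A2_0 = <1, y>,
>>>>    // A2_1 = <-1, x>.  They fold to <0.8, x> and <1, y>, so the whole
>>>>    // expression can be re-emitted with fewer instructions.
>>>>    std::vector<Addend> R = foldAddends({{1.8, "x"}, {1.0, "y"}, {-1.0, "x"}});
>>>>    for (const Addend &A : R)
>>>>      std::printf("%g * %s\n", A.Coeff, A.Value.c_str());
>>>>    return 0;
>>>>  }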
>>>>
>>>> This patch was stress-tested with SingleSource and MultiSource by 
>>>> treating all fadd/fsub instructions as being in relaxed mode.
>>>>
>>>> Thank you for the code review!
>>>>
>>>> Shuxin
>>>>
>>>> <fast_math.add_sub.v1.patch>
>>
>


