[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations
Shuxin Yang
shuxin.llvm at gmail.com
Tue Dec 11 13:54:32 PST 2012
Hi, Steve:
Thank you for your feedback. Is the same also true for (x+x) + (x+x) => 4.0 * x?
I forgot to mention one thing: my coworkers told me that CodeGen is smart enough
to expand C*X into the right instruction sequence, taking into account the cost
of fmul and fadd on the underlying architecture.
Rewriting X+...+X as N*X is just to make the representation easier for the
optimizer.
Thanks
Shuxin
On 12/11/12 1:39 PM, Stephen Canon wrote:
> (x + x) + x --> x*3 is always exact and does not require relaxed / fast-math.
>
> - Steve
>
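Steve's point can be checked empirically. Assuming IEEE-754 binary64 arithmetic with round-to-nearest (what Python floats use on mainstream platforms), x + x is exact unless it overflows, so (x + x) + x rounds the exact value 3x exactly once, just as x * 3 does; and when x + x overflows to infinity, x * 3 overflows too. The sketch below, using only the Python standard library, compares the two forms bit for bit on some edge cases and on random 64-bit patterns. The helper names are illustrative, not taken from the patch.

```python
import math
import random
import struct
import sys


def bits_to_double(bits: int) -> float:
    """Reinterpret a 64-bit unsigned integer as an IEEE-754 binary64 value."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def same_double(a: float, b: float) -> bool:
    """Bitwise equality, except that any two NaNs are treated as equal."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    # Comparing the encodings also distinguishes +0.0 from -0.0.
    return struct.pack("<d", a) == struct.pack("<d", b)


def count_mismatches(samples: int, seed: int = 0) -> int:
    """Count inputs where (x + x) + x and x * 3 produce different doubles."""
    rng = random.Random(seed)
    edge_cases = [0.0, -0.0, 1.0, -1.0, 5e-324, sys.float_info.min,
                  2.0 ** 1023, -(2.0 ** 1023), sys.float_info.max,
                  math.inf, -math.inf, math.nan]
    # Random bit patterns cover subnormals, huge values, infinities and NaNs.
    randoms = [bits_to_double(rng.getrandbits(64)) for _ in range(samples)]
    return sum(1 for x in edge_cases + randoms
               if not same_double((x + x) + x, x * 3.0))


print(count_mismatches(100_000))  # prints 0
```

A passing run is only evidence; the single-rounding argument above is what makes the rewrite safe without relaxed mode.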
> On Dec 11, 2012, at 4:32 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> Hi, Dear All:
>>
>> The attached patch implements the following rules for floating-point add/sub in relaxed mode.
>> (The n-th rule is not yet implemented; I only thought of it while writing this mail.
>> It is easy to implement, but I would rather not go through the stress test one more time.)
>>
>> ----------------------------------------------------
>> 1. (x + c1) + c2 -> x + (c1 + c2)
>> 2. (c * x) + x -> (c+1) * x
>> 3. (x + x) + x -> x * 3
>> 4. c * x + (x + x) -> (c + 2) * x
>> 5. (x + x) + (x + x) -> 4 * x
>> 6. x - (x + y) -> 0 - y
>> ...
>> ...
>> ...
>> n. (factoring) C * X1 + C * X2 -> C * (X1 + X2)
>> -------------------------------------------------------
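Rules 1 and 6 re-associate the additions, so rounding happens at different points and the result can change; that is why such rewrites are restricted to relaxed mode. A small Python illustration with hand-picked inputs, assuming binary64 with round-to-nearest-even: near 1e16 adjacent doubles are 2 apart, so adding 1.0 is rounded away. The function names are hypothetical.

```python
def rule1_original(x: float, c1: float, c2: float) -> float:
    return (x + c1) + c2          # x + c1 is rounded before c2 is added


def rule1_rewritten(x: float, c1: float, c2: float) -> float:
    return x + (c1 + c2)          # the constants are folded first


def rule6_original(x: float, y: float) -> float:
    return x - (x + y)            # x + y is rounded before the subtraction


def rule6_rewritten(x: float, y: float) -> float:
    return 0.0 - y


# 1e16 + 1.0 is a tie between 1e16 and 1e16 + 2; ties-to-even picks 1e16.
print(rule1_original(1.0, 1e16, -1e16), rule1_rewritten(1.0, 1e16, -1e16))  # 0.0 1.0
print(rule6_original(1e16, 1.0), rule6_rewritten(1e16, 1.0))                # 0.0 -1.0
```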
>>
>> Up to three neighboring instructions are involved in each optimization, so the number
>> of combinations is daunting. I therefore resorted to a general approach (instead of
>> pattern matching) to tackle these optimizations.
>>
>> The idea is simple: decompose instructions into uniformly represented Addends.
>> Take the following instruction sequence as an example:
>>
>> t1 = 1.8 * x;
>> t2 = y - x;
>> t3 = t1 - t2;
>>
>> t3 has two addends, A1 = <1, t1> (denoting the value 1*t1) and A2 = <-1, t2>. If we
>> "zoom in" on A1 and A2 by one step, more addends are revealed: A1 can be zoomed into
>> A1_0 = <1.8, x>, and A2 into A2_0 = <-1, y> and A2_1 = <1, x> (t2 itself decomposes
>> into <1, y> and <-1, x>; A2's coefficient of -1 flips the signs).
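The zoom-in step can be sketched as follows, assuming a toy three-address IR (a dict from result name to (opcode, lhs, rhs)) rather than LLVM's real data structures; `zoom_in` and `scale` are hypothetical helper names. Coefficients are kept as exact `Fraction`s purely to keep this sketch's bookkeeping readable. Zooming into A2 scales t2's addends by A2's coefficient of -1.

```python
from fractions import Fraction
from typing import Dict, List, Tuple, Union

Addend = Tuple[Fraction, str]                  # <coefficient, value>
Instr = Tuple[str, Union[Fraction, str], str]  # (opcode, lhs, rhs)


def zoom_in(value: str, prog: Dict[str, Instr]) -> List[Addend]:
    """Look through one instruction and return the addends of `value`."""
    if value not in prog:                      # leaf such as x or y
        return [(Fraction(1), value)]
    op, lhs, rhs = prog[value]
    if op == "fadd":
        return [(Fraction(1), lhs), (Fraction(1), rhs)]
    if op == "fsub":
        return [(Fraction(1), lhs), (Fraction(-1), rhs)]
    if op == "fmul" and isinstance(lhs, Fraction):  # constant * value
        return [(lhs, rhs)]
    return [(Fraction(1), value)]              # opaque: keep as one addend


def scale(coeff: Fraction, addends: List[Addend]) -> List[Addend]:
    """Multiply every addend by coeff, e.g. to push a -1 inside."""
    return [(coeff * c, v) for c, v in addends]


prog: Dict[str, Instr] = {
    "t1": ("fmul", Fraction("1.8"), "x"),
    "t2": ("fsub", "y", "x"),
    "t3": ("fsub", "t1", "t2"),
}

A1, A2 = zoom_in("t3", prog)                   # <1, t1>, <-1, t2>
A1_parts = scale(A1[0], zoom_in(A1[1], prog))  # [A1_0] = [<1.8, x>]
A2_parts = scale(A2[0], zoom_in(A2[1], prog))  # [A2_0, A2_1] = [<-1, y>, <1, x>]
print(A1, A2, A1_parts, A2_parts)
```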
>>
>> Once these addends are available, the optimizer tries to simplify the following N-ary
>> additions using symbolic evaluation:
>> A1_0 + A2_0 + A2_1, or
>> A1 + A2_0 + A2_1, or
>> A1_0 + A2
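Symbolic evaluation of such an N-ary sum can be sketched as collecting like terms: addends referring to the same value have their coefficients summed, and terms that cancel disappear. This self-contained Python sketch restates the addends from the t1/t2/t3 example literally and also applies the same folding to rules 3 and 6; `fold_addends` is a hypothetical name, and exact `Fraction` coefficients are a simplification of the sketch.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

Addend = Tuple[Fraction, str]   # <coefficient, value>


def fold_addends(addends: List[Addend]) -> List[Addend]:
    """Evaluate sum(c_i * v_i) symbolically by merging addends on the same value."""
    totals: Dict[str, Fraction] = {}
    for coeff, value in addends:              # dicts keep first-seen order
        totals[value] = totals.get(value, Fraction(0)) + coeff
    return [(c, v) for v, c in totals.items() if c != 0]


A1_0 = (Fraction("1.8"), "x")
A2_0 = (Fraction(-1), "y")
A2_1 = (Fraction(1), "x")

print(fold_addends([A1_0, A2_0, A2_1]))       # 14/5*x - y, i.e. t3 = 2.8*x - y
print(fold_addends([(Fraction(1), "x")] * 3))  # rule 3: 3*x
# rule 6: x - (x + y) has addends <1, x>, <-1, x>, <-1, y>, which fold to -y
print(fold_addends([(Fraction(1), "x"), (Fraction(-1), "x"), (Fraction(-1), "y")]))
```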
>>
>> This patch has been stress-tested with SingleSource and MultiSource by treating all
>> fadd/fsub instructions as relaxed-mode operations.
>>
>> Thank you in advance for the code review!
>>
>> Shuxin
>>
>> <fast_math.add_sub.v1.patch>