[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Eli Friedman eli.friedman at gmail.com
Tue Dec 11 14:10:52 PST 2012


On Tue, Dec 11, 2012 at 1:32 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Dear All:
>
>   The attached patch implements the following rules for floating-point
> add/sub in relaxed (fast-math) mode.
> (The n-th rule is not yet implemented; I only realized it while writing this
> mail. It would be easy to add, but I would rather not go through the stress
> testing one more time.)
>
> ----------------------------------------------------
> 1. (x + c1) + c2      ->  x + (c1 + c2)
> 2. (c * x) + x        ->  (c + 1) * x
> 3. (x + x) + x        ->  3 * x
> 4. (c * x) + (x + x)  ->  (c + 2) * x
> 5. (x + x) + (x + x)  ->  4 * x
> 6. x - (x + y)        ->  0 - y
>   ...
> n. (factoring) c * x1 + c * x2 -> c * (x1 + x2)
> -------------------------------------------------------
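> 
>   As a concrete illustration of rule 1, the fold could be written with the
> usual pattern matchers roughly as follows. This is only a sketch against
> present-day LLVM headers; it is not the code in the attached patch, and the
> relaxed-mode (fast-math) check and flag propagation are elided for brevity:
> 
>   #include "llvm/IR/Constants.h"
>   #include "llvm/IR/InstrTypes.h"
>   #include "llvm/IR/PatternMatch.h"
>   using namespace llvm;
>   using namespace llvm::PatternMatch;
> 
>   // Rule 1: (x + c1) + c2 -> x + (c1 + c2).  The caller must first verify
>   // that the relaxed-mode/fast-math flags permit the reassociation.
>   static Value *foldFAddOfFAddConst(BinaryOperator &I) {
>     Value *X;
>     ConstantFP *C1, *C2;
>     // Match (fadd (fadd X, C1), C2), requiring the inner fadd to have a
>     // single use so it can be discarded after the fold.
>     if (!match(&I, m_FAdd(m_OneUse(m_FAdd(m_Value(X), m_ConstantFP(C1))),
>                           m_ConstantFP(C2))))
>       return nullptr;
>     // Fold the two constants with APFloat arithmetic: c1 + c2.
>     APFloat Sum = C1->getValueAPF();
>     Sum.add(C2->getValueAPF(), APFloat::rmNearestTiesToEven);
>     return BinaryOperator::CreateFAdd(X, ConstantFP::get(I.getContext(), Sum));
>   }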
>
>   Up to three neighboring instructions are involved in the optimization, so
> the number of combinations is daunting. I therefore resort to a general
> approach (instead of pattern matching) to tackle these optimizations.
>
>   The idea is simple: decompose the instructions into uniformly represented
> addends. Take the following instruction sequence as an example:
>
>   t1 = 1.8 * x;
>   t2 = y - x;
>   t3 = t1 - t2;
>
>  t3 has two addends, A1 = <1, t1> (denoting the value 1 * t1) and
> A2 = <-1, t2>. If we "zoom in" A1 and A2 one step, we reveal more addends:
> A1 can be zoomed in to another addend, A1_0 = <1.8, x>, and A2 can be zoomed
> in to A2_0 = <-1, y> and A2_1 = <1, x>.
>
>  Once these addends are available, the optimizer tries to simplify the
> following N-ary additions using symbolic evaluation:
>    A1_0 + A2_0 + A2_1, or
>    A1 + A2_0 + A2_1, or
>    A1_0 + A2
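> 
>  To make this concrete, here is a small, self-contained C++ sketch of the
> addend representation and the symbolic evaluation; the names Addend, zoomIn
> and combine are made up for illustration and are not the classes used in the
> attached patch:
> 
>   #include <cassert>
>   #include <map>
>   #include <string>
>   #include <vector>
> 
>   // An addend <coefficient, value>: Coeff * Val, with Val a symbolic name.
>   struct Addend {
>     double Coeff;
>     std::string Val;
>   };
> 
>   // "Zoom in" one step: rewrite Coeff * (a op b) as finer-grained addends.
>   // The definitions t1 = 1.8 * x and t2 = y - x are hard-coded here.
>   static std::vector<Addend> zoomIn(const Addend &A) {
>     if (A.Val == "t1")   // Coeff * t1 = (Coeff * 1.8) * x
>       return {{A.Coeff * 1.8, "x"}};
>     if (A.Val == "t2")   // Coeff * t2 = Coeff * y + (-Coeff) * x
>       return {{A.Coeff, "y"}, {-A.Coeff, "x"}};
>     return {A};          // leaf value, nothing more to reveal
>   }
> 
>   // Symbolic evaluation: sum the coefficients of addends with the same value.
>   static std::vector<Addend> combine(const std::vector<Addend> &Addends) {
>     std::map<std::string, double> Sum;
>     for (const Addend &A : Addends)
>       Sum[A.Val] += A.Coeff;
>     std::vector<Addend> Result;
>     for (const auto &KV : Sum)
>       if (KV.second != 0.0)
>         Result.push_back({KV.second, KV.first});
>     return Result;
>   }
> 
>   int main() {
>     // t3 = t1 - t2, i.e. A1 = <1, t1> and A2 = <-1, t2>, both zoomed in.
>     std::vector<Addend> Flat;
>     for (const Addend &A : zoomIn({1.0, "t1"}))   // A1_0
>       Flat.push_back(A);
>     for (const Addend &A : zoomIn({-1.0, "t2"}))  // A2_0, A2_1
>       Flat.push_back(A);
> 
>     // A1_0 + A2_0 + A2_1 simplifies to 2.8 * x + (-1) * y, so t3 could be
>     // re-emitted as a single fmul plus a single fsub.
>     std::vector<Addend> Simplified = combine(Flat);
>     assert(Simplified.size() == 2);
>     return 0;
>   }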
>
>  This patch was stress-tested with SingleSource and MultiSource by treating
> all fadd/fsub instructions as being in relaxed mode.
>
>  Thank you for the code review!

We already have code for doing this sort of transform on integer
types.  Would it be possible to reuse any of that code (in either
instcombine or reassociate)?

Why do you need the FastMathInstComb class?  It doesn't store any
useful data itself, and there isn't any point to caching the
FAddCombine instance.

Why do we need the FAddendCoef class, as opposed to just using an
APFloat?  It seems like a lot of the complexity here comes from
premature optimization of constant coefficient handling.
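
For what it's worth, coefficient bookkeeping on top of a plain APFloat might
look roughly like the sketch below (scaleAndAdd is a hypothetical helper for
illustration, not code from the patch or from the tree):

  #include "llvm/ADT/APFloat.h"
  using namespace llvm;

  // Accumulate Acc += C * Scale directly with APFloat arithmetic, with no
  // dedicated coefficient class in between.
  static APFloat scaleAndAdd(APFloat Acc, const APFloat &C,
                             const APFloat &Scale) {
    APFloat Tmp = C;
    Tmp.multiply(Scale, APFloat::rmNearestTiesToEven); // Tmp = C * Scale
    Acc.add(Tmp, APFloat::rmNearestTiesToEven);        // Acc += Tmp
    return Acc;
  }

  // e.g. folding the coefficients of (1.8 * x) + 2.0 * (1.0 * x):
  //   APFloat NewCoeff = scaleAndAdd(APFloat(1.8), APFloat(1.0), APFloat(2.0));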

-Eli


