[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Tue Dec 11 15:26:05 PST 2012

On Tue, Dec 11, 2012 at 2:28 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>
>> We already have existing code for doing this sort of transform for
>> integer types.  Would it be possible to reuse any of that code (in
>> either instcombine or reassociate)?
>
> InstCombine only works with neighboring *FEW* instructions, it is supposed
> to be
> super fast. We don't need stuff that are over-kill of its need.

Okay.

>> Why do you need the FastMathInstComb class?  It doesn't store any
>> useful data itself, and there isn't any point to caching the
>> FAddCombine instance.
>
> See the comment following
>
>> Why do we need the FAddendCoef class, as opposed to just using an
>> APFloat?  It seems like a lot of the complexity here comes from
>> premature optimization of constant coefficient handling.
>>
>> -Eli
>
> For most addends, the coefficient is +/-1, for few they have APFloat
> coefficient.
> It would be simple to to represent all coefficients as APFloat, however, the
> cost
> is way to high!!!!!
>
> Each attempt to optimize an fadd/fsub need about 9 addends (including 6
> operands
> described from neighboring 3 instructions, and rest to evaluate intermediate
> results).
>
> Construct 9 APFloats just for the purpose is expensive.  It is expensive in
> the sense
> that more often then not, the optimization are not successful.  In this
> situation,
> I'd like the cost is almost neglible.
>
> To avoid the cost of APFloat construction, I have to pre-create some
> instances and
> amortize the cost for the entire function being compiled.
>
> The FastMathInstCom is created for the that purpose.
> Unfortunately,efficiency and
> simplicity usually come at each others' expense....

Correct me if I'm wrong, but as far as I can tell, you aren't actually
storing any useful state across calls to simplifyFAdd.

> The avoidance of APFloat construction cost is the MOST difficult part of
> this change.
> It took me about 70% time to strike the balance between the clarity the
> efficiency.

I understand where you're coming from, but I'm not convinced you're
right.  Constructing an APFloat simply isn't that expensive,
especially for floating-point types which don't have more than 64
bits.  And having a bunch of special-case code also has a performance
cost.

Anyway, if you really care about it being fast, matching
PatternMatching.h-style patterns is going to be faster than any data
structure.

-Eli