[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Michael Ilseman milseman at apple.com
Wed Dec 12 16:15:36 PST 2012


On Dec 12, 2012, at 3:46 PM, Eli Friedman <eli.friedman at gmail.com> wrote:

> On Wed, Dec 12, 2012 at 2:20 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
>> Hi, Eli:
>> 
>>  Per our face-to-face talk yesterday, I measured some data today. Here is the
>> result.
>> The input *.bc is obtained from
>> spec2kfp/benchspec/CFP2000/188.ammp/src/rectmm.c
>> I compile it with "clang -O0 -emit-llvm rectmm.c -c -o a.bc".
>> 
>>  Two <opt>s are compared side-by-side:
>> opt1: pre-creates some Addend instances, corresponding to is.patch,
>> debug build (as an indicator)
>> opt2: no pre-created Addend instances, corresponding to was.patch,
>> debug build (as an indicator)
>> 
>> The difference in complexity in terms of LOC (not including test cases):
>>   is.patch = 770 vs. was.patch = 799
>> 
>> InstCombine compile time, see below.
>> 
>> Apparently, opt1 is slightly faster, albeit at the cost of 29 additional lines
>> of complexity.
> 
> It's only a few lines, but each new class you introduce into the code
> means another type, and this way makes it more obvious that there
> isn't any state across calls.
> 
>> Please advise on what I should do next. I personally think it is very
>> difficult to significantly reduce is.patch.
>> If you believe the *.patch has lots of room for improvement in terms of
>> "complexity", could you please pinpoint the code so that I can focus
>> on it.
>> 
>> IMHO, I believe pattern matching would only make things worse.
> 
> (Please read this part carefully; sorry it took me longer than I
> should have to figure this out.)  I think it's worth comparing this
> patch to what we do for integer operations.  For integer operations,
> we handle the transformations you cited with a combination of
> InstCombiner::SimplifyAssociativeOrCommutative,
> InstCombiner::SimplifyUsingDistributiveLaws, and a few simple
> peepholes which I'm pretty sure are already implemented for fadd.
> Would it be possible to extend those methods instead of implementing
> new infrastructure for floating-point?
> 
> Some more detailed comments below, if you decide to continue in the
> direction of this patch.
> 
> +  /// This functor works with std::sort to permute addends such that those
> +  /// having same symbolic-value are clustered together.
> +  struct FAddendCmp {
> +    bool operator()(const FAddend *A1, const FAddend *A2) {
> +      return A1->getSymVal() < A2->getSymVal();
> +    }
> +  };
> 
> Doing relational comparisons on pointers is generally a bad idea; in
> this instance, it looks like this leads to the code generating the
> simplified addition in random order.
> 
> More generally, it seems like you should have some sort of unified
> approach for the order in which the operations are generated, to
> encourage further simplifications.  See
> InstCombiner::SimplifyAssociativeOrCommutative, getComplexity in
> InstCombine.h, etc.
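
One deterministic alternative, just as a sketch (getDiscoveryOrder() is a
hypothetical accessor, not something in the patch): record a stable ordinal
for each addend when it is first collected and sort on that, so the emitted
order no longer depends on allocation addresses.

  struct FAddendCmp {
    bool operator()(const FAddend *A1, const FAddend *A2) const {
      // Compare a stable, allocation-independent key instead of the raw
      // pointer values, so std::sort produces the same order every run.
      return A1->getDiscoveryOrder() < A2->getDiscoveryOrder();
    }
  };
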
> 
> +  if (isInt && That.isInt) {
> +    int Res = IntVal * (int)That.IntVal;
> +    assert(!InsaneIntVal(Res) && "Insane int value");
> 
> What prevents you from hitting this assertion?  Is this just because
> it treats x * 2.0 differently from x+x?  FAddendCoef doesn't really
> seem to reflect that properly.
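
As a rough sketch of one way to avoid the assert (reusing the patch's
isInt/IntVal/InsaneIntVal names, and assuming the coefficient can be
promoted to its APFloat form when needed):

  if (isInt && That.isInt) {
    // Multiply in 64 bits so the product itself cannot overflow, then
    // check whether it still fits the compact integer representation.
    int64_t Res = (int64_t)IntVal * (int64_t)That.IntVal;
    if (Res == (int)Res && !InsaneIntVal((int)Res)) {
      IntVal = (int)Res;
      return;
    }
    // Otherwise promote this coefficient to APFloat and multiply there
    // instead of asserting.
  }
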
> 
> +    // The constructor has to initialize an APFloat, which is unnecessary for
> +    // most addends which have coefficient either 1 or -1. So, the constructor
> +    // is expensive. In order to avoid the cost of the constructor, we should
> +    // reuse some instances whenever possible. The pre-created instances
> +    // FAddCombine::Add[0-5] embodies this idea.
> 
> Outdated comment?
> 
> +    if ((C0 = dyn_cast<ConstantFP>(Opnd0)) && C0->isZero())
> +      Opnd0 = 0;
> 
> isZero() only matches 0.0, which is probably not what you want here.
> 

There are now m_NegZero and m_AnyZero pattern matchers that might help.
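
For instance, a rough sketch (assuming the usual
"using namespace llvm::PatternMatch;" in InstCombine):

  // m_AnyZero() matches both +0.0 and -0.0.
  if (match(Opnd0, m_AnyZero()))
    Opnd0 = 0;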

> +  if (I->getOpcode() == Instruction::FMul) {
> +    Value *V0 = I->getOperand(0);
> +    Value *V1 = I->getOperand(1);
> +    if (ConstantFP *C = dyn_cast<ConstantFP>(V0)) {
> +      Addend0.set(C, V1);
> +      return 1;
> +    }
> +
> +    if (ConstantFP *C = dyn_cast<ConstantFP>(V1)) {
> +      Addend0.set(C, V0);
> +      return 1;
> +    }
> +  }
> 
> ConstantFP values should be canonicalized to the RHS of fmul, so you
> don't need to check for one on the LHS.
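
In other words, something like this should suffice (sketch only):

  if (I->getOpcode() == Instruction::FMul)
    // Constants are canonicalized to the RHS of commutative operations,
    // so only the second operand needs to be inspected.
    if (ConstantFP *C = dyn_cast<ConstantFP>(I->getOperand(1))) {
      Addend0.set(C, I->getOperand(0));
      return 1;
    }
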
> 
> +  if (I->getOpcode() != Instruction::FAdd &&
> +      I->getOpcode() != Instruction::FSub)
> +    return 0;
> 
> Assert?
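
I.e., if callers are only ever handed fadd/fsub, something like:

  assert((I->getOpcode() == Instruction::FAdd ||
          I->getOpcode() == Instruction::FSub) &&
         "Expected an fadd or fsub instruction");
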
> 
> 
> Having two versions of FAddend::drillDownOneStep is kind of confusing;
> do we actually need them both?  (Avoiding creating a single FAddend
> seems like a micro-optimization which isn't worthwhile.)  If we do
> need both, can you rename one?
> 
> There are a bunch of members on FAddCombine which it seems like you
> could just move into FAddCombine::simplify; I suppose that's just
> because you were experimenting?
> 
> -Eli



