[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Shuxin Yang shuxin.llvm at gmail.com
Wed Dec 12 16:21:56 PST 2012


>> Please advise what I should do next.  I personally think it is very difficult
>> to significantly reduce the is.patch.
>> If you believe the *.patch has lots of room for improvement in terms of
>> "complexity", could you please pinpoint the code so that I can focus on it.
>>
>>   IMHO, I believe pattern matching would only make things worse.
> (Please read this part carefully; sorry it took me longer than I
> should have to figure this out.)  I think it's worth comparing this
> patch to what we do for integer operations.  For integer operations,
> we handle the transformations you cited with a combination of
> InstCombiner::SimplifyAssociativeOrCommutative,
> InstCombiner::SimplifyUsingDistributiveLaws, and a few simple
> peepholes which I'm pretty sure are already implemented for fadd.
> Would it be possible to extend those methods instead of implementing
> new infrastructure for floating-point?
I need to re-read these two functions one more time and get back to you.

After I modify Instruction::isAssociative(), these two functions can
pick up some opportunities, but not all that we need.

One weak argument I have right now for differentiating the int & fp
implementations is that FP is far more complicated. If we try to solve
int & fp in a uniform way, we have to slow the int side down a bit.

On the other hand, fp opt has its own unique features. For instance, if I
optimize 4.0 * x - 3.0 * x => 1.0 * x, we are better off demoting 1.0 to
the integer 1 to expose more opportunities.

Int opt doesn't need to handle such a situation.
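To make the point concrete, here is a minimal standalone sketch (hypothetical names, not the patch's actual interface) of folding the coefficients of two like terms and noticing when the combined coefficient is exactly 1, so the fmul can be dropped altogether:

```cpp
// Hypothetical sketch, not the patch's code: combine the coefficients
// of two like terms C0*x - C1*x and report whether the resulting
// multiply can be dropped because the coefficient is exactly 1.
struct FoldResult {
  double Coeff;   // combined coefficient of x
  bool DropFMul;  // true if "Coeff * x" simplifies to just "x"
};

FoldResult foldLikeTerms(double C0, double C1) {
  double Sum = C0 - C1;        // C0*x - C1*x => (C0 - C1)*x
  bool IsOne = (Sum == 1.0);   // 1.0*x => x, exposing more opportunities
  return {Sum, IsOne};
}
```

With C0 = 4.0 and C1 = 3.0 the multiply disappears entirely, and later combines see a bare `x` instead of `1.0 * x`.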

>
> Some more detailed comments below, if you decide to continue in the
> direction of this patch.
>
> +  /// This functor works with std::sort to permute addends such that those
> +  /// having same symbolic-value are clustered together.
> +  struct FAddendCmp {
> +    bool operator()(const FAddend *A1, const FAddend *A2) {
> +      return A1->getSymVal() < A2->getSymVal();
> +    }
> +  };
>
> Doing relational comparisons on pointers is generally a bad idea; in
> this instance, it looks like this leads to the code generating the
> simplified addition in random order.
True. I will figure out a way to make it more deterministic.
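One way to do that (a sketch under my own assumptions, not the patch's code) is to tag each addend with the order in which its symbolic value was first encountered and sort on that index, instead of comparing `Value*` addresses, whose order varies from run to run:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical sketch: sort addends by a stable, discovery-order key
// rather than by pointer value, so the output order is deterministic.
struct Addend {
  unsigned SymIdx;  // discovery-order index of the symbolic value
  double Coeff;
};

void sortAddends(std::vector<Addend> &Addends) {
  std::sort(Addends.begin(), Addends.end(),
            [](const Addend &A, const Addend &B) {
              return A.SymIdx < B.SymIdx;  // deterministic key
            });
}
```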
>
> More generally, it seems like you should have some sort of unified
> approach for the order in which the operations are generated, to
> encourage further simplifications.  See
> InstCombiner::SimplifyAssociativeOrCommutative, getComplexity in
> InstCombine.h, etc.
>
> +  if (isInt && That.isInt) {
> +    int Res = IntVal * (int)That.IntVal;
> +    assert(!InsaneIntVal(Res) && "Insane int value");
>
> What prevents you from hitting this assertion?  Is this just because
> it treats x * 2.0 differently from x+x?  FAddendCoef doesn't really
> seem to reflect that properly.
For now, I don't try to demote an FP coefficient (say 2.0) to an integer
value. Given that there are no more than 4 addends involved in the
optimization, the *INT* coefficient should fall in [-4, 4].

>
> +    // The constructor has to initialize a APFloat, which is uncessary for
> +    // most addends which have coefficient either 1 or -1. So, the constructor
> +    // is expensive. In order to avoid the cost of the constructor, we should
> +    // reuse some instances whenever possible. The pre-created instances
> +    // FAddCombine::Add[0-5] embodies this idea.
>
> Outdated comment?
Outdated for is.patch. Valid for was.patch.

> +    if ((C0 = dyn_cast<ConstantFP>(Opnd0)) && C0->isZero())
> +      Opnd0 = 0;
>
> isZero() only matches 0.0, which is probably not what you want here.
As far as I can tell, ConstantFP::isZero() calls APFloat::isZero(), which
returns true if the value is +/- 0.0.
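The same holds for IEEE-754 doubles in plain C++ (a standalone illustration of the +/-0.0 point, not LLVM code): the two zeros compare equal under `==`, and only the sign bit distinguishes them:

```cpp
#include <cmath>

// +0.0 and -0.0 compare equal, so a plain "is zero" test accepts both;
// std::signbit is what tells them apart.
bool isAnyZero(double D) { return D == 0.0; }  // true for +0.0 and -0.0
bool isNegZero(double D) { return D == 0.0 && std::signbit(D); }
```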
>
> +  if (I->getOpcode() == Instruction::FMul) {
> +    Value *V0 = I->getOperand(0);
> +    Value *V1 = I->getOperand(1);
> +    if (ConstantFP *C = dyn_cast<ConstantFP>(V0)) {
> +      Addend0.set(C, V1);
> +      return 1;
> +    }
> +
> +    if (ConstantFP *C = dyn_cast<ConstantFP>(V1)) {
> +      Addend0.set(C, V0);
> +      return 1;
> +    }
> +  }
>
> ConstantFP values should be canonicalized to the RHS of fmul, so you
> don't need to check for one on the LHS.
>
> +  if (I->getOpcode() != Instruction::FAdd &&
> +      I->getOpcode() != Instruction::FSub)
> +    return 0;
>
> Assert?
You are right. An assert is better, and it saves compile time since the
condition is already checked before calling this function.
>
>
> Having two versions of FAddend::drillDownOneStep is kind of confusing;
> do we actually need them both?
One is used to break a given "Value* V" into 1 or 2 addends.
The other one breaks an addend (say <3.4, V>; NOTE: the coefficient is
not 1) into addends. Suppose V = X + Y; the two addends we get are
3.4*X and 3.4*Y.
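That second form amounts to distributing the outer coefficient over the add. A minimal sketch (hypothetical names and types, not the patch's interface):

```cpp
#include <utility>

// Hypothetical sketch of the second drillDownOneStep: an addend <C, V>
// whose value V is itself an add "X + Y" is split into the two addends
// <C, X> and <C, Y> by distributing the coefficient: C*(X+Y) => C*X + C*Y.
struct Addend {
  double Coeff;
  char Sym;  // stand-in for the symbolic Value*
};

std::pair<Addend, Addend> drillDown(double C, char X, char Y) {
  return {Addend{C, X}, Addend{C, Y}};
}
```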

> (Avoiding creating a single FAddend
> seems like a micro-optimization which isn't worthwhile.)  If we do
> need both, can you rename one?
I'm not sure I understand. Could you please elaborate?
> There are a bunch of members on FAddCombine which it seems like you
> could just move into FAddCombine::simplify; I suppose that's just
> because you were experimenting?
>
They can be moved into FAddCombine::simplify(), but this function already
has 70 LOC. I'd like each function to be reasonably small to keep the
logic clear, and to let the inliner help me out by inlining the helper
functions whenever and wherever it fits.

More information about the llvm-commits mailing list