[llvm-commits] [PATCH][FastMath, InstCombine] Fadd/Fsub optimizations

Eli Friedman eli.friedman at gmail.com
Wed Dec 12 15:46:43 PST 2012


On Wed, Dec 12, 2012 at 2:20 PM, Shuxin Yang <shuxin.llvm at gmail.com> wrote:
> Hi, Eli:
>
>   Per our face-to-face talk yesterday, I measured some data today. Here is
> the result.
> The input *.bc is obtained from
> spec2kfp/benchspec/CFP2000/188.ammp/src/rectmm.c
> I compile it with "clang -O0 -emit-llvm rectmm.c -c -o a.bc".
>
>   Two <opt>s are compared side-by-side:
>  opt1:  pre-created some Addend instances, corresponding to is.patch,
> debug build (as an indicator)
>  opt2:  no pre-created Addend instances, corresponding to was.patch,
> debug build (as an indicator)
>
> The difference in complexity in terms of LOC (not including test cases):
>    is.patch = 770 vs. was.patch = 799
>
>  InstCombine compile time: see below.
>
>  Apparently, opt1 is slightly faster, albeit at the cost of 29 additional
> lines of complexity.

It's only a few lines, but each new class you introduce into the code
means another type, and this way makes it more obvious that there
isn't any state across calls.

> Please advise what I should do next.  I personally think it is very
> difficult to significantly reduce is.patch.
> If you believe the *.patch has lots of room for improvement in terms of
> "complexity", could you please pinpoint the code so that I can focus on it.
>
>  IMHO, I believe pattern matching would only make things even worse.

(Please read this part carefully; sorry it took me longer than I
should have to figure this out.)  I think it's worth comparing this
patch to what we do for integer operations.  For integer operations,
we handle the transformations you cited with a combination of
InstCombiner::SimplifyAssociativeOrCommutative,
InstCombiner::SimplifyUsingDistributiveLaws, and a few simple
peepholes which I'm pretty sure are already implemented for fadd.
Would it be possible to extend those methods instead of implementing
new infrastructure for floating-point?
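
To make that concrete, the shape of the transform the integer path already
gets is plain constant reassociation, (X + C1) + C2 -> X + (C1 + C2).  Here
is a stand-alone toy sketch of that fold (ToyAdd and foldNestedConstAdd are
names I made up for this mail, not anything in LLVM):

  // Fold (X + C1) + C2 into X + (C1 + C2) on a toy expression node.
  // Illustration only; not the InstCombine implementation.
  struct ToyAdd {
    ToyAdd *Inner;    // the nested (X + C1), or 0 if the LHS is a leaf
    long long Const;  // the constant operand of this add
  };

  void foldNestedConstAdd(ToyAdd &Outer) {
    if (ToyAdd *In = Outer.Inner) {
      Outer.Const += In->Const;   // fold C1 + C2 at compile time
      Outer.Inner = In->Inner;    // splice out the inner add: X + (C1 + C2)
    }
  }

With fast-math flags the same reassociation becomes legal for fadd, which is
why extending the existing associative/distributive machinery looks
attractive to me.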

Some more detailed comments below, if you decide to continue in the
direction of this patch.

+  /// This functor works with std::sort to permute addends such that those
+  /// having same symbolic-value are clustered together.
+  struct FAddendCmp {
+    bool operator()(const FAddend *A1, const FAddend *A2) {
+      return A1->getSymVal() < A2->getSymVal();
+    }
+  };

Doing relational comparisons on pointers is generally a bad idea; in
this instance, it looks like this leads to the code generating the
simplified addition in random order.

More generally, it seems like you should have some sort of unified
approach for the order in which the operations are generated, to
encourage further simplifications.  See
InstCombiner::SimplifyAssociativeOrCommutative, getComplexity in
InstCombine.h, etc.
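
As a concrete alternative, the clustering could key on something stable, for
example the index of the first addend that uses a given symbolic value.
This is a self-contained sketch on an invented ToyAddend type (none of these
names are from your patch), not a drop-in replacement for FAddendCmp:

  #include <algorithm>
  #include <map>
  #include <vector>

  struct ToyAddend {
    const void *SymVal;  // symbolic operand; used for identity only
  };

  // Order addends by the first-seen index of their symbolic value, so
  // equal symbols still end up adjacent but the overall order is
  // deterministic and roughly follows the original operand order.
  struct FirstSeenCmp {
    const std::map<const void *, unsigned> &FirstSeen;
    explicit FirstSeenCmp(const std::map<const void *, unsigned> &M)
        : FirstSeen(M) {}
    bool operator()(const ToyAddend &A, const ToyAddend &B) const {
      return FirstSeen.find(A.SymVal)->second <
             FirstSeen.find(B.SymVal)->second;
    }
  };

  void clusterAddends(std::vector<ToyAddend> &Addends) {
    std::map<const void *, unsigned> FirstSeen;
    for (unsigned i = 0, e = Addends.size(); i != e; ++i)
      FirstSeen.insert(std::make_pair(Addends[i].SymVal, i));
    std::stable_sort(Addends.begin(), Addends.end(), FirstSeenCmp(FirstSeen));
  }

Ranking by a getComplexity-style ordering, the way the integer code does,
would work just as well; the point is only that the sort key should not be a
raw pointer value.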

+  if (isInt && That.isInt) {
+    int Res = IntVal * (int)That.IntVal;
+    assert(!InsaneIntVal(Res) && "Insane int value");

What prevents you from hitting this assertion?  Is this just because
it treats x * 2.0 differently from x+x?  FAddendCoef doesn't really
seem to reflect that properly.
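
If you keep the integer fast path, one defensive option is to check the
product and quietly demote to the floating-point representation instead of
asserting.  Sketch only; ToyCoef is an invented stand-in for FAddendCoef,
with double standing in for the APFloat slot:

  #include <climits>

  struct ToyCoef {
    bool IsInt;     // true => the coefficient lives in IntVal
    int IntVal;
    double FpVal;   // stand-in for the APFloat member

    void operator*=(const ToyCoef &That) {
      if (IsInt && That.IsInt) {
        long long Res = (long long)IntVal * That.IntVal;
        if (Res >= INT_MIN && Res <= INT_MAX) {
          IntVal = (int)Res;   // product still fits the integer slot
          return;
        }
      }
      // Mixed operands, or the integer product overflowed: fall back to
      // the floating-point slot instead of asserting.
      double L = IsInt ? (double)IntVal : FpVal;
      double R = That.IsInt ? (double)That.IntVal : That.FpVal;
      FpVal = L * R;
      IsInt = false;
    }
  };

If the answer is instead that the inputs are bounded so the product can
never overflow, a comment explaining why would be enough.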

+    // The constructor has to initialize a APFloat, which is uncessary for
+    // most addends which have coefficient either 1 or -1. So, the constructor
+    // is expensive. In order to avoid the cost of the constructor, we should
+    // reuse some instances whenever possible. The pre-created instances
+    // FAddCombine::Add[0-5] embodies this idea.

Outdated comment?

+    if ((C0 = dyn_cast<ConstantFP>(Opnd0)) && C0->isZero())
+      Opnd0 = 0;

isZero() only matches 0.0, which is probably not what you want here.

+  if (I->getOpcode() == Instruction::FMul) {
+    Value *V0 = I->getOperand(0);
+    Value *V1 = I->getOperand(1);
+    if (ConstantFP *C = dyn_cast<ConstantFP>(V0)) {
+      Addend0.set(C, V1);
+      return 1;
+    }
+
+    if (ConstantFP *C = dyn_cast<ConstantFP>(V1)) {
+      Addend0.set(C, V0);
+      return 1;
+    }
+  }

ConstantFP values should be canonicalized to the RHS of fmul, so you
don't need to check for one on the LHS.
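
With that canonicalization the extraction collapses to a single check on
operand 1.  A minimal sketch (getFMulConstOperand is a name I made up, not
something in the patch):

  #include "llvm/Constants.h"
  #include "llvm/Instruction.h"
  #include "llvm/Support/Casting.h"
  using namespace llvm;

  // With the constant operand of fmul canonicalized to the RHS, only
  // operand 1 needs to be inspected.  Returns the constant multiplier,
  // or 0 if this is not an fmul by a constant.
  static ConstantFP *getFMulConstOperand(Instruction *I) {
    if (I->getOpcode() != Instruction::FMul)
      return 0;
    return dyn_cast<ConstantFP>(I->getOperand(1));
  }

At the call site, Addend0.set(C, I->getOperand(0)) then covers the one
remaining case.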

+  if (I->getOpcode() != Instruction::FAdd &&
+      I->getOpcode() != Instruction::FSub)
+    return 0;

Assert?
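
That is, if every caller already guarantees the opcode, something along the
lines of:

  assert((I->getOpcode() == Instruction::FAdd ||
          I->getOpcode() == Instruction::FSub) &&
         "Expect an fadd or fsub");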


Having two versions of FAddend::drillDownOneStep is kind of confusing;
do we actually need them both?  (Avoiding creating a single FAddend
seems like a micro-optimization which isn't worthwhile.)  If we do
need both, can you rename one?

There are a bunch of members on FAddCombine which it seems like you
could just move into FAddCombine::simplify; I suppose that's just
because you were experimenting?

-Eli


