[PATCH] D40049: [PATCH] Global reassociation for improved CSE

escha via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Nov 14 13:54:54 PST 2017


escha created this revision.
Herald added a subscriber: wdng.

This has been improved from the original patch to take into account suggestions, fix bugs, and reduce complexity, so it's now a patch rather than an RFC ;-)

When playing around with reassociate I noticed a seemingly obvious optimization that was not getting done anywhere in llvm… nor in gcc or ICC.

Consider the following trivial function:

  void foo(int a, int b, int c, int d, int e, int *res) {
    res[0] = (e * a) * d;
    res[1] = (e * b) * d;
    res[2] = (e * c) * d;
  }

This function can be optimized down to 4 multiplies instead of 6 by reassociating so that (e*d) is the common subexpression. However, no compiler I’ve tested does this. I wrote a slightly hacky heuristic algorithm to augment reassociate to do this and tested it.
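For illustration only, here is roughly what the reassociated form looks like when written out by hand. The name foo_reassociated is made up for this sketch. The patch does this on IR, not on source: a source-level rewrite of signed int math can overflow in different places than the original, and the float case needs fast-math style permission to reassociate.

```cpp
// Hand-written equivalent of foo after reassociation (illustrative name).
// e*d is computed once and reused: 4 multiplies instead of 6.
void foo_reassociated(int a, int b, int c, int d, int e, int *res) {
  int ed = e * d;   // the common subexpression, 1 multiply
  res[0] = ed * a;  // 3 more multiplies, one per result
  res[1] = ed * b;
  res[2] = ed * c;
}
```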

First, before the details, the results: on a large offline test suite of graphics shaders it cut down total instruction count by ~0.9% (!) and float math instruction count by 1.5% (!).

Here’s how it works:

1. Do reassociate once as normal.
2. Create a “pair map” consisting of a mapping from <Instr, Instr> to <unsigned>. Have one pair map for each type of BinaryOperation. This map represents how often a given operand pair occurs in the source code for a given BinaryOperation. In addition to counting each actual instruction, we also count every possible pair of operands in each linear operand chain (O(N^2) pairs for a chain of N operands). So for example, if the operand chain is this:

  a*b*c

we do:

  PairMap[Multiply][{a, b}]++;
  PairMap[Multiply][{b, c}]++;
  PairMap[Multiply][{a, c}]++;

The chain length is capped at an arbitrary but low value to avoid possible quadratic behavior.
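The counting in step 2 can be sketched as standalone C++ like this. It is not the code from the patch: operands are plain strings instead of IR values, only one opcode's map is shown, and the names PairCounts, makeKey, countPairs and the cap value MaxChainLength = 8 are all assumptions made for the example. Whether the patch skips or truncates over-long chains is not stated above; this sketch skips them.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Operands are strings here; the real pass would key on IR values.
using Pair = std::pair<std::string, std::string>;
using PairCounts = std::map<Pair, unsigned>;

// Hypothetical cap; the description only says "arbitrary but low".
constexpr std::size_t MaxChainLength = 8;

// Order the two operands so {a, b} and {b, a} share one map entry.
Pair makeKey(const std::string &x, const std::string &y) {
  return x < y ? Pair(x, y) : Pair(y, x);
}

// Count every unordered operand pair of one linear operand chain.
void countPairs(const std::vector<std::string> &chain, PairCounts &counts) {
  if (chain.size() > MaxChainLength)
    return; // assumption: skip long chains to bound the pair enumeration
  for (std::size_t i = 0; i < chain.size(); ++i)
    for (std::size_t j = i + 1; j < chain.size(); ++j)
      ++counts[makeKey(chain[i], chain[j])];
}
```

Feeding it the three chains from foo, {e, a, d}, {e, b, d} and {e, c, d}, gives the pair {d, e} a count of 3 and every other pair a count of 1, which is what makes e*d the obvious candidate to emit first.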

3. Run reassociate again. All the information is saved from the first time around so hopefully this won’t be very expensive except for the changes we actually make. But this time, whenever emitting a linear operand chain, pick the operand pair that’s *most common* in the source code (using PairMap) and make that one the first operation. Thus, for example:

  (((a*b)*c)*d)*e

if “b*e” is the most common, this becomes:

  (((b*e)*a)*c)*d

Now b*e can be CSE’d later! Magic!

As a tiebreaker, I’m currently using “the pair with the lowest max rank of its two operands”. This makes sense because, in this example, “a*b” is the first operation in the chain, so we want to pick duplicates that are also higher up in the program rather than closer to the leaf. No other tiebreaker I tried seemed to work as well.
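A minimal sketch of the selection in step 3, including the tiebreaker, might look like the following. Again this is an illustration rather than the patch itself: Operand, hoistBestPair and the string-keyed map are hypothetical, and the sketch assumes a lower rank means the value is defined higher up in the program.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Operand {
  std::string name;
  unsigned rank; // assumption: lower rank = defined earlier in the program
};

using Pair = std::pair<std::string, std::string>;
using PairCounts = std::map<Pair, unsigned>;

// Order the two operands so {a, b} and {b, a} share one map entry.
Pair makeKey(const std::string &x, const std::string &y) {
  return x < y ? Pair(x, y) : Pair(y, x);
}

// Move the most common pair to the front of the chain so it becomes the
// first operation emitted. Ties go to the pair with the lowest max rank.
void hoistBestPair(std::vector<Operand> &chain, const PairCounts &counts) {
  if (chain.size() < 2)
    return;
  std::size_t bestI = 0, bestJ = 1;
  unsigned bestCount = 0;
  unsigned bestMaxRank = std::max(chain[0].rank, chain[1].rank);
  for (std::size_t i = 0; i < chain.size(); ++i) {
    for (std::size_t j = i + 1; j < chain.size(); ++j) {
      auto it = counts.find(makeKey(chain[i].name, chain[j].name));
      unsigned count = (it == counts.end()) ? 0 : it->second;
      unsigned maxRank = std::max(chain[i].rank, chain[j].rank);
      if (count > bestCount ||
          (count == bestCount && maxRank < bestMaxRank)) {
        bestI = i;
        bestJ = j;
        bestCount = count;
        bestMaxRank = maxRank;
      }
    }
  }
  Operand first = chain[bestI], second = chain[bestJ];
  chain.erase(chain.begin() + bestJ); // erase the later index first
  chain.erase(chain.begin() + bestI);
  chain.insert(chain.begin(), second);
  chain.insert(chain.begin(), first); // chain now starts with first, second
}
```

With the chain a, b, c, d, e and “b*e” as the most common pair, the operand order becomes b, e, a, c, d, matching the (((b*e)*a)*c)*d example above. The remaining operands keep their relative order.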

Overall this patch was structured to be a minimally invasive change to avoid major structural changes in Reassociate. It does nothing except change the order in which operands are emitted during the final step, which should be much safer and less complex than actually changing the core algorithm. The core algorithm of reassociate is actually agnostic to this; it only serves to determine *which* operands are picked, not which order they're emitted in.

This patch unfortunately has some overlap with N-ary reassociate, but because they work in such different ways they are probably not unifiable. N-ary reassociate uses a very different algorithm that doesn't catch as many cases, but is less heuristic-based and is designed around addressing expressions. And because it uses SCEV, it can't work on float, which is my primary use-case.


Repository:
  rL LLVM

https://reviews.llvm.org/D40049

Files:
  include/llvm/Transforms/Scalar/Reassociate.h
  lib/Transforms/Scalar/Reassociate.cpp
  test/Transforms/Reassociate/basictest.ll
  test/Transforms/Reassociate/canonicalize-neg-const.ll
  test/Transforms/Reassociate/fast-ReassociateVector.ll
  test/Transforms/Reassociate/fast-basictest.ll
  test/Transforms/Reassociate/mulfactor.ll
  test/Transforms/Reassociate/reassoc-intermediate-fnegs.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D40049.122909.patch
Type: text/x-patch
Size: 13733 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20171114/10bba915/attachment.bin>

