[PATCH] D39340: Modifying reassociate for improved CSE

Florian Hahn via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 2 04:10:50 PDT 2017


fhahn added a comment.

In https://reviews.llvm.org/D39340#910844, @escha wrote:

> Hmm, reading through N-ary reassociate, it looks like it has similar goals but is not completely overlapping in action, for better or worse.
>
> N-ary reassociate runs in dominator order and looks to see if any subexpressions of the current computation have already been computed.
>
> But as far as I understand, it does not consider *reassociating expressions* to find those redundancies. In other words, it doesn't consider the N^2 possible sub-combinations of previous expressions; it only considers them in the order they were emitted. Its main goal seems to be optimizing addressing expressions. So for example, if you have (a+b) and a later expression computes (a+(b+2)), it can find that this is redundant. But if you have (a+(b+1)) and (a+(b+2)) but no (a+b), I don't think it handles that. It also only runs in dominator order, so it can't find arbitrary redundancies.
>
> Similarly, my change doesn't handle GetElementPtr in the way N-ary reassociate does, since it wasn't targeted at addressing expressions.
>
> So I agree they're redundant, and kind of share similar goals and purposes, but annoyingly I don't see any way to unify them and achieve the goals of both.


Thanks. I suppose it should be possible to extend Reassociate to support GetElementPtr, and then, together with your patch, we would get what NaryReassociate provides? I am not saying this should be part of this patch; I am just trying to understand how we can build on it.

Your patch counts pairs over the whole function, right? Could this lead to cases where this change reassociates operations based on that global count even though CSE won't eliminate the reassociated expression (for example, because control flow makes it too costly), producing worse results? I assume this is not an issue for shader code, but it could happen for control-flow-heavy general code. I can run some benchmarks on AArch64 and share the results in a couple of days.
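
A contrived illustration of the concern (not taken from the patch or its tests; the names are made up):

  // Hypothetical example: the pair (a, b) looks "popular" function-wide only
  // because of an occurrence on a conditional path, so reassociating the
  // unconditional expression to expose (a + b) does not actually let CSE/GVN
  // reuse anything, since the existing (a + b) does not dominate the later use.
  int example(int a, int b, int c, bool rare) {
    int t = 0;
    if (rare)
      t = a + b;            // raises the global count for the pair (a, b)
    int u = (c + a) + b;    // rewriting as c + (a + b) enables no reuse here
    return t + u;
  }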



================
Comment at: include/llvm/Transforms/Scalar/Reassociate.h:76
+
+  unsigned Pass;
+  static const unsigned NumBinaryOps =
----------------
escha wrote:
> escha wrote:
> > fhahn wrote:
> > > arsenm wrote:
> > > > I find the use of this somewhat confusing. It seems to really be a loop counter stuck in the class to pass it to the other functions. Is it particularly inconvenient to pass this in to the functions where it's needed? Also a better name might be ReassociateStep or Level (possibly with an enum for the steps)? Pass is pretty overloaded (plus lldb doesn't understand what to do with this since there is also a class named Pass)
> > > I am probably missing something, but would it be possible to collect the linear chains as a pre-processing step and then do the full reassociate only once? You might have to move out the code to linearize binary ops, but I think it would be a more lightweight and clearer solution.
> > Mmm, I'm not sure. The problem is that the chains can *change* as a result of reassociation, which can rewrite instructions. e.g. changing use-counts can change the chains.
> > 
> > You're probably right; the main reason I didn't do that was that I wanted to make minimally invasive changes because I really don't feel I understand the pass deeply enough to restructure it like that safely.
> And to the name, absolutely, will change on next iteration regardless.
Ok thanks! I see why it is simpler to just split it up into 2 iterations. What I was thinking about was having a loop that just builds the rank map once, and then having the main loop update that map as it deletes instructions. It might be worth considering that as follow-up work, though.
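
A rough sketch of what I mean, purely illustrative: the map shape is assumed from the PairMap snippet further down, and the helper name and signature are made up.

  #include <utility>
  #include "llvm/ADT/DenseMap.h"
  #include "llvm/IR/InstrTypes.h"

  using PairCountMap =
      llvm::DenseMap<std::pair<llvm::Value *, llvm::Value *>, unsigned>;

  // Keep the precomputed pair frequencies consistent when an instruction is
  // erased, instead of rebuilding the whole map in another iteration.
  static void decrementPairCount(PairCountMap &PairCounts,
                                 llvm::BinaryOperator *BO) {
    auto It = PairCounts.find({BO->getOperand(0), BO->getOperand(1)});
    if (It != PairCounts.end() && It->second > 0)
      --It->second; // this operand pair disappears together with BO
  }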


================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2212
+        if (Ops[i].Op != Ops[j].Op) {
+          auto it2 = PairMap[Idx].find({Ops[j].Op, Ops[i].Op});
+          if (it2 != PairMap[Idx].end())
----------------
Could we order the keys in the pairs to avoid a second lookup here, i.e. insert pairs with <a, b> where a < b?
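
Something like the following is what I had in mind; a minimal sketch, assuming PairMap is keyed on a pair of `Value *` as the snippet suggests. Ordering by raw pointer is only illustrative; ordering by rank would avoid nondeterminism between runs.

  #include <utility>
  #include "llvm/IR/Value.h"

  // Canonicalize the operand order once, so insertion and lookup always use
  // the same key and the second find() above becomes unnecessary.
  static std::pair<llvm::Value *, llvm::Value *>
  canonicalPairKey(llvm::Value *A, llvm::Value *B) {
    if (B < A)
      std::swap(A, B);
    return {A, B};
  }

  // Usage (illustrative):
  //   auto It = PairMap[Idx].find(canonicalPairKey(Ops[i].Op, Ops[j].Op));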


================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2251
   // Traverse the same blocks that was analysed by BuildRankMap.
-  for (BasicBlock *BI : RPOT) {
-    assert(RankMap.count(&*BI) && "BB should be ranked.");
-    // Optimize every instruction in the basic block.
-    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)
-      if (isInstructionTriviallyDead(&*II)) {
-        EraseInst(&*II++);
-      } else {
-        OptimizeInst(&*II);
-        assert(II->getParent() == &*BI && "Moved to a different block!");
-        ++II;
+  for (int q = 0; q < 2; ++q) {
+    for (Pass = 0; Pass < 2; ++Pass) {
----------------
It looks like we are running reassociate 4 times here? Is this needed? From the description, it sounds like 2 times should be enough.


================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2293
 
-    // Make a copy of all the instructions to be redone so we can remove dead
-    // instructions.
-    SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);
-    // Iterate over all instructions to be reevaluated and remove trivially dead
-    // instructions. If any operand of the trivially dead instruction becomes
-    // dead mark it for deletion as well. Continue this process until all
-    // trivially dead instructions have been removed.
-    while (!ToRedo.empty()) {
-      Instruction *I = ToRedo.pop_back_val();
-      if (isInstructionTriviallyDead(I)) {
-        RecursivelyEraseDeadInsts(I, ToRedo);
-        MadeChange = true;
+      for (BasicBlock *BI : RPOT) {
+        for (Instruction &I : *BI) {
----------------
It would probably be easier to understand if this were done in a separate function.


Repository:
  rL LLVM

https://reviews.llvm.org/D39340
