[PATCH] D39340: Modifying reassociate for improved CSE

escha via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Nov 2 09:44:29 PDT 2017


escha added a comment.

In https://reviews.llvm.org/D39340#913756, @fhahn wrote:

> In https://reviews.llvm.org/D39340#910844, @escha wrote:
>
> > Hmm, reading through N-ary reassociate, it looks like it has similar goals but is not completely overlapping in action, for better or worse.
> >
> > N-ary reassociate runs in dominator order and looks to see if any subexpressions of the current computation have already been computed.
> >
> > But as far as I understand, it does not consider *reassociating expressions* to find those redundancies. In other words, it doesn't consider the N^2 possible sub-combinations of previous expressions; it only considers them in the order they were emitted. Its main goal seems to be optimizing addressing expressions. So for example, if you have (a+b) and a later expression computes (a+(b+2)), it can find that this is redundant. But if you have (a+(b+1)) and (a+(b+2)) but no (a+b), I don't think it handles that. It also only runs in dominator order, so it can't find arbitrary redundancies.
> >
> > Similarly, my change doesn't handle GetElementPtr in the way N-ary reassociate does, since it wasn't targeted at addressing expressions.
> >
> > So I agree they're redundant, and kind of share similar goals and purposes, but annoyingly I don't see any way to unify them and achieve the goals of both.
>
>
> Thanks. I suppose it should be possible to extend Reassociate to support GetElementPtr, and then we would get what NaryReassociate provides with your patch? I am not saying this should be part of this patch, I am just trying to understand how we can build on this patch.


I'm not sure. GetElementPtr is a fairly complex instruction that isn't easily mapped onto other instructions (i.e. it can represent a whole chain of address computations instead of just one operation). I think the reason Nary is able to handle it is that it uses SCEV, which abstracts over that.

Who wrote Nary? Maybe they have a useful opinion on all this.

> Your patch uses function-wide counts for pairs, right? Could this lead to cases where this change reassociates operations based on this global count, even though CSE won't eliminate the reassociated expression (because it would be too costly due to control flow, for example) and produces worse results? I assume this is not an issue for shader code, but it could happen for control-flow-heavy general code. I can run some benchmarks on AArch64 and share results in a couple of days.

Yeah, I was thinking of that. In the most extreme case, you could end up with disjoint instructions on different control-flow paths; on the other hand, the new GVN-Hoist should ideally catch those. In theory you could use something like a scoped data structure to handle this, maybe?
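To make the "scoped data structure" idea concrete, here is a minimal, hypothetical sketch (plain `std::map` over integer IDs standing in for the real operand pairs, nothing from the actual patch): pair counts recorded while a dominator-tree scope is open are rolled back when the scope closes, so pairs seen only on one control-flow path don't influence reassociation decisions on sibling paths. LLVM's `ScopedHashTable` follows a similar pattern.

```cpp
#include <map>
#include <utility>
#include <vector>

// Hypothetical sketch: pair-frequency counts that respect dominator scopes.
// Counts added inside a scope are undone on exitScope(), so statistics from
// one control-flow path don't leak into sibling paths.
class ScopedPairCounts {
  std::map<std::pair<int, int>, unsigned> Counts;
  // One undo list per open scope, recording which keys were incremented.
  std::vector<std::vector<std::pair<int, int>>> Scopes;

public:
  void enterScope() { Scopes.emplace_back(); }

  void exitScope() {
    // Roll back every increment made while this scope was open.
    for (auto &K : Scopes.back())
      if (--Counts[K] == 0)
        Counts.erase(K);
    Scopes.pop_back();
  }

  void addPair(int A, int B) {
    ++Counts[{A, B}];
    Scopes.back().push_back({A, B});
  }

  unsigned count(int A, int B) const {
    auto It = Counts.find({A, B});
    return It == Counts.end() ? 0 : It->second;
  }
};
```

This is only the bookkeeping half; wiring it into the pass would mean entering/exiting scopes while walking the dominator tree.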



================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2212
+        if (Ops[i].Op != Ops[j].Op) {
+          auto it2 = PairMap[Idx].find({Ops[j].Op, Ops[i].Op});
+          if (it2 != PairMap[Idx].end())
----------------
fhahn wrote:
> Could we order the keys in the pairs to avoid a second lookup here, i.e. insert pairs with <a, b> where a < b?
Hmm, what would be the best way to do this? It's possible for two instructions to have the same Rank, iirc. Would pointer order be acceptable? I'd be worried about determinism.
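One possible way to sidestep the determinism worry: break rank ties with a deterministic per-instruction index (e.g. assigned in RPO visit order) instead of pointer order. A minimal sketch, with a hypothetical `OpKey` standing in for the real operand values — none of these names are from the patch:

```cpp
#include <tuple>
#include <utility>

// Hypothetical key: rank from BuildRankMap, plus a deterministic visit index
// assigned when blocks are traversed, so ties never fall back to pointers.
struct OpKey {
  unsigned Rank;
  unsigned Index;
  bool operator<(const OpKey &O) const {
    return std::tie(Rank, Index) < std::tie(O.Rank, O.Index);
  }
};

// Order the pair canonically so <a, b> and <b, a> map to the same key,
// making a single PairMap lookup sufficient.
static std::pair<OpKey, OpKey> canonicalPair(OpKey A, OpKey B) {
  return B < A ? std::make_pair(B, A) : std::make_pair(A, B);
}
```

With a canonical key like this, the second `PairMap[Idx].find({Ops[j].Op, Ops[i].Op})` lookup would no longer be needed.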


================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2251
   // Traverse the same blocks that was analysed by BuildRankMap.
-  for (BasicBlock *BI : RPOT) {
-    assert(RankMap.count(&*BI) && "BB should be ranked.");
-    // Optimize every instruction in the basic block.
-    for (BasicBlock::iterator II = BI->begin(), IE = BI->end(); II != IE;)
-      if (isInstructionTriviallyDead(&*II)) {
-        EraseInst(&*II++);
-      } else {
-        OptimizeInst(&*II);
-        assert(II->getParent() == &*BI && "Moved to a different block!");
-        ++II;
+  for (int q = 0; q < 2; ++q) {
+    for (Pass = 0; Pass < 2; ++Pass) {
----------------
fhahn wrote:
> It looks like we are running the reassociate 4 times? Is this needed? From the description it sounds like 2 times should be enough?
Oh, right. This was me experimenting to see if running it twice would get better results... and it did! I'm guessing this is because heavily reassociating based on global statistics opens up new opportunities, and so on. But it might be unnecessary.

This is probably in part caused by the fact that my patch isn't worklist-based (i.e. it runs over everything repeatedly instead of adding things to a worklist to revisit).
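For contrast, the worklist shape would look something like this — a generic sketch, not code from the patch: instead of a fixed number of whole-function passes, only items whose operands changed get pushed back for another visit, and iteration stops at a fixed point.

```cpp
#include <vector>

// Generic worklist driver (illustrative only): Visit processes one item and
// may push follow-up items onto the worklist; the loop runs to a fixed point
// rather than a hard-coded number of full passes.
template <typename T, typename Fn>
void runWorklist(std::vector<T> Work, Fn Visit) {
  while (!Work.empty()) {
    T Item = Work.back();
    Work.pop_back();
    Visit(Item, Work);
  }
}
```

In Reassociate's case, `Visit` would re-optimize an instruction and enqueue its users when its value changed, which is roughly what the existing `RedoInsts` set approximates.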


================
Comment at: lib/Transforms/Scalar/Reassociate.cpp:2293
 
-    // Make a copy of all the instructions to be redone so we can remove dead
-    // instructions.
-    SetVector<AssertingVH<Instruction>> ToRedo(RedoInsts);
-    // Iterate over all instructions to be reevaluated and remove trivially dead
-    // instructions. If any operand of the trivially dead instruction becomes
-    // dead mark it for deletion as well. Continue this process until all
-    // trivially dead instructions have been removed.
-    while (!ToRedo.empty()) {
-      Instruction *I = ToRedo.pop_back_val();
-      if (isInstructionTriviallyDead(I)) {
-        RecursivelyEraseDeadInsts(I, ToRedo);
-        MadeChange = true;
+      for (BasicBlock *BI : RPOT) {
+        for (Instruction &I : *BI) {
----------------
fhahn wrote:
> It's probably easier to understand if this would be done in a separate function.
good point. agreed


Repository:
  rL LLVM

https://reviews.llvm.org/D39340




