[PATCH] D8817: Estimate DCE effect in heuristic for estimating complete-unroll optimization effects.

Wed Jul 15 17:20:13 PDT 2015

chandlerc added inline comments.

================
Comment at: lib/Transforms/Scalar/LoopUnrollPass.cpp:638-654
@@ +637,19 @@
+
+    for (unsigned Idx = BBWorklist.size() - 1; Idx != 0; --Idx) {
+      BasicBlock *BB = BBWorklist[Idx];
+      if (BB->empty())
+        continue;
+      for (BasicBlock::reverse_iterator I = BB->rbegin(), E = BB->rend(); I != E; ++I) {
+        if (SimplifiedValues.count(&*I))
+          continue;
+        if (DeadInstructions.count(&*I))
+          continue;
+        if (std::all_of(I->user_begin(), I->user_end(), [&](User *U) {
+              return SimplifiedValues.count(cast<Instruction>(U)) +
+                     DeadInstructions.count(cast<Instruction>(U));
+              })) {
+          UnrolledCost -= TTI.getUserCost(&*I);
+          DeadInstructions.insert(&*I);
+        }
+      }
+    }
----------------
This seems like it may be somewhat slow, and I expect it to affect the computation relatively rarely. Shouldn't SimplifiedValues already have forward-pruned most of the dead instructions here?

What about a slightly different approach:

- Each time we simplify something whose operands are not simplified, add that instruction to a SimplifiedRootsSet and SimplifiedRootsWorklist.

- Each time we actually count an instruction's cost, add it to a set of cost-counted instructions, and increment a use count for each of its operands.

- Here, for each instruction in the worklist, for each operand of that instruction: if the operand is in the set of cost-counted instructions, is not in the SimplifiedRootsSet, and has a use count of zero, subtract its cost, decrement the use counts of all its operands, add it to the SimplifiedRootsSet, and add it to the worklist.

This should start only from the instructions we can't forward-simplify (things that SCEV simplifies, for example) and walk recursively up their operands, GC-ing everything whose use count in the loop reaches zero as a consequence.

This seems like it should be faster in the common case than walking the user lists of every instruction? As far as I can tell, we walk every operand at most twice (once to increment, once to decrement).
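
To make the bookkeeping concrete, here is a minimal standalone sketch of the scheme described above. It is not code from the patch and does not use LLVM's classes: ToyInst, ToyEstimator, countCost, markSimplifiedRoot, and pruneDead are all hypothetical names, instructions are modeled as indices into a vector, and the sketch assumes no instruction has side effects or users outside the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy stand-in for an instruction (hypothetical; the real pass works on
// llvm::Instruction): a cost plus the indices of its operands.
struct ToyInst {
  int Cost;
  std::vector<std::size_t> Operands;
};

// Hypothetical bookkeeping for the proposed scheme.
struct ToyEstimator {
  std::vector<ToyInst> Insts;
  std::vector<bool> CostCounted;        // set of cost-counted instructions
  std::vector<bool> SimplifiedRootsSet; // roots plus everything GC'd so far
  std::vector<unsigned> UseCount;       // uses by cost-counted instructions
  std::vector<std::size_t> SimplifiedRootsWorklist;
  int UnrolledCost = 0;

  explicit ToyEstimator(std::vector<ToyInst> I)
      : Insts(std::move(I)), CostCounted(Insts.size(), false),
        SimplifiedRootsSet(Insts.size(), false), UseCount(Insts.size(), 0) {}

  // Called when an instruction's cost is actually added to the estimate.
  void countCost(std::size_t Idx) {
    UnrolledCost += Insts[Idx].Cost;
    CostCounted[Idx] = true;
    for (std::size_t Op : Insts[Idx].Operands)
      ++UseCount[Op];
  }

  // Called when an instruction is simplified although its operands are not.
  void markSimplifiedRoot(std::size_t Idx) {
    SimplifiedRootsSet[Idx] = true;
    SimplifiedRootsWorklist.push_back(Idx);
  }

  // Walk up from the roots, un-counting operands left with no counted users.
  void pruneDead() {
    while (!SimplifiedRootsWorklist.empty()) {
      std::size_t Idx = SimplifiedRootsWorklist.back();
      SimplifiedRootsWorklist.pop_back();
      for (std::size_t Op : Insts[Idx].Operands) {
        if (!CostCounted[Op] || SimplifiedRootsSet[Op] || UseCount[Op] != 0)
          continue;
        UnrolledCost -= Insts[Op].Cost;
        // Op is now dead, so it no longer counts as a user of its operands.
        for (std::size_t OpOp : Insts[Op].Operands) {
          assert(UseCount[OpOp] > 0 && "use count underflow");
          --UseCount[OpOp];
        }
        SimplifiedRootsSet[Op] = true;
        SimplifiedRootsWorklist.push_back(Op);
      }
    }
  }
};
```

For a chain a <- b <- c where c is the simplified root, pruneDead() in this sketch removes the cost of b and then of a. If a has another cost-counted user, only b's cost is removed and a's use count drops by one.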

http://reviews.llvm.org/D8817