[PATCH] Estimate DCE effect in heuristic for estimating complete-unroll optimization effects.

Michael Zolotukhin mzolotukhin at apple.com
Thu Apr 9 16:09:44 PDT 2015


Handling dead CFG paths only looks like a simple problem; in fact, it's much trickier.

Let me start with an example:

  for(i = 0; i < 1000; i++) {
     a[i] = b[i] + c[i];
     if(d[i]) {
        // very expensive code - let's say 998 instructions.
     }
  }

The cost of the loop body here would be 1+1+998=1000, and the estimated cost of the original loop is TripCount*BodyCost = 10^6.
Suppose that d[i] is filled with 0, so `if(d[i])` is always false and we never take the expensive path.
That means that after complete unrolling we'll end up with 1000 instructions: a[0] = b[0] + c[0], a[1] = b[1] + c[1], ...
That looks like a huge win: the cost of the unrolled loop is 10^3, while the cost of the original loop is 10^6. But what did we actually do? We significantly increased the code size and gained nothing in terms of performance, because that expensive code would never have been executed in the original loop either! Things would get even worse if, e.g., d[] contained non-zeros too.
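
To make the mismatch concrete, here is a minimal sketch of the three numbers involved, using the unit costs from the example above. Everything in it (LoopModel, staticRolledCost, foldedUnrolledCost, executedRolledCost) is a hypothetical name for illustration and not part of LLVM or of the patch; the per-instruction costs are just the toy values from this mail.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy unit-cost model of the example loop. Hypothetical, not an LLVM API.
struct LoopModel {
  uint64_t TripCount;     // 1000 in the example
  uint64_t CheapCost;     // a[i] = b[i] + c[i], counted as 1
  uint64_t BranchCost;    // if (d[i]), counted as 1
  uint64_t ExpensiveCost; // the guarded block, 998
};

// Static estimate for the rolled loop: TripCount * BodyCost, where BodyCost
// counts every instruction in the body whether or not it is ever executed.
uint64_t staticRolledCost(const LoopModel &L) {
  return L.TripCount * (L.CheapCost + L.BranchCost + L.ExpensiveCost);
}

// Naive estimate for the unrolled loop: the branch is folded using the known
// contents of d[], so only the taken successor is counted.
uint64_t foldedUnrolledCost(const LoopModel &L, const std::vector<int> &D) {
  assert(D.size() >= L.TripCount && "d[] must cover every iteration");
  uint64_t Cost = 0;
  for (uint64_t I = 0; I < L.TripCount; ++I) {
    Cost += L.CheapCost;
    if (D[I] != 0)
      Cost += L.ExpensiveCost;
  }
  return Cost;
}

// What the rolled loop really executes for the same d[]: the cheap statement
// and the branch on every iteration, the expensive block only when taken.
uint64_t executedRolledCost(const LoopModel &L, const std::vector<int> &D) {
  assert(D.size() >= L.TripCount && "d[] must cover every iteration");
  uint64_t Cost = 0;
  for (uint64_t I = 0; I < L.TripCount; ++I) {
    Cost += L.CheapCost + L.BranchCost;
    if (D[I] != 0)
      Cost += L.ExpensiveCost;
  }
  return Cost;
}
```

With LoopModel{1000, 1, 1, 998} and d[] all zeros, the static estimate is 10^6 and the folded estimate is 10^3, but the rolled loop only ever executes 2000 units in this toy model. Comparing 10^3 against 10^6 credits the unrolling with removing code that never ran.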

So we can't simply fold the branch and take only one successor: it would be incorrect to compare the cost of the loop computed this way with the original cost. To be precise, that works well for a code size estimate, but not for execution time (~performance). And the goal of the optimization is to improve performance, i.e. if in the completely unrolled loop we'd need to execute 20% fewer instructions at run time, then it's worth unrolling.
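
As a rough sketch of that criterion: compare the number of instructions executed before and after unrolling, and unroll only if the saving reaches some percentage. The helper name worthUnrolling and its MinPercentSaved parameter are hypothetical, and reading "20% fewer" as "at least 20% fewer" is my assumption for the sketch; this is not the code in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API. Both arguments are counts of
// instructions actually executed, not static instruction counts.
bool worthUnrolling(uint64_t RolledExecuted, uint64_t UnrolledExecuted,
                    unsigned MinPercentSaved = 20) {
  assert(MinPercentSaved <= 100 && "percentage out of range");
  // No saving at all (or nothing to save) never justifies the code growth.
  if (RolledExecuted == 0 || UnrolledExecuted >= RolledExecuted)
    return false;
  uint64_t Saved = RolledExecuted - UnrolledExecuted;
  // Integer form of Saved / RolledExecuted >= MinPercentSaved / 100.
  // Assumes the counts are small enough that the products do not overflow.
  return Saved * 100 >= RolledExecuted * MinPercentSaved;
}
```

For instance, worthUnrolling(1000, 800) holds (exactly 20% saved), while worthUnrolling(1000, 900) does not. The important part is that both inputs must be measured the same way; feeding a folded estimate on one side and a static TripCount*BodyCost on the other would reproduce the problem described above.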

Having said that, it might be interesting to take branch folding into account, but that would need a much more complicated cost model (and thus would increase code complexity). Currently I'm inclined to put it off until we get a real use case where it can help. What do you think?


http://reviews.llvm.org/D8817
