[PATCH] D11758: [Unroll] Implement a conservative and monotonically increasing cost tracking system during the full unroll heuristic analysis that avoids counting any instruction cost until that instruction becomes "live" through a side-effect or use outside the...

Mon Oct 12 13:22:46 PDT 2015

mzolotukhin added a comment.

Hi Chander,

We briefly discussed it on IRC, but I'll duplicate my latest findings here too.

I tested this patch and found several issues that lead to undesired unrolling in some cases, and thus have a significant compile-time impact with no performance benefit. With these issues fixed or worked around, the compile-time regressions didn't seem too bad, but I'll need to remeasure them once we fix the issues properly. Even with the workarounds in place, I still saw some nice performance gains.

Here is the list of problems that I found:

1. With this improved algorithm for finding dead instructions, we're now able to figure out that the loop control flow becomes dead after unrolling. If a loop has a small body, the control flow might make up to 50% of the loop body, but it doesn't seem reasonable to unroll such loops. For instance:

  for (i = 0; i < 500; i++)
     a[i] += 1;

In this case unrolling removes nothing except the control flow, but that fools the current heuristic into unrolling the loop. Such loops are very common, so unrolling them costs a lot of compile time. The performance gain is questionable, and we probably even regress performance in such cases.

2. Currently `simplifyInstWithSCEV` returns `true` (meaning the corresponding instruction is simplified) for expressions of the form `Address + ConstantOffset`. However, unrolling doesn't necessarily simplify such instructions, so our estimate might be wrong here. For example:

  for (int i=0; i < 16; i++)  {
     a[i][0]=b[S.y][S.x+i];
  }

In this case the index expressions make up most of the loop body, but, contrary to our estimate, unrolling doesn't help simplify them.

3. After we unroll a loop, we should make sure that everything we expected to be simplified or dead has actually been cleaned up; otherwise we will count it again when we analyze the parent loop.
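To make the first point concrete, here is a toy cost model in C. Everything in it (the `InstKind` categories, the unit costs, and the `worth_full_unroll` guard with its `min_percent` threshold) is hypothetical and is not part of LLVM or of this patch. It only illustrates why counting removed loop control flow as a "saving" makes almost any small loop look profitable to fully unroll.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction categories; these are NOT LLVM types. */
typedef enum { KIND_COMPUTE, KIND_ADDRESS, KIND_CONTROL } InstKind;

typedef struct {
  InstKind kind;
  int cost;               /* hypothetical per-instruction cost */
  bool dead_after_unroll; /* analysis claims it folds away once unrolled */
} Inst;

/* Body of `for (i = 0; i < 500; i++) a[i] += 1;` under assumed unit costs.
   The address computation is conservatively marked live (cf. point 2). */
const Inst kIncrementLoop[] = {
    {KIND_ADDRESS, 1, false}, /* &a[i] */
    {KIND_COMPUTE, 1, false}, /* load a[i] */
    {KIND_COMPUTE, 1, false}, /* add 1 */
    {KIND_COMPUTE, 1, false}, /* store a[i] */
    {KIND_CONTROL, 1, true},  /* i++ */
    {KIND_CONTROL, 1, true},  /* i < 500 */
    {KIND_CONTROL, 1, true},  /* branch back to the header */
};
const size_t kIncrementLoopLen = sizeof kIncrementLoop / sizeof kIncrementLoop[0];

/* Total cost of one iteration of the loop body. */
int body_cost(const Inst *body, size_t n) {
  int total = 0;
  for (size_t k = 0; k < n; ++k)
    total += body[k].cost;
  return total;
}

/* Cost the analysis expects to disappear after full unrolling. When
   count_control is false, loop control flow is skipped, because full
   unrolling removes it from every loop whether or not the body simplifies. */
int dead_cost(const Inst *body, size_t n, bool count_control) {
  int saved = 0;
  for (size_t k = 0; k < n; ++k) {
    if (!body[k].dead_after_unroll)
      continue;
    if (!count_control && body[k].kind == KIND_CONTROL)
      continue;
    saved += body[k].cost;
  }
  return saved;
}

/* Hypothetical guard: unroll only if at least min_percent of the body cost
   is saved by something other than deleting loop control flow. */
bool worth_full_unroll(const Inst *body, size_t n, int min_percent) {
  assert(min_percent >= 0 && min_percent <= 100);
  int total = body_cost(body, n);
  if (total == 0)
    return false;
  return dead_cost(body, n, false) * 100 >= total * min_percent;
}
```

With these assumed unit costs, the `a[i] += 1` body costs 7, and 3 of that (about 43%) is control flow. An estimate that counts control flow reports 3 out of 7 as saved, while one that excludes it reports 0, so the guard rejects the loop. Marking the address computation as dead, as `simplifyInstWithSCEV` currently does for `Address + ConstantOffset`, would inflate the estimate again, which is the concern behind the second point.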

Michael

http://reviews.llvm.org/D11758