[LLVMdev] [RFC] Heuristic for complete loop unrolling

Fri Jan 23 13:27:22 PST 2015

Hi Michael,

This looks very much like the inner loops from convolve() in Geekbench... ;)

Interestingly if we reroll that loop nest, the number of instructions
inside goes down by a factor of 3 so we do decide to inline them all. But
that is just luck / a hack. I completely agree that our current unrolling
heuristics don't take the right things into account (our inlining
heuristics are similar!).

When thinking about this previously, I've been worried that there's a
slippery slope where you could end up implementing half of the rest of the
compiler's analysis passes in the unroller/inliner to check for
opportunities. Do you have any thoughts on that? How do we keep it under
control?

cc'ing Kevin who has been doing a lot of work recently on the loop unroller.

Cheers,

James

On Fri Jan 23 2015 at 8:52:30 PM Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Hi devs,
>
> Recently I came across an interesting testcase that LLVM failed to
> optimize well. The test does some image processing, and as a part of it, it
> traverses all the pixels and computes some value basing on the adjacent
> pixels. So, the hot part looks like this:
>
> for(y = 0..height) {
>  for (x = 0..width) {
>    val = 0
>    for (j = 0..5) {
>      for (i = 0..5) {
>        val += img[x+i,y+j] * weight[i,j]
>      }
>    }
>  }
> }
>
> And ‘weight' is just a constant matrix with some coefficients.
>
> If we unroll the two internal loops (with tripcount 5), then we can
> replace weight[i,j] with concrete constant values. In this particular case,
> many of the coefficients are actually 0 or 1, which enables huge code
> simplifications later on. But currently we unroll only the innermost one,
> because unrolling both of them will exceed the threshold.
>
> When deciding whether to unroll or not, we currently look only at the
> instruction count of the loop. My proposal is to, on top of that, check if
> we can enable any later optimizations by unrolling - in this case by
> replacing a load with a constant. Similar to what we do in inlining
> heuristics, we can estimate how many instructions would be potentially
> eliminated after unrolling and adjust our threshold with this value.
>
> I can imagine that it might be also useful for computations, involving
> sparse constant matrixes (like identity matrix).
>
> The attached patch implements this feature, and with it we handle the
> original testcase well.
>
>
>
> Does it look good? Of course, any ideas, suggestions and other feedback
> are welcome!
>
>
> Thanks,
> Michael_______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150123/0148ece2/attachment.html>