[LLVMdev] [RFC] Heuristic for complete loop unrolling

Sat Jan 24 05:02:35 PST 2015

On Fri, Jan 23, 2015 at 8:05 PM, Michael Zolotukhin <mzolotukhin at apple.com>
wrote:

> Hi devs,
>
> Recently I came across an interesting testcase that LLVM failed to
> optimize well. The test does some image processing, and as a part of it, it
> traverses all the pixels and computes some value basing on the adjacent
> pixels. So, the hot part looks like this:
>
> for(y = 0..height) {
>  for (x = 0..width) {
>    val = 0
>    for (j = 0..5) {
>      for (i = 0..5) {
>        val += img[x+i,y+j] * weight[i,j]
>      }
>    }
>  }
> }
>
> And ‘weight' is just a constant matrix with some coefficients.
>
> If we unroll the two internal loops (with tripcount 5), then we can
> replace weight[i,j] with concrete constant values. In this particular case,
> many of the coefficients are actually 0 or 1, which enables huge code
> simplifications later on. But currently we unroll only the innermost one,
> because unrolling both of them will exceed the threshold.
>
> When deciding whether to unroll or not, we currently look only at the
> instruction count of the loop. My proposal is to, on top of that, check if
> we can enable any later optimizations by unrolling - in this case by
> replacing a load with a constant. Similar to what we do in inlining
> heuristics, we can estimate how many instructions would be potentially
> eliminated after unrolling and adjust our threshold with this value.
>
> I can imagine that it might be also useful for computations, involving
> sparse constant matrixes (like identity matrix).
>
> The attached patch implements this feature, and with it we handle the
> original testcase well.
>
>
>
>
> Does it look good? Of course, any ideas, suggestions and other feedback
> are welcome!
>

This is just a naive convolution algorithm implementation. Either the
author doesn't care about performance, or the author is expecting the
compiler to recognize the convolution and perform high-level loop
optimizations.

I'm not a domain expert on loop optimization, but convolution is an
operation of such *enormous* importance (FIR filtering, image processing,
PDE solving, etc.) that I would expect this to be well-researched already.
I wouldn't expect a C compiler to have this optimization, but I would
expect a fortran compiler to have it. A quick search of the literature
shows results well into the last millennium about optimizing this
particular kind of loop (which fortran calls a "stencil").

I actually think the general problem that this thread has been talking
about (how to determine if a particular optimization is going to enable
further optimizations) is very interesting in general, but I would take
this particular example with a grain of salt because is a very specific
stylized computation.

-- Sean Silva

>
>
> Thanks,
> Michael
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150124/542a2538/attachment.html>