[LLVMdev] [RFC] Heuristic for complete loop unrolling

Sat Jan 24 06:25:19 PST 2015

----- Original Message -----
> From: "Sean Silva" <chisophugis at gmail.com>
> To: "Michael Zolotukhin" <mzolotukhin at apple.com>
> Cc: "LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu>
> Sent: Saturday, January 24, 2015 7:02:35 AM
> Subject: Re: [LLVMdev] [RFC] Heuristic for complete loop unrolling
> 
> 
> On Fri, Jan 23, 2015 at 8:05 PM, Michael Zolotukhin <
> mzolotukhin at apple.com > wrote:
> 
> 
> Hi devs,
> 
> Recently I came across an interesting testcase that LLVM failed to
> optimize well. The test does some image processing, and as a part of
> it, it traverses all the pixels and computes some value basing on
> the adjacent pixels. So, the hot part looks like this:
> 
> for(y = 0..height) {
> for (x = 0..width) {
> val = 0
> for (j = 0..5) {
> for (i = 0..5) {
> val += img[x+i,y+j] * weight[i,j]
> }
> }
> }
> }
> 
> And ‘weight' is just a constant matrix with some coefficients.
> 
> If we unroll the two internal loops (with tripcount 5), then we can
> replace weight[i,j] with concrete constant values. In this
> particular case, many of the coefficients are actually 0 or 1, which
> enables huge code simplifications later on. But currently we unroll
> only the innermost one, because unrolling both of them will exceed
> the threshold.
> 
> When deciding whether to unroll or not, we currently look only at the
> instruction count of the loop. My proposal is to, on top of that,
> check if we can enable any later optimizations by unrolling - in
> this case by replacing a load with a constant. Similar to what we do
> in inlining heuristics, we can estimate how many instructions would
> be potentially eliminated after unrolling and adjust our threshold
> with this value.
> 
> I can imagine that it might be also useful for computations,
> involving sparse constant matrixes (like identity matrix).
> 
> The attached patch implements this feature, and with it we handle the
> original testcase well.
> 
> 
> 
> 
> Does it look good? Of course, any ideas, suggestions and other
> feedback are welcome!
> 
> 
> 
> This is just a naive convolution algorithm implementation. Either the
> author doesn't care about performance, or the author is expecting
> the compiler to recognize the convolution and perform high-level
> loop optimizations.
> 
> 
> I'm not a domain expert on loop optimization, but convolution is an
> operation of such *enormous* importance (FIR filtering, image
> processing, PDE solving, etc.) that I would expect this to be
> well-researched already. I wouldn't expect a C compiler to have this
> optimization, but I would expect a fortran compiler to have it.

I strongly feel that we should aim for a convergence in these two sets of expectations. Especially given the literature in this area, it seems reasonable to investigate adding this kind of capability.

 -Hal

> A
> quick search of the literature shows results well into the last
> millennium about optimizing this particular kind of loop (which
> fortran calls a "stencil").
> 
> 
> I actually think the general problem that this thread has been
> talking about (how to determine if a particular optimization is
> going to enable further optimizations) is very interesting in
> general, but I would take this particular example with a grain of
> salt because is a very specific stylized computation.
> 
> 
> -- Sean Silva
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> Michael
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 
> 
> 
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu         http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory