[LLVMdev] [RFC] Heuristic for complete loop unrolling
Hal Finkel
hfinkel at anl.gov
Sat Jan 24 06:25:19 PST 2015
----- Original Message -----
> From: "Sean Silva" <chisophugis at gmail.com>
> To: "Michael Zolotukhin" <mzolotukhin at apple.com>
> Cc: "LLVM Developers Mailing List (llvmdev at cs.uiuc.edu)" <llvmdev at cs.uiuc.edu>
> Sent: Saturday, January 24, 2015 7:02:35 AM
> Subject: Re: [LLVMdev] [RFC] Heuristic for complete loop unrolling
>
>
> On Fri, Jan 23, 2015 at 8:05 PM, Michael Zolotukhin <
> mzolotukhin at apple.com > wrote:
>
>
> Hi devs,
>
> Recently I came across an interesting testcase that LLVM failed to
> optimize well. The test does some image processing, and as a part of
> it, it traverses all the pixels and computes some value basing on
> the adjacent pixels. So, the hot part looks like this:
>
> for(y = 0..height) {
> for (x = 0..width) {
> val = 0
> for (j = 0..5) {
> for (i = 0..5) {
> val += img[x+i,y+j] * weight[i,j]
> }
> }
> }
> }
>
> And ‘weight' is just a constant matrix with some coefficients.
>
> If we unroll the two internal loops (with tripcount 5), then we can
> replace weight[i,j] with concrete constant values. In this
> particular case, many of the coefficients are actually 0 or 1, which
> enables huge code simplifications later on. But currently we unroll
> only the innermost one, because unrolling both of them will exceed
> the threshold.
>
> When deciding whether to unroll or not, we currently look only at the
> instruction count of the loop. My proposal is to, on top of that,
> check if we can enable any later optimizations by unrolling - in
> this case by replacing a load with a constant. Similar to what we do
> in inlining heuristics, we can estimate how many instructions would
> be potentially eliminated after unrolling and adjust our threshold
> with this value.
>
> I can imagine that it might be also useful for computations,
> involving sparse constant matrixes (like identity matrix).
>
> The attached patch implements this feature, and with it we handle the
> original testcase well.
>
>
>
>
> Does it look good? Of course, any ideas, suggestions and other
> feedback are welcome!
>
>
>
> This is just a naive convolution algorithm implementation. Either the
> author doesn't care about performance, or the author is expecting
> the compiler to recognize the convolution and perform high-level
> loop optimizations.
>
>
> I'm not a domain expert on loop optimization, but convolution is an
> operation of such *enormous* importance (FIR filtering, image
> processing, PDE solving, etc.) that I would expect this to be
> well-researched already. I wouldn't expect a C compiler to have this
> optimization, but I would expect a fortran compiler to have it.
I strongly feel that we should aim for a convergence in these two sets of expectations. Especially given the literature in this area, it seems reasonable to investigate adding this kind of capability.
-Hal
> A
> quick search of the literature shows results well into the last
> millennium about optimizing this particular kind of loop (which
> fortran calls a "stencil").
>
>
> I actually think the general problem that this thread has been
> talking about (how to determine if a particular optimization is
> going to enable further optimizations) is very interesting in
> general, but I would take this particular example with a grain of
> salt because is a very specific stylized computation.
>
>
> -- Sean Silva
>
>
>
>
>
>
>
>
>
>
> Thanks,
> Michael
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
>
>
> _______________________________________________
> LLVM Developers mailing list
> LLVMdev at cs.uiuc.edu http://llvm.cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
More information about the llvm-dev
mailing list