<html><head><meta http-equiv="Content-Type" content="text/html charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><br class=""><div><blockquote type="cite" class=""><div class="">On Jan 24, 2015, at 5:02 AM, Sean Silva <<a href="mailto:chisophugis@gmail.com" class="">chisophugis@gmail.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class="Apple-interchange-newline"><br style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;" class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;">On Fri, Jan 23, 2015 at 8:05 PM, Michael Zolotukhin<span class="Apple-converted-space"> </span><span dir="ltr" class=""><<a href="mailto:mzolotukhin@apple.com" target="_blank" class="">mzolotukhin@apple.com</a>></span><span class="Apple-converted-space"> </span>wrote:<br class=""><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;">Hi devs,<br class=""><br class="">Recently I came across an interesting testcase that LLVM failed to optimize well. The test does some image processing, and as a part of it, it traverses all the pixels and computes some value basing on the adjacent pixels. So, the hot part looks like this:<br class=""><br class="">for(y = 0..height) {<br class=""> for (x = 0..width) {<br class="">   val = 0<br class="">   for (j = 0..5) {<br class="">     for (i = 0..5) {<br class="">       val += img[x+i,y+j] * weight[i,j]<br class="">     }<br class="">   }<br class=""> }<br class="">}<br class=""><br class="">And ‘weight' is just a constant matrix with some coefficients.<br class=""><br class="">If we unroll the two internal loops (with tripcount 5), then we can replace weight[i,j] with concrete constant values. In this particular case, many of the coefficients are actually 0 or 1, which enables huge code simplifications later on. But currently we unroll only the innermost one, because unrolling both of them will exceed the threshold.<br class=""><br class="">When deciding whether to unroll or not, we currently look only at the instruction count of the loop. My proposal is to, on top of that, check if we can enable any later optimizations by unrolling - in this case by replacing a load with a constant. Similar to what we do in inlining heuristics, we can estimate how many instructions would be potentially eliminated after unrolling and adjust our threshold with this value.<br class=""><br class="">I can imagine that it might be also useful for computations, involving sparse constant matrixes (like identity matrix).<br class=""><br class="">The attached patch implements this feature, and with it we handle the original testcase well.<br class=""><br class=""><br class=""><br class=""><br class="">Does it look good? Of course, any ideas, suggestions and other feedback are welcome!<br class=""></blockquote><div class=""><br class=""></div><div class="">This is just a naive convolution algorithm implementation. Either the author doesn't care about performance, or the author is expecting the compiler to recognize the convolution and perform high-level loop optimizations.</div><div class=""><br class=""></div><div class="">I'm not a domain expert on loop optimization, but convolution is an operation of such *enormous* importance (FIR filtering, image processing, PDE solving, etc.) that I would expect this to be well-researched already. I wouldn't expect a C compiler to have this optimization, but I would expect a fortran compiler to have it. A quick search of the literature shows results well into the last millennium about optimizing this particular kind of loop (which fortran calls a "stencil").</div><div class=""><br class=""></div><div class="">I actually think the general problem that this thread has been talking about (how to determine if a particular optimization is going to enable further optimizations) is very interesting in general, but I would take this particular example with a grain of salt because is a very specific stylized computation.</div></div></div></blockquote>Hi Sean,</div><div><br class=""></div><div>I took this example as the most obvious one, but it’s not the only one. E.g. with the patch SPEC2006/458.sjeng also gains 2% on x86. It’s a program playing chess, and one of the subtasks of the algorithm is to find out whether a position is attacked or not. I haven’t investigated this gain thoroughly, but I saw a lot of constant arrays in the source code, so I believe this optimization just fired up there. And even the original example is not an artificial one - as James has already mentioned, it’s based on Geekbench/sharpen and Geekbench/blur (they use the same implementation of convolution algorithm). These two tests might get 90% and 5% in performance from such optimization. And then there are also vertex computations mentioned by Owen, which will also benefit from it.</div><div><br class=""></div><div>I also have a feeling that this is a targeted approach, but since it helps a lot of applications from different areas, I believe it’s worthwhile. I’d be happy to hear about other viable alternatives, but currently this one looks optimal to me, since it’s very cheap to implement (and thus to maintain), doesn’t add up much to compile time, and gives the expected results.</div><div><br class=""></div><div>Michael<br class=""><blockquote type="cite" class=""><div class=""><div class="gmail_quote" style="font-family: Helvetica; font-size: 12px; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: auto; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; -webkit-text-stroke-width: 0px;"><div class="">-- Sean Silva</div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""><br class=""></div><div class=""> </div><blockquote class="gmail_quote" style="margin: 0px 0px 0px 0.8ex; border-left-width: 1px; border-left-color: rgb(204, 204, 204); border-left-style: solid; padding-left: 1ex;"><br class=""><br class="">Thanks,<br class="">Michael<br class="">_______________________________________________<br class="">LLVM Developers mailing list<br class=""><a href="mailto:LLVMdev@cs.uiuc.edu" target="_blank" class="">LLVMdev@cs.uiuc.edu</a>         <a href="http://llvm.cs.uiuc.edu/" target="_blank" class="">http://llvm.cs.uiuc.edu</a><br class=""><a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev" target="_blank" class="">http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev</a></blockquote></div></div></blockquote></div><br class=""></body></html>