[LLVMdev] [RFC] Heuristic for complete loop unrolling

Michael Zolotukhin mzolotukhin at apple.com
Fri Jan 23 14:26:00 PST 2015


> On Jan 23, 2015, at 1:38 PM, Owen Anderson <resistor at mac.com> wrote:
> 
> Hi Michael,
> 
> I’m unsure about other targets, but for the ones I work on converting register-indexed loads into constant-indexed ones is a huge win, even in situations where the load cannot be constant folded.  Do you think there is any reasonable way to incorporate that into this change?  Maybe a TTI hook?
Hi Owen,

It’s not difficult to incorporate such customization into the patch. Actually, right now there is a magic constant in the patch which gives a weight to every potentially removed load instruction. It might make sense to make this constant target-dependent. And if we do it, we can also add a weight for loads that will become constant-indexed (not necessarily constant-folded).

Michael
> 
> —Owen
> 
> 
>> On Jan 23, 2015, at 12:05 PM, Michael Zolotukhin <mzolotukhin at apple.com <mailto:mzolotukhin at apple.com>> wrote:
>> 
>> Hi devs,
>> 
>> Recently I came across an interesting testcase that LLVM failed to optimize well. The test does some image processing, and as a part of it, it traverses all the pixels and computes some value basing on the adjacent pixels. So, the hot part looks like this:
>> 
>> for(y = 0..height) {
>> for (x = 0..width) {
>>  val = 0
>>  for (j = 0..5) {
>>    for (i = 0..5) {
>>      val += img[x+i,y+j] * weight[i,j]
>>    }
>>  }
>> }
>> }
>> 
>> And ‘weight' is just a constant matrix with some coefficients.
>> 
>> If we unroll the two internal loops (with tripcount 5), then we can replace weight[i,j] with concrete constant values. In this particular case, many of the coefficients are actually 0 or 1, which enables huge code simplifications later on. But currently we unroll only the innermost one, because unrolling both of them will exceed the threshold.
>> 
>> When deciding whether to unroll or not, we currently look only at the instruction count of the loop. My proposal is to, on top of that, check if we can enable any later optimizations by unrolling - in this case by replacing a load with a constant. Similar to what we do in inlining heuristics, we can estimate how many instructions would be potentially eliminated after unrolling and adjust our threshold with this value.
>> 
>> I can imagine that it might be also useful for computations, involving sparse constant matrixes (like identity matrix).
>> 
>> The attached patch implements this feature, and with it we handle the original testcase well.
>> 
>> <complete-unroll.patch>
>> 
>> Does it look good? Of course, any ideas, suggestions and other feedback are welcome!
>> 
>> 
>> Thanks,
>> Michael_______________________________________________
>> LLVM Developers mailing list
>> LLVMdev at cs.uiuc.edu <mailto:LLVMdev at cs.uiuc.edu>         http://llvm.cs.uiuc.edu <http://llvm.cs.uiuc.edu/>
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev <http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20150123/9f108bdf/attachment.html>


More information about the llvm-dev mailing list