[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.

via llvm-commits llvm-commits at lists.llvm.org
Mon Apr 4 13:43:07 PDT 2016


> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> 
>> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>> 
>> Before the patch the loop
>> for (i = 0; i < 15; i++)
>>  loop_body;
>> was not unrolled,
>> 
>> the loop
>> for (i = 0; i < 16; i++)
>>  loop_body;
>> was unrolled
>> 
>> the loop
>> for (i = 0; i < n; i++)
>>  loop_body;
>> was unrolled
>> 
>> Why should we avoid unrolling if the threshold lets us unroll a loop?
>> The point of unrolling (right now) is to reduce induction-variable and
>> compare/branch costs.
>> 
>> One possible solution is to add " && Unrolling == Runtime":
>>>      if (Count <= 1 && Unrolling == Runtime) {
>> 
> 
> What do you mean? That code is already under this branch:
> 
>   if (Unrolling == Partial) {
> 
> So it would never trigger, if I’m reading this right.
> 
>> However, I still do not understand why we should avoid unrolling if
>> the threshold lets us unroll a loop.
>> For cases where unrolling is unprofitable, there should be
>> corresponding heuristics.
>> What is your case?
> 
> You’ve changed the definition of “partial” unrolling from what it did before, which makes me somewhat nervous in general. Our specific use-case for partial unrolling is that GPUs want to reduce latency, so a big loop with high-latency memory operations in it (too big to fully unroll) should be partially unrolled to trade some number of registers for some amount of latency reduction. However, suppose the following case occurs:
> 
> Trip count: 15
> Max unroll count: 8
> 
> This means we unroll 8 times, then create a fixup loop that runs 7 times afterwards. Now we have the absolute worst of both worlds: our register count has gone up a lot because of the unroll, but we still have a lot of latency because of the fixup loop, so we’ll probably end up losing performance overall.
> 
> —escha

Corrected example:

Trip count: 13
Max unroll count: 8
Fixup loop size: 5

(The 15 case wouldn’t happen because it’d do a modulo-unroll of size 5).
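
A minimal sketch of the arithmetic behind these numbers, in case it helps. This is a simplified model, not the actual LLVM unroller code: the names moduloUnrollCount, powerOfTwoCount, UnrollPlan and planUnroll are hypothetical, and the size threshold is assumed to have already been folded into maxCount.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM's code): the largest count <= maxCount
// that evenly divides tripCount. A result of 1 means "no modulo unroll".
unsigned moduloUnrollCount(unsigned tripCount, unsigned maxCount) {
  assert(tripCount > 0 && maxCount > 0);
  unsigned count = maxCount < tripCount ? maxCount : tripCount;
  while (tripCount % count != 0)
    --count;
  return count;
}

// Hypothetical helper: the largest power of two <= maxCount.
unsigned powerOfTwoCount(unsigned maxCount) {
  assert(maxCount > 0);
  unsigned count = 1;
  while (count <= maxCount / 2)
    count *= 2;
  return count;
}

// Unroll count plus the number of iterations left for the fixup loop.
struct UnrollPlan {
  unsigned count;
  unsigned fixupIterations;
};

// Assumed model of the new behaviour: try a modulo unroll first, and only
// when that degenerates to 1 fall back to a power-of-2 count, which leaves
// tripCount % count iterations for a fixup loop.
UnrollPlan planUnroll(unsigned tripCount, unsigned maxCount) {
  unsigned count = moduloUnrollCount(tripCount, maxCount);
  if (count <= 1 && tripCount > 1)
    count = powerOfTwoCount(maxCount);
  return {count, tripCount % count};
}
```

Under this model, planUnroll(15, 8) picks a count of 5 with no fixup loop, while planUnroll(13, 8) picks a count of 8 and leaves 5 iterations for the fixup loop, which is the case I'm worried about.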

—escha
