[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.

Mon Apr 4 13:52:25 PDT 2016

Oh, absolutely; it seems reasonable for runtime unrolling (since usually with runtime unrolling you can’t avoid a fixup loop at all unless you actually know the trip count is divisible by some N, which seems fairly unlikely). I can see partial unrolling being useful in this way in some cases, but it’s not what we want (and not what it did before); do you need partial unrolling to work this way for your target?

—escha 
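To make the runtime case concrete, here is a hand-written sketch (an illustration, not actual LLVM output) of what unrolling `for (i = 0; i < n; i++)` by 4 amounts to when n is only known at run time. The main loop covers the largest multiple of 4 not exceeding n, and a fixup loop handles the remaining n % 4 iterations. The function names and the factor of 4 are hypothetical; `constexpr` is used only so the behavior can be checked at compile time.

```cpp
#include <cstddef>

// Reference version: one compare/branch and one IV update per element.
constexpr long sumSimple(const int* a, std::size_t n) {
  long s = 0;
  for (std::size_t i = 0; i < n; ++i) s += a[i];
  return s;
}

// Runtime-unrolled by 4 (hypothetical example). Because n is unknown,
// the main loop handles n - n % 4 elements and a fixup loop the rest.
constexpr long sumUnrolled4(const int* a, std::size_t n) {
  long s = 0;
  std::size_t i = 0;
  const std::size_t mainEnd = n - n % 4;  // largest multiple of 4 <= n
  for (; i < mainEnd; i += 4) {
    s += a[i];
    s += a[i + 1];
    s += a[i + 2];
    s += a[i + 3];
  }
  // Fixup loop: runs n % 4 times (0 to 3). It could only be dropped
  // if n were known to be divisible by 4.
  for (; i < n; ++i) s += a[i];
  return s;
}
```

For n = 7 the main loop runs once (4 elements) and the fixup loop runs 3 times; unless the compiler can prove n % 4 == 0, the fixup loop has to stay.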

> On Apr 4, 2016, at 1:45 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
> 
>>  if (Count <= 1 && Unrolling == Runtime) {
> Of course, I meant this to go somewhere else in the code:
> just allow this type of unrolling when runtime unrolling is enabled.
> 
> On Mon, Apr 4, 2016 at 1:42 PM, Fiona Glaser <fglaser at apple.com> wrote:
>> 
>> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>> 
>> 
>> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>> 
>> Before the patch, the loop
>>     for (i = 0; i < 15; i++)
>>       loop_body;
>> was not unrolled;
>>
>> the loop
>>     for (i = 0; i < 16; i++)
>>       loop_body;
>> was unrolled;
>>
>> and the loop
>>     for (i = 0; i < n; i++)
>>       loop_body;
>> was unrolled.
>> 
>> Why should we avoid unrolling if the threshold lets us unroll a loop?
>> The point of unrolling (right now) is to reduce induction-variable and
>> compare/branch costs.
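The compare/branch saving can be estimated with a deliberately simplified model (a hypothetical helper, not LLVM's cost model): a loop with trip count T unrolled by U executes T / U main-loop back-edges plus T % U fixup-loop back-edges, instead of T.

```cpp
// Simplified model (hypothetical): number of loop-back compare/branch
// pairs executed for a loop with a known trip count unrolled by `factor`.
// One per main-loop iteration, plus one per fixup-loop iteration.
// Ignores any extra guard checks real generated code may contain.
// Precondition: factor >= 1.
constexpr unsigned branchesExecuted(unsigned tripCount, unsigned factor) {
  return tripCount / factor + tripCount % factor;
}
```

For the 15-iteration loop unrolled by 2 this gives 7 + 1 = 8 compare/branch pairs instead of 15; induction-variable updates shrink the same way under this model.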
>> 
>> One possible solution is to add "&& Unrolling == Runtime":
>> 
>>     if (Count <= 1 && Unrolling == Runtime) {
>> 
>> 
>> 
>> What do you mean? That code is already under this branch:
>> 
>>  if (Unrolling == Partial) {
>> 
>> So it would never trigger, if I’m reading this right.
>> 
>> However, I still do not understand why we should avoid unrolling if the
>> threshold lets us unroll a loop.
>> For cases where unrolling is unprofitable, there should be
>> corresponding heuristics.
>> What is your case?
>> 
>> 
>> You’ve changed the definition of “partial” unrolling from what it did
>> before, which makes me somewhat nervous in general. Our specific use-case for
>> partial unrolling is that GPUs want to reduce latency, so a big loop with
>> high-latency memory operations in it (too big to fully unroll) should be
>> partially unrolled to trade some number of registers for some amount of
>> latency reduction. However, suppose the following case occurs:
>> 
>> Trip count: 15
>> Max unroll count: 8
>> 
>> This means we unroll 8 times, then create a fixup loop that runs 7 times
>> afterwards. Now we have the absolute worst of both worlds: our register
>> count has gone up a lot because of the unroll, but we still have a lot of
>> latency because of the fixup loop, so we’ll probably end up losing
>> performance overall.
>> 
>> —escha
>> 
>> 
>> Corrected example:
>> 
>> Trip count: 13
>> Max unroll count: 8
>> Fixup loop size: 5
>> 
>> (The 15 case wouldn’t happen: 15 is divisible by 5, so it would do a
>> modulo unroll of size 5 with no fixup loop.)
>> 
>> —escha
>>
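The count selection under discussion can be modeled roughly as follows. This is a simplified sketch, not the actual LoopUnrollPass code: `maxCount` is a hypothetical stand-in for whatever factor the size threshold allows, and the `pow2Fallback` flag represents the behavior the patch adds (which Evgeny suggests enabling only when runtime unrolling is on).

```cpp
// Result of the (hypothetical) model below.
struct UnrollPlan {
  unsigned count;       // unroll factor; 1 means "not unrolled"
  unsigned fixupIters;  // iterations left to a fixup loop
};

// Largest power of two <= x, for x >= 1.
constexpr unsigned powerOf2Floor(unsigned x) {
  unsigned p = 1;
  while (p <= x / 2) p *= 2;
  return p;
}

// Simplified model of partial unrolling for a constant trip count.
// Old rule: pick the largest count <= maxCount that divides tripCount,
// so no fixup loop is needed; if only 1 divides it, don't unroll.
// Fallback (what the patch adds, per its title): if that search yields 1,
// use the largest power of two within the limit and leave the remainder
// to a fixup loop.
constexpr UnrollPlan choosePartialCount(unsigned tripCount, unsigned maxCount,
                                        bool pow2Fallback) {
  // Never unroll by more than the trip count itself.
  const unsigned limit = maxCount < tripCount ? maxCount : tripCount;
  unsigned count = limit;
  // Old rule: search downward for a factor that divides the trip count.
  while (count > 1 && tripCount % count != 0) --count;
  // Fallback: no divisor > 1 found, so take a power of two anyway.
  if (count <= 1 && pow2Fallback && limit > 1) count = powerOf2Floor(limit);
  if (count == 0) count = 1;  // tripCount or maxCount was 0
  return UnrollPlan{count, tripCount % count};
}
```

With maxCount = 8, a trip count of 13 is not unrolled under the old rule (13 is prime), while the fallback picks 8 and leaves 5 iterations to a fixup loop, which is escha's corrected example. A trip count of 15 still picks 5 with no fixup loop, whether or not the fallback is enabled.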