[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.

Mon Apr 4 14:55:54 PDT 2016

> I don’t follow what your proposed change is?
The change is: do not create a fixup loop on a "lane-predicated architecture".
I agree that for this type of architecture we should keep all loop
iterations inside the loop.

It looks like the compromise is to enable unrolling of constant-bound
loops whose TripCount is not a multiple of the unroll factor only when
"-unroll-runtime" is true.
If that is OK, I'll prepare the corresponding patch today.
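
To make the compromise concrete, here is a minimal sketch of the selection
logic as described in this thread. It is a simplified model, not the code in
the unroll pass: the names UnrollChoice, largestDivisorAtMost,
largestPowerOf2AtMost, chooseUnrollCount and AllowRuntime are all
hypothetical, and MaxCount stands in for whatever the threshold allows.

```cpp
// Hypothetical, simplified model of the policy discussed in this thread.
// None of these names are taken from the LLVM sources.
struct UnrollChoice {
  unsigned Count;     // unroll factor; 1 means "do not unroll"
  unsigned Remainder; // iterations left over for a fixup
};

// Largest factor in [1, MaxCount] that divides TripCount evenly.
unsigned largestDivisorAtMost(unsigned TripCount, unsigned MaxCount) {
  unsigned Count = MaxCount;
  while (Count > 1 && TripCount % Count != 0)
    --Count;
  return Count;
}

// Largest power of two that is <= MaxCount (assumes MaxCount >= 1).
unsigned largestPowerOf2AtMost(unsigned MaxCount) {
  unsigned P = 1;
  while (P <= MaxCount / 2)
    P *= 2;
  return P;
}

UnrollChoice chooseUnrollCount(unsigned TripCount, unsigned MaxCount,
                               bool AllowRuntime) {
  if (TripCount <= 1 || MaxCount <= 1)
    return {1, 0};

  // Prefer a factor that divides the trip count: no fixup is needed.
  unsigned Count = largestDivisorAtMost(TripCount, MaxCount);
  if (Count > 1)
    return {Count, 0};

  // No usable divisor. Fall back to a power-of-2 factor with leftover
  // iterations only when runtime unrolling is allowed (the assumed
  // equivalent of "-unroll-runtime" being true).
  if (!AllowRuntime)
    return {1, 0};
  Count = largestPowerOf2AtMost(MaxCount);
  return {Count, TripCount % Count};
}
```

Under these assumptions, chooseUnrollCount(15, 8, false) picks factor 5 with
no remainder, chooseUnrollCount(13, 8, false) leaves the loop alone, and
chooseUnrollCount(13, 8, true) picks factor 8 with 5 leftover iterations.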

On Mon, Apr 4, 2016 at 2:31 PM, Owen Anderson <resistor at mac.com> wrote:
> I don’t follow what your proposed change is?
>
> —Owen
>
>> On Apr 4, 2016, at 2:28 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>
>> Sounds reasonable. Why not include the check? By default, unrolling does
>> not generate a fixup loop (even in later passes it appears as a
>> number of peeled iterations, not a loop).
>>
>> On Mon, Apr 4, 2016 at 2:19 PM, Owen Anderson <resistor at mac.com> wrote:
>>> More generally, for any lane-predicated architecture, the introduction of a
>>> fixup loop is generally a bad idea.
>>>
>>> —Owen
>>>
>>> On Apr 4, 2016, at 1:52 PM, via llvm-commits <llvm-commits at lists.llvm.org>
>>> wrote:
>>>
>>> Oh, absolutely; it seems reasonable for runtime unrolling (since usually
>>> with runtime unrolling you can’t avoid a fixup loop at all unless you
>>> actually know the trip count is divisible by some N, which seems fairly
>>> unlikely). I can see partial unrolling being useful in this way in some
>>> cases, but it’s not what we want (and not what it did before); do you need
>>> partial unrolling to work this way for your target?
>>>
>>> —escha
>>>
>>> On Apr 4, 2016, at 1:45 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>
>>> if (Count <= 1 && Unrolling == Runtime) {
>>>
>>> Of course, I mean a check like this somewhere else in the code.
>>> Just allow this type of unrolling when runtime unrolling is enabled.
>>>
>>>
>>>
>>>
>>> On Mon, Apr 4, 2016 at 1:42 PM, Fiona Glaser <fglaser at apple.com> wrote:
>>>
>>>
>>> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org>
>>> wrote:
>>>
>>>
>>> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>
>>> Before the patch, the loop
>>>   for (i = 0; i < 15; i++)
>>>     loop_body;
>>> was not unrolled,
>>>
>>> the loop
>>>   for (i = 0; i < 16; i++)
>>>     loop_body;
>>> was unrolled,
>>>
>>> and the loop
>>>   for (i = 0; i < n; i++)
>>>     loop_body;
>>> was unrolled.
>>>
>>> Why should we avoid unrolling if the threshold lets us unroll a loop?
>>> The point of unrolling (right now) is to reduce induction variable and
>>> compare/branch costs.
>>>
>>> One possible solution is to add " && Unrolling == Runtime":
>>>
>>>   if (Count <= 1 && Unrolling == Runtime) {
>>>
>>>
>>>
>>> What do you mean? That code is already under this branch:
>>>
>>> if (Unrolling == Partial) {
>>>
>>> So it would never trigger, if I’m reading this right.
>>>
>>> However, I still do not understand why we should avoid unrolling if
>>> the threshold lets us unroll a loop.
>>> For the cases where unrolling is unprofitable, there should be
>>> corresponding heuristics.
>>> What is your case?
>>>
>>>
>>> You’ve changed the definition of “partial” unrolling from what it did
>>> before, which makes me somewhat nervous in general. Our specific use-case for
>>> partial unrolling is that GPUs want to reduce latency, so a big loop with
>>> high-latency memory operations in it (too big to fully unroll) should be
>>> partially unrolled to trade some number of registers for some amount of
>>> latency reduction. However, suppose the following case occurs:
>>>
>>> Trip count: 15
>>> Max unroll count: 8
>>>
>>> This means we unroll 8 times, then create a fixup loop that runs 7 times
>>> afterwards. Now we have the absolute worst of both worlds: our register
>>> count has gone up a lot because of the unroll, but we still have a lot of
>>> latency because of the fixup loop, so we’ll probably end up losing
>>> performance overall.
>>>
>>> —escha
>>>
>>>
>>> Corrected example:
>>>
>>> Trip count: 13
>>> Max unroll count: 8
>>> Fixup loop size: 5
>>>
>>> (The 15 case wouldn’t happen because it’d do a modulo-unroll of size 5).
>>>
>>> —escha
>>>
>>>
>>>
>>>
>
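
For readers unfamiliar with the term, the "fixup loop" in the quoted messages
is the remainder loop that runs after the unrolled body when the trip count is
not a multiple of the unroll factor. The hand-written sketch below only
illustrates that shape; it is not compiler output, sumUnrolledBy4 is a
hypothetical name, and the factor 4 is chosen for brevity (the thread
discusses a factor of 8).

```cpp
// Sums the first N elements of A with the loop body unrolled by hand.
int sumUnrolledBy4(const int *A, unsigned N) {
  int Sum = 0;
  unsigned I = 0;

  // Main loop: four copies of the body per induction-variable update and
  // compare/branch. I never exceeds N, so N - I cannot wrap around.
  for (; N - I >= 4; I += 4) {
    Sum += A[I];
    Sum += A[I + 1];
    Sum += A[I + 2];
    Sum += A[I + 3];
  }

  // Fixup loop: handles the N % 4 iterations the main loop could not cover.
  for (; I < N; ++I)
    Sum += A[I];

  return Sum;
}
```

With N = 13 the main loop runs three times and the fixup loop handles the one
remaining element. The concern raised in the quoted messages is the analogous
case with a factor of 8, where 5 of the 13 iterations land in the fixup loop.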