[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.

Mon Apr 4 14:28:34 PDT 2016

Sounds reasonable. Why not include the check? By default, unrolling does
not generate a fixup loop (even after later passes it shows up as a
number of peeled iterations, not a loop).
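
To make the numbers in the quoted discussion concrete, here is a rough
sketch of the count selection being debated. It is not LLVM's actual code:
planPartialUnroll, largestDivisorCount, largestPow2Count, and the use of a
plain MaxCount in place of the real size threshold are all assumptions made
for illustration.

```cpp
// Hypothetical sketch; names and the MaxCount stand-in are assumptions,
// not the real LoopUnrollPass implementation.

// Largest unroll count <= MaxCount that evenly divides TripCount
// ("modulo" unrolling, so no iterations are left over). Returns 1 if none.
unsigned largestDivisorCount(unsigned TripCount, unsigned MaxCount) {
  for (unsigned C = MaxCount; C > 1; --C)
    if (TripCount % C == 0)
      return C;
  return 1;
}

// Largest power of two that does not exceed MaxCount (1 if MaxCount < 2).
unsigned largestPow2Count(unsigned MaxCount) {
  unsigned C = 1;
  while (C * 2 <= MaxCount)
    C *= 2;
  return C;
}

struct UnrollPlan {
  unsigned Count;    // chosen unroll factor
  unsigned Leftover; // iterations that remain for peeling or a fixup loop
};

// Prefer an exact divisor; if there is none, fall back to the largest
// power of two allowed, which leaves TripCount % Count iterations over.
UnrollPlan planPartialUnroll(unsigned TripCount, unsigned MaxCount) {
  unsigned C = largestDivisorCount(TripCount, MaxCount);
  if (C <= 1)
    C = largestPow2Count(MaxCount);
  return {C, TripCount % C};
}
```

Under these assumptions, a trip count of 15 with a maximum of 8 picks a
count of 5 with nothing left over, while a trip count of 13 picks 8 and
leaves 5 iterations, which is the case the corrected example in the quoted
text is worried about.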

On Mon, Apr 4, 2016 at 2:19 PM, Owen Anderson <resistor at mac.com> wrote:
> More generally, for any lane-predicated architecture, the introduction of a
> fixup loop is generally a bad idea.
>
> —Owen
>
> On Apr 4, 2016, at 1:52 PM, via llvm-commits <llvm-commits at lists.llvm.org>
> wrote:
>
> Oh, absolutely; it seems reasonable for runtime unrolling (since usually
> with runtime unrolling you can’t avoid a fixup loop at all unless you
> actually know the trip count is divisible by some N, which seems fairly
> unlikely). I can see partial unrolling being useful in this way in some
> cases, but it’s not what we want (and not what it did before); do you need
> partial unrolling to work this way for your target?
>
> —escha
>
> On Apr 4, 2016, at 1:45 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>
> if (Count <= 1 && Unrolling == Runtime) {
>
> Of course, I mean putting this check somewhere else in the code:
> just allow this type of unrolling when runtime unrolling is enabled.
>
>
>
>
> On Mon, Apr 4, 2016 at 1:42 PM, Fiona Glaser <fglaser at apple.com> wrote:
>
>
> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org>
> wrote:
>
>
> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>
> Before the patch, the loop
>   for (i = 0; i < 15; i++)
>     loop_body;
> was not unrolled,
>
> the loop
>   for (i = 0; i < 16; i++)
>     loop_body;
> was unrolled,
>
> and the loop
>   for (i = 0; i < n; i++)
>     loop_body;
> was unrolled.
>
> Why should we avoid unrolling if the threshold lets us unroll a loop?
> The point of unrolling (right now) is to reduce induction variable and
> compare/branch costs.
>
> One possible solution is to add " && Unrolling == Runtime":
>
>    if (Count <= 1 && Unrolling == Runtime) {
>
>
>
> What do you mean? That code is already under this branch:
>
> if (Unrolling == Partial) {
>
> So it would never trigger, if I’m reading this right.
>
> However, I still do not understand why we should avoid unrolling if
> the threshold lets us unroll a loop.
> For the cases where unrolling is unprofitable, there should be
> corresponding heuristics.
> What is your case?
>
>
> You’ve changed the definition of “partial” unrolling from what it did
> before, which makes me somewhat nervous in general. Our specific use-case for
> partial unrolling is that GPUs want to reduce latency, so a big loop with
> high-latency memory operations in it (too big to fully unroll) should be
> partially unrolled to trade some number of registers for some amount of
> latency reduction. However, suppose the following case occurs:
>
> Trip count: 15
> Max unroll count: 8
>
> This means we unroll 8 times, then create a fixup loop that runs 7 times
> afterwards. Now we have the absolute worst of both worlds: our register
> count has gone up a lot because of the unroll, but we still have a lot of
> latency because of the fixup loop, so we’ll probably end up losing
> performance overall.
>
> —escha
>
>
> Corrected example:
>
> Trip count: 13
> Max unroll count: 8
> Fixup loop size: 5
>
> (The 15 case wouldn’t happen because it’d do a modulo-unroll of size 5).
>
> —escha
>
>