[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.

Mon Apr 4 14:19:36 PDT 2016

More generally, for any lane-predicated architecture, introducing a fixup loop is usually a bad idea.

—Owen
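
For readers skimming the thread, here is a minimal sketch of what a fixup (remainder) loop looks like after unrolling by 8. The names runUnrolledBy8 and LoopCounts are made up for illustration and are not LLVM code; the counters exist only to make the split between unrolled iterations and fixup iterations visible.

```cpp
// Hypothetical illustration, not LLVM code: a loop of n iterations
// unrolled by a factor of 8, followed by a fixup loop for the leftovers.
struct LoopCounts {
  unsigned bodyCalls;      // total executions of the loop body
  unsigned unrolledIters;  // iterations of the unrolled main loop
  unsigned fixupIters;     // iterations of the fixup (remainder) loop
};

template <typename Body>
LoopCounts runUnrolledBy8(unsigned n, Body body) {
  LoopCounts c{0, 0, 0};
  unsigned i = 0;
  // Main loop: one compare/branch per 8 copies of the body.
  for (; i + 8 <= n; i += 8) {
    body(i);
    body(i + 1);
    body(i + 2);
    body(i + 3);
    body(i + 4);
    body(i + 5);
    body(i + 6);
    body(i + 7);
    c.bodyCalls += 8;
    ++c.unrolledIters;
  }
  // Fixup loop: runs n % 8 times, one compare/branch per body execution.
  for (; i < n; ++i) {
    body(i);
    ++c.bodyCalls;
    ++c.fixupIters;
  }
  return c;
}
```

With n = 13 this runs one unrolled iteration and five fixup iterations, which is the 13 / 8 / 5 case from the quoted discussion below.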

> On Apr 4, 2016, at 1:52 PM, via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
> Oh, absolutely; it seems reasonable for runtime unrolling (since usually with runtime unrolling you can’t avoid a fixup loop at all unless you actually know the trip count is divisible by some N, which seems fairly unlikely). I can see partial unrolling being useful in this way in some cases, but it’s not what we want (and not what it did before); do you need partial unrolling to work this way for your target?
> 
> —escha 
> 
>> On Apr 4, 2016, at 1:45 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>> 
>>> if (Count <= 1 && Unrolling == Runtime) {
>> Of course, I meant that this check would go somewhere else in the code.
>> The idea is just to allow this type of unrolling when runtime unrolling is enabled.
>> 
>> On Mon, Apr 4, 2016 at 1:42 PM, Fiona Glaser <fglaser at apple.com> wrote:
>>> 
>>> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>> 
>>> 
>>> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>> 
>>> Before the patch the loop
>>> for (i = 0; i < 15; i++)
>>> loop_body;
>>> was not unrolled,
>>> 
>>> the loop
>>> for (i = 0; i < 16; i++)
>>> loop_body;
>>> was unrolled
>>> 
>>> the loop
>>> for (i = 0; i < n; i++)
>>> loop_body;
>>> was unrolled
>>> 
>>> Why should we avoid unrolling if the threshold lets us unroll a loop?
>>> The point of unrolling (right now) is to reduce induction variable and
>>> compare/branch costs.
>>> 
>>> One possible solution is to add " && Unrolling == Runtime":
>>> 
>>>    if (Count <= 1 && Unrolling == Runtime) {
>>> 
>>> 
>>> 
>>> What do you mean? That code is already under this branch:
>>> 
>>> if (Unrolling == Partial) {
>>> 
>>> So it would never trigger, if I’m reading this right.
>>> 
>>> However, I still do not understand why we should avoid unrolling if the
>>> threshold lets us unroll a loop.
>>> For the cases where unrolling is unprofitable, there should be
>>> corresponding heuristics.
>>> What is your case?
>>> 
>>> 
>>> You’ve changed the definition of “partial” unrolling from what it did
>>> before, which makes me somewhat nervous in general. Our specific use-case for
>>> partial unrolling is that GPUs want to reduce latency, so a big loop with
>>> high-latency memory operations in it (too big to fully unroll) should be
>>> partially unrolled to trade some number of registers for some amount of
>>> latency reduction. However, suppose the following case occurs:
>>> 
>>> Trip count: 15
>>> Max unroll count: 8
>>> 
>>> This means we unroll 8 times, then create a fixup loop that runs 7 times
>>> afterwards. Now we have the absolute worst of both worlds: our register
>>> count has gone up a lot because of the unroll, but we still have a lot of
>>> latency because of the fixup loop, so we’ll probably end up losing
>>> performance overall.
>>> 
>>> —escha
>>> 
>>> 
>>> Corrected example:
>>> 
>>> Trip count: 13
>>> Max unroll count: 8
>>> Fixup loop size: 5
>>> 
>>> (The 15 case wouldn’t happen because it’d do a modulo-unroll of size 5).
>>> 
>>> —escha
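
The count selection described in this exchange could be sketched roughly as follows. This is an assumption reconstructed from the messages above, not the actual LLVM code: pickUnrollCount, UnrollPlan, and maxCount (the largest count the size threshold allows) are hypothetical names.

```cpp
// Hypothetical sketch, not LLVM's implementation: choose an unroll count
// for a loop with a constant trip count.
struct UnrollPlan {
  unsigned count;      // copies of the body per unrolled iteration
  unsigned remainder;  // iterations left over for the fixup loop
};

UnrollPlan pickUnrollCount(unsigned tripCount, unsigned maxCount) {
  // First choice ("modulo unroll"): the largest count that fits under
  // maxCount and divides the trip count evenly, so no fixup loop is needed.
  unsigned start = maxCount < tripCount ? maxCount : tripCount;
  for (unsigned c = start; c > 1; --c) {
    if (tripCount % c == 0)
      return {c, 0};
  }
  // Fallback discussed in the thread: the largest power of two that fits
  // under maxCount, with the leftover iterations going to a fixup loop.
  unsigned p = 1;
  while (p * 2 <= maxCount && p * 2 <= tripCount)
    p *= 2;
  return {p, tripCount % p};
}
```

Under these assumptions, a trip count of 15 with a maximum of 8 yields a count of 5 and no fixup loop, while a trip count of 13 yields a count of 8 and a fixup loop of 5 iterations.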
>>> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits