[llvm] r265337 - Enable unroll for constant bound loops when TripCount is not modulo of unroll factor, reducing it to maximum power-of-2 that satisfies threshold limit.
Owen Anderson via llvm-commits
llvm-commits at lists.llvm.org
Mon Apr 4 16:17:43 PDT 2016
That sounds reasonable to me.
—Owen
> On Apr 4, 2016, at 2:55 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>
>> I don’t follow what your proposed change is?
> Do not create a fixup loop for a "lane-predicated architecture".
> I agree that for this type of architecture we should keep the loop
> iterations inside the loop.
>
> It looks like the compromise is to enable unrolling for constant-bound
> loops whose TripCount is not a multiple of the unroll factor only when
> "-unroll-runtime" is true.
> If that is OK, I'll prepare the corresponding patch today.
>
> On Mon, Apr 4, 2016 at 2:31 PM, Owen Anderson <resistor at mac.com> wrote:
>> I don’t follow what your proposed change is?
>>
>> —Owen
>>
>>> On Apr 4, 2016, at 2:28 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>
>>> Sounds reasonable. Why not include the check? By default, unrolling does
>>> not generate a fixup loop (even in later passes it turned out to be a
>>> number of peeled iterations, not a loop).
>>>
>>> On Mon, Apr 4, 2016 at 2:19 PM, Owen Anderson <resistor at mac.com> wrote:
>>>> More generally, for any lane-predicated architecture, the introduction of a
>>>> fixup loop is generally a bad idea.
>>>>
>>>> —Owen
>>>>
>>>> On Apr 4, 2016, at 1:52 PM, via llvm-commits <llvm-commits at lists.llvm.org>
>>>> wrote:
>>>>
>>>> Oh, absolutely; it seems reasonable for runtime unrolling (since usually
>>>> with runtime unrolling you can’t avoid a fixup loop at all unless you
>>>> actually know the trip count is divisible by some N, which seems fairly
>>>> unlikely). I can see partial unrolling being useful in this way in some
>>>> cases, but it’s not what we want (and not what it did before); do you need
>>>> partial unrolling to work this way for your target?
>>>>
>>>> —escha
>>>>
>>>> On Apr 4, 2016, at 1:45 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>>
>>>> if (Count <= 1 && Unrolling == Runtime) {
>>>>
>>>> Of course, I mean putting this check somewhere else in the code.
>>>> Just allow this type of unrolling when runtime unrolling is enabled.
>>>>
>>>>
>>>>
>>>>
>>>> On Mon, Apr 4, 2016 at 1:42 PM, Fiona Glaser <fglaser at apple.com> wrote:
>>>>
>>>>
>>>> On Apr 4, 2016, at 1:41 PM, via llvm-commits <llvm-commits at lists.llvm.org>
>>>> wrote:
>>>>
>>>>
>>>> On Apr 4, 2016, at 1:35 PM, Evgeny Stupachenko <evstupac at gmail.com> wrote:
>>>>
>>>> Before the patch the loop
>>>>   for (i = 0; i < 15; i++)
>>>>     loop_body;
>>>> was not unrolled,
>>>>
>>>> the loop
>>>>   for (i = 0; i < 16; i++)
>>>>     loop_body;
>>>> was unrolled
>>>>
>>>> the loop
>>>>   for (i = 0; i < n; i++)
>>>>     loop_body;
>>>> was unrolled
>>>>
>>>> Why should we avoid unrolling if the threshold lets us unroll a loop?
>>>> The point of unrolling (right now) is to reduce induction-variable and
>>>> compare/branch costs.
>>>>
>>>> One possible solution is to add " && Unrolling == Runtime":
>>>>
>>>> if (Count <= 1 && Unrolling == Runtime) {
>>>>
>>>>
>>>>
>>>> What do you mean? That code is already under this branch:
>>>>
>>>> if (Unrolling == Partial) {
>>>>
>>>> So it would never trigger, if I’m reading this right.
>>>>
>>>> However, I still do not understand why we should avoid unrolling if
>>>> the threshold lets us unroll a loop.
>>>> For the cases where unrolling is unprofitable, there should be
>>>> corresponding heuristics.
>>>> What is your case?
>>>>
>>>>
>>>> You’ve changed the definition of “partial” unrolling from what it did
>>>> before, which makes me somewhat nervous in general. Our specific use-case for
>>>> partial unrolling is that GPUs want to reduce latency, so a big loop with
>>>> high-latency memory operations in it (too big to fully unroll) should be
>>>> partially unrolled to trade some number of registers for some amount of
>>>> latency reduction. However, suppose the following case occurs:
>>>>
>>>> Trip count: 15
>>>> Max unroll count: 8
>>>>
>>>> This means we unroll 8 times, then create a fixup loop that runs 7 times
>>>> afterwards. Now we have the absolute worst of both worlds: our register
>>>> count has gone up a lot because of the unroll, but we still have a lot of
>>>> latency because of the fixup loop, so we’ll probably end up losing
>>>> performance overall.
>>>>
>>>> —escha
>>>>
>>>>
>>>> Corrected example:
>>>>
>>>> Trip count: 13
>>>> Max unroll count: 8
>>>> Fixup loop size: 5
>>>>
>>>> (The 15 case wouldn’t happen because it’d do a modulo-unroll of size 5).
>>>>
>>>> —escha
>>>>
>>>>
>>
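The trip-count arithmetic in the thread above (trip count 13 with max unroll count 8 leaving a fixup of 5; trip count 15 being modulo-unrolled by 5) can be sketched roughly as follows. This is only an illustration of the arithmetic, not LLVM's actual code: the helper names moduloUnrollCount, powerOfTwoUnrollCount, and fixupIterations are hypothetical, and maxCount stands in for whatever count the unroll threshold would permit.

```cpp
#include <cassert>

// Hypothetical helper (not LLVM's API): largest count <= maxCount that
// divides tripCount evenly, so no leftover iterations remain.
// Returns 1 when no such divisor exists, i.e. "do not unroll".
unsigned moduloUnrollCount(unsigned tripCount, unsigned maxCount) {
  assert(tripCount >= 1 && maxCount >= 1);
  unsigned count = maxCount < tripCount ? maxCount : tripCount;
  while (count > 1 && tripCount % count != 0)
    --count;
  return count;
}

// Hypothetical helper: largest power of two <= maxCount. This mirrors the
// commit subject's "maximum power-of-2 that satisfies threshold limit",
// assuming maxCount already encodes that limit.
unsigned powerOfTwoUnrollCount(unsigned maxCount) {
  assert(maxCount >= 1);
  unsigned count = 1;
  while (count * 2 <= maxCount)
    count *= 2;
  return count;
}

// Iterations left for a fixup (remainder) loop after unrolling by count.
unsigned fixupIterations(unsigned tripCount, unsigned count) {
  assert(count >= 1);
  return tripCount % count;
}
```

With a max unroll count of 8, moduloUnrollCount(15, 8) yields 5 and moduloUnrollCount(16, 8) yields 8, while moduloUnrollCount(13, 8) yields 1 (no unrolling). Falling back to powerOfTwoUnrollCount(8), which is 8, for trip count 13 leaves fixupIterations(13, 8) equal to 5, which is the case escha is concerned about.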