# [PATCH] [Runtime Unrolling] use a loop to simplify the runtime unrolling prologue.

James Molloy james at jamesmolloy.co.uk
Tue Sep 2 04:48:20 PDT 2014

```
Hi Kevin,

The "obvious" (to me at least) prologue would be to use something similar
to Duff's Device:

extraiters = tripcount % loopfactor
switch (extraiters) {
case 0: jump Loop
case 1: jump L3
case 2: jump L2
case 3: jump L1
}

Loop:
tripcount --;
LoopBody
L1:
tripcount --;
LoopBody
L2:
tripcount --;
LoopBody
L3:
tripcount --;
LoopBody

if (tripcount > 0) jump Loop else jump Out

Out:

The switch would be changed into a lookup/jump table. Wouldn't this produce
better code too?

Cheers,

James

On 2 September 2014 10:56, Kevin Qin <kevinqindev at gmail.com> wrote:

> Runtime unrolling creates a prologue to execute the extra iterations left
> over when the trip count is not divisible by the unroll factor. It generates
> an if-then-else sequence that jumps into a loop body unrolled (factor - 1)
> times, like
>
>     extraiters = tripcount % loopfactor
>     if (extraiters == 0) jump Loop
>     if (extraiters == loopfactor-1) jump L1
>     if (extraiters == loopfactor-2) jump L2
>     ...
>     L1:  LoopBody;
>     L2:  LoopBody;
>     ...
>     if tripcount < loopfactor jump End
>     Loop:
>     ...
>     End:
>
> This means that with an unroll factor of 4, the loop body is copied 7
> times: 3 copies in the loop prologue and 4 in the loop.
> This patch instead uses a loop to execute the extra iterations in the
> prologue, like
>
>         extraiters = tripcount % loopfactor
>         if (extraiters == 0) jump Loop
>         else jump Prol
>  Prol:  LoopBody;
>         extraiters -= 1                 // Omitted if unroll factor is 2.
>         if (extraiters != 0) jump Prol  // Omitted if unroll factor is 2.
>         if (tripcount < loopfactor) jump End
>  Loop:
>  ...
>  End:
>
> Then, when the unroll factor is 4, the loop body is copied only 5 times:
> once in the prologue loop and 4 times in the original loop. If the unroll
> factor is 2, no new loop is created, just as in the original solution.
>
> On the AArch64 target, with runtime unrolling enabled, applying this
> patch reduces code size by 10%.
>
> Also, the if-then-else sequence is eliminated, which could bring a very
> slight performance benefit, less than 0.1% on the X86 and AArch64
> targets.
>
> So overall, this patch brings a substantial code size improvement and does
> no harm to performance.
>
> Is it OK to commit?
>
> Thanks,
> Kevin
>
> http://reviews.llvm.org/D5147
>
> Files:
>   lib/Transforms/Utils/LoopUnrollRuntime.cpp
>   test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll
>   test/Transforms/LoopUnroll/runtime-loop.ll
>   test/Transforms/LoopUnroll/runtime-loop1.ll
>   test/Transforms/LoopUnroll/runtime-loop2.ll
>
```
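For readers who want to see the prologue-loop shape from Kevin's patch in source form, here is a minimal hand-written C sketch. It is only an analogy at the source level, not what the LLVM pass emits: the function name `sum_prologue_loop`, the summation body, and the fixed factor of 4 are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { UNROLL_FACTOR = 4 };  /* illustrative; matches the factor used in the thread */

/* Hypothetical example: sums n ints with a main loop unrolled by
   UNROLL_FACTOR. The leftover iterations run in a small prologue loop
   that contains a single copy of the body. */
long sum_prologue_loop(const int *data, size_t n) {
    assert(data != NULL || n == 0);
    long total = 0;
    size_t i = 0;
    size_t extraiters = n % UNROLL_FACTOR;

    /* Prol: one copy of the body, executed extraiters times. */
    while (extraiters != 0) {
        total += data[i++];
        extraiters -= 1;
    }

    /* Loop: the remaining element count is a multiple of UNROLL_FACTOR,
       so each pass can run four copies of the body without a check. */
    while (i < n) {
        total += data[i];
        total += data[i + 1];
        total += data[i + 2];
        total += data[i + 3];
        i += UNROLL_FACTOR;
    }
    return total;
}
```

The body appears 5 times in total (1 in the prologue loop, 4 in the main loop), which matches the count given in the patch description for an unroll factor of 4.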
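The switch-based entry that James suggests can also be sketched in C, using the classic Duff's Device layout. This is a sketch under assumptions rather than a proposal for the pass itself: the name `sum_switch_entry`, the summation body, and the fixed factor of 4 are hypothetical. Here the switch jumps into the middle of the unrolled body, so no separate prologue copies are needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: sums n ints with a 4x unrolled loop entered
   through a switch, in the style of Duff's Device. */
long sum_switch_entry(const int *data, size_t n) {
    assert(data != NULL || n == 0);
    long total = 0;
    size_t i = 0;

    if (n == 0)
        return 0;  /* guard: the do-while below always runs at least once */

    /* Number of passes through the unrolled body, counting the first,
       possibly partial, pass. */
    size_t passes = (n + 3) / 4;

    switch (n % 4) {
    case 0: do { total += data[i++];  /* full pass: 4 bodies */
    case 3:      total += data[i++];  /* enter here for 3 extra iterations */
    case 2:      total += data[i++];  /* enter here for 2 extra iterations */
    case 1:      total += data[i++];  /* enter here for 1 extra iteration */
            } while (--passes > 0);
    }
    return total;
}
```

Note that a remainder of 1 enters at the last copy of the body and a remainder of 3 enters at the second copy, so the number of bodies executed on the first pass equals the remainder. The body is copied only 4 times, at the cost of an indirect jump through the switch.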