[PATCH] [Runtime Unrolling] use a loop to simplify the runtime unrolling prologue.
James Molloy
james at jamesmolloy.co.uk
Tue Sep 2 04:48:20 PDT 2014
Hi Kevin,
The "obvious" (to me at least) prologue would be to use something similar
to Duff's Device:
extraiters = tripcount % loopfactor
switch (extraiters) {
case 0: jump Loop
case 1: jump L3
case 2: jump L2
case 3: jump L1
}
Loop:
tripcount --;
LoopBody
L1:
tripcount --;
LoopBody
L2:
tripcount --;
LoopBody
L3:
tripcount --;
LoopBody
if (tripcount > 0) jump Loop else jump Out
Out:
The switch would be changed into a lookup/jump table. Wouldn't this produce
better code too?
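
For reference, a minimal C sketch of that shape, assuming an unroll factor of 4 and using "add one array element" as a stand-in for LoopBody. The name sum_duff is made up for illustration and has nothing to do with the patch itself. The switch enters the unrolled body part-way through, so the first pass handles the n % 4 leftover iterations and every later pass runs all four copies.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints with the body unrolled 4 times.
 * The switch jumps into the middle of the unrolled body so that the
 * first pass executes only n % 4 copies (or all 4 when n % 4 == 0). */
long sum_duff(const int *a, size_t n) {
    long sum = 0;
    if (n == 0)
        return 0;                  /* the do-while below always runs once */

    size_t passes = (n + 3) / 4;   /* trips through the unrolled body */
    switch (n % 4) {
    case 0: do { sum += *a++;      /* 4 copies left in this pass */
    case 3:      sum += *a++;      /* 3 copies left */
    case 2:      sum += *a++;      /* 2 copies left */
    case 1:      sum += *a++;      /* 1 copy left */
            } while (--passes > 0);
    }
    return sum;
}
```

Note the explicit n == 0 guard: without it a zero trip count would still run a full pass of four bodies.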
Cheers,
James
On 2 September 2014 10:56, Kevin Qin <kevinqindev at gmail.com> wrote:
> Runtime unrolling creates a prologue to execute the extra iterations
> left over when the trip count is not divisible by the unroll factor. It
> generates an if-then-else sequence that jumps into a loop body unrolled
> (factor - 1) times, like
>
> extraiters = tripcount % loopfactor
> if (extraiters == 0) jump Loop:
> if (extraiters == loopfactor-1) jump L1
> if (extraiters == loopfactor-2) jump L2
> ...
> L1: LoopBody;
> L2: LoopBody;
> ...
> if tripcount < loopfactor jump End
> Loop:
> ...
> End:
>
> This means that if the unroll factor is 4, the loop body is copied 7
> times: 3 copies in the loop prologue and 4 in the loop.
> This patch uses a loop to execute the extra iterations in the prologue,
> like
>
> extraiters = tripcount % loopfactor
> if (extraiters == 0) jump Loop:
> else jump Prol
> Prol: LoopBody;
> extraiters -= 1 // Omitted if unroll factor is 2.
> if (extraiters != 0) jump Prol: // Omitted if unroll factor is 2.
> if (tripcount < loopfactor) jump End
> Loop:
> ...
> End:
>
> Then, when the unroll factor is 4, the loop body is copied only 5
> times: 1 copy in the prologue loop and 4 in the original loop. If the
> unroll factor is 2, no new loop is created, just as in the original
> solution.
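
A minimal C sketch of the quoted prologue-loop layout, assuming an unroll factor of 4 and using "add one array element" as a stand-in for LoopBody. The name sum_prologue_loop is hypothetical and this is not the code the patch generates. It shows where the 5 copies come from: one in the rolled prologue loop and four in the main loop.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints with a rolled prologue loop
 * followed by a main loop unrolled 4 times. */
long sum_prologue_loop(const int *a, size_t n) {
    long sum = 0;
    size_t i = 0;
    size_t extraiters = n % 4;     /* iterations that do not fill a pass */

    /* Prol: a single copy of the body, executed extraiters times. */
    while (extraiters != 0) {
        sum += a[i++];
        extraiters -= 1;
    }

    /* Loop: four copies of the body; n - i is now a multiple of 4,
     * so this is skipped entirely when n < 4. */
    while (i < n) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
        i += 4;
    }
    return sum;
}
```

Compared with a chain of compares jumping into three straight-line copies, the prologue costs one extra decrement and branch per leftover iteration but keeps only one copy of the body.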
>
> On the AArch64 target, with runtime unrolling enabled, applying this
> patch reduces code size by 10%.
>
> The if-then-else sequence is also eliminated, which could bring a very
> slight performance benefit: less than 0.1% on the X86 and AArch64
> targets.
>
> So overall, this patch brings a significant code size improvement and
> does no harm to performance.
>
> Is it OK to commit?
>
> Thanks,
> Kevin
>
> http://reviews.llvm.org/D5147
>
> Files:
> lib/Transforms/Utils/LoopUnrollRuntime.cpp
> test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll
> test/Transforms/LoopUnroll/runtime-loop.ll
> test/Transforms/LoopUnroll/runtime-loop1.ll
> test/Transforms/LoopUnroll/runtime-loop2.ll