<div dir="ltr">Hi Kevin,<div><br></div><div>The "obvious" (to me at least) prologue would be to use something similar to Duff's Device:</div><div><br></div><div><font face="courier new, monospace"><span style="font-size:13px">extraiters = tripcount % loopfactor</span><br>
</font></div><div><span style="font-size:13px"><font face="courier new, monospace">switch (extraiters) {</font></span></div><div><span style="font-size:13px"><font face="courier new, monospace">case 0: jump Loop</font></span></div>
<div><span style="font-size:13px"><font face="courier new, monospace">case 1: jump L3</font></span></div><div><span style="font-size:13px"><font face="courier new, monospace">case 2: jump L2</font></span></div><div><span style="font-size:13px"><font face="courier new, monospace">case 3: jump L1</font></span></div>
<div><span style="font-size:13px"><font face="courier new, monospace">}</font></span></div><div><span style="font-size:13px"><font face="courier new, monospace"><br></font></span></div><div><span style="font-size:13px"><font face="courier new, monospace">Loop:</font></span></div>
<div><span style="font-size:13px"><font face="courier new, monospace"> tripcount --;</font></span></div><div><font face="courier new, monospace"> LoopBody</font></div><div><font face="courier new, monospace">L1:</font></div>
<div><font face="courier new, monospace"> tripcount --;</font></div><div><font face="courier new, monospace"> LoopBody</font></div><div><font face="courier new, monospace">L2:</font></div><div><font face="courier new, monospace"> tripcount --;</font></div>
<div><font face="courier new, monospace"> LoopBody</font></div><div><font face="courier new, monospace">L3:</font></div><div><font face="courier new, monospace"> tripcount --;</font></div><div><font face="courier new, monospace"> LoopBody</font></div>
<div><font face="courier new, monospace"> </font></div><div><font face="courier new, monospace"> if (tripcount > 0) jump Loop else jump Out</font></div><div><font face="courier new, monospace"><br></font></div><div>
<font face="courier new, monospace">Out:</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">The switch would be changed into a lookup/jump table. Wouldn't this produce better code too?</font></div>
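<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">For concreteness, here is a minimal C sketch of the Duff's Device shape, assuming an unroll factor of 4. The names loop_body, body_runs and run_unrolled_duff are hypothetical stand-ins for LoopBody and the surrounding function, and a non-positive trip count is guarded separately. The switch enters the unrolled body so that exactly tripcount % 4 copies run before the first back-edge:</font></div>

```c
/* Hypothetical stand-in for LoopBody: it only counts its executions. */
int body_runs;
void loop_body(void) { body_runs++; }

/* Runs loop_body() exactly tripcount times with a 4x unrolled loop.
 * The switch jumps into the middle of the unrolled body, so the
 * leftover (tripcount % 4) iterations need no separate prologue copies. */
int run_unrolled_duff(int tripcount) {
    body_runs = 0;
    if (tripcount <= 0)
        return 0;                       /* nothing to execute */
    int passes = (tripcount + 3) / 4;   /* full or partial trips through the body */
    switch (tripcount % 4) {
    case 0: do { loop_body();
    case 3:      loop_body();
    case 2:      loop_body();
    case 1:      loop_body();
            } while (--passes > 0);
    }
    return body_runs;
}
```

<div><font face="arial, sans-serif">Calling run_unrolled_duff(n) for a positive n should execute the body n times while keeping only 4 copies of it in the code.</font></div>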
<div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">Cheers,</font></div><div><font face="arial, sans-serif"><br></font></div><div><font face="arial, sans-serif">James</font></div></div>
<div class="gmail_extra"><br><br><div class="gmail_quote">On 2 September 2014 10:56, Kevin Qin <span dir="ltr"><<a href="mailto:kevinqindev@gmail.com" target="_blank">kevinqindev@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Runtime unrolling creates a prologue to execute the extra iterations left over when the trip count is not divisible by the unroll factor. It generates an if-then-else sequence that jumps into a loop body unrolled (factor - 1) times, like<br>
<br>
extraiters = tripcount % loopfactor<br>
if (extraiters == 0) jump Loop<br>
if (extraiters == loopfactor-1) jump L1<br>
if (extraiters == loopfactor-2) jump L2<br>
...<br>
L1: LoopBody;<br>
L2: LoopBody;<br>
...<br>
if tripcount < loopfactor jump End<br>
Loop:<br>
...<br>
End:<br>
<br>
This means that with an unroll factor of 4, the loop body is copied 7 times: 3 copies in the loop prologue and 4 in the loop.<br>
This patch instead uses a loop to execute the extra iterations in the prologue, like<br>
<br>
extraiters = tripcount % loopfactor<br>
if (extraiters == 0) jump Loop<br>
else jump Prol<br>
Prol: LoopBody;<br>
extraiters -= 1 // Omitted if unroll factor is 2.<br>
if (extraiters != 0) jump Prol // Omitted if unroll factor is 2.<br>
if (tripcount < loopfactor) jump End<br>
Loop:<br>
...<br>
End:<br>
<br>
With an unroll factor of 4, the loop body is then copied only 5 times: 1 copy in the prologue loop and 4 in the original loop. If the unroll factor is 2, no new loop is created, just as in the original solution.<br>
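<br>
A minimal C sketch of the prologue-loop scheme, assuming an unroll factor of 4; prol_loop_body, prol_body_runs and run_unrolled_prologue_loop are hypothetical names used only for illustration, and this is not the code the patch generates:<br>

```c
/* Hypothetical stand-in for LoopBody: it only counts its executions. */
int prol_body_runs;
void prol_loop_body(void) { prol_body_runs++; }

/* Runs prol_loop_body() exactly tripcount times. The leftover
 * iterations run in a small prologue loop holding one copy of the
 * body; the main loop holds four copies, five in total. */
int run_unrolled_prologue_loop(int tripcount) {
    prol_body_runs = 0;
    if (tripcount <= 0)
        return 0;                      /* nothing to execute */
    int extraiters = tripcount % 4;    /* iterations that do not fill a 4x pass */
    int remaining = tripcount;

    /* Prol: skipped entirely when extraiters == 0 */
    while (extraiters != 0) {
        prol_loop_body();
        extraiters -= 1;
        remaining -= 1;
    }

    /* Loop: remaining is now a multiple of 4 (possibly 0) */
    while (remaining > 0) {
        prol_loop_body();
        prol_loop_body();
        prol_loop_body();
        prol_loop_body();
        remaining -= 4;
    }
    return prol_body_runs;
}
```

In this sketch the body appears 5 times in the code, compared with 7 copies when the prologue is unrolled 3 times.<br>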
<br>
On the AArch64 target, with runtime unrolling enabled, applying this patch reduces code size by 10%.<br>
<br>
Also, the if-then-else sequence is eliminated, which could bring a very slight performance benefit: less than 0.1% on the X86 and AArch64 targets.<br>
<br>
So overall, this patch brings a substantial code size improvement and does no harm to performance.<br>
<br>
Is it OK to commit?<br>
<br>
Thanks,<br>
Kevin<br>
<br>
<a href="http://reviews.llvm.org/D5147" target="_blank">http://reviews.llvm.org/D5147</a><br>
<br>
Files:<br>
lib/Transforms/Utils/LoopUnrollRuntime.cpp<br>
test/Transforms/LoopUnroll/PowerPC/a2-unrolling.ll<br>
test/Transforms/LoopUnroll/runtime-loop.ll<br>
test/Transforms/LoopUnroll/runtime-loop1.ll<br>
test/Transforms/LoopUnroll/runtime-loop2.ll<br>
<br>_______________________________________________<br>
llvm-commits mailing list<br>
<a href="mailto:llvm-commits@cs.uiuc.edu">llvm-commits@cs.uiuc.edu</a><br>
<a href="http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits</a><br>
<br></blockquote></div><br></div>