[llvm-commits] [PATCH] Loop unrolling for run-time trip counts

Brendon Cahoon bcahoon at codeaurora.org
Sun Nov 6 15:22:39 PST 2011


Hi, 

 

This patch contains code to unroll loops whose trip count is known only at
run time.  It extends the existing code in the LoopUnroll and LoopUnrollPass
classes, which unrolls loops with compile-time trip counts.

 

The ability to unroll loops with run-time trip counts using this patch is
turned off by default.  To enable the transformation, I added an option,
-unroll-runtime.  It's probably best to keep it disabled for now since some
programs may degrade in performance with the option enabled.  We're hoping
that some more tuning will help minimize any performance regressions.  Of
course, we do see performance improvements as well.

 

I tested LLVM with the patch: it passes 'make check-all', and there are no
regressions in test-suite when using the 'simple' test target.  I have also
run test-suite with the option enabled, again with no regressions.  With the
option enabled, some tests fail when running 'make check-all', for two
reasons: the existing tests assume that loops with run-time trip counts are
not unrolled, and my implementation requires that loop simplify is run
afterward (I am planning on fixing this issue).  I've also tested the code
on a different test suite for ARM and for our soon-to-be-added backend.

 

This implementation uses the existing loop unrolling code to unroll the loop
by some unroll factor (the default is 8).  The patch generates code prior to
the loop to compute the number of extra iterations to execute before
entering the unrolled loop.  The number of extra iterations is the run-time
trip count modulo the unroll factor.  We generate code that checks the
number of extra iterations and branches to the extra copies of the loop body
that execute those iterations before entering the unrolled loop.  The check
is an if-then-else sequence, which LLVM may convert to a switch statement.
The patch generates 'unroll factor - 1' copies of the loop body prior to the
loop to execute these extra iterations.
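
As a rough source-level illustration of the scheme described above (this is
not the IR the patch emits), the sketch below unrolls a simple summation
loop by hand.  It assumes an unroll factor of 4 rather than the default of 8
to keep the example short.  The names body, sum_rolled, sum_unrolled and
Factor are hypothetical and do not appear in the patch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical loop body: accumulate one element.
static inline void body(const std::vector<int> &a, std::size_t i, long &sum) {
    sum += a[i];
}

// Original loop: the trip count a.size() is known only at run time.
long sum_rolled(const std::vector<int> &a) {
    long sum = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        body(a, i, sum);
    return sum;
}

// Hand-written analogue of run-time unrolling with an assumed factor of 4.
long sum_unrolled(const std::vector<int> &a) {
    constexpr std::size_t Factor = 4;
    const std::size_t n = a.size();
    const std::size_t extra = n % Factor;  // extra iterations before the unrolled loop
    long sum = 0;
    std::size_t i = 0;

    // Code before the loop: Factor - 1 guarded copies of the body.
    // Exactly `extra` of them execute.
    if (extra >= 3) body(a, i++, sum);
    if (extra >= 2) body(a, i++, sum);
    if (extra >= 1) body(a, i++, sum);

    // Unrolled loop: the remaining n - i iterations are a multiple of Factor,
    // so no exit check is needed between the copies.
    for (; i < n; i += Factor) {
        body(a, i, sum);
        body(a, i + 1, sum);
        body(a, i + 2, sum);
        body(a, i + 3, sum);
    }
    return sum;
}
```

For any input, sum_unrolled should return the same value as sum_rolled.  With
a trip count of 10, for example, two guarded copies run first and the
unrolled loop then runs twice.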

 

This implementation only allows unroll factors that are a power of 2, to
reduce the cost of computing the number of extra iterations.  There are
other limitations in the implementation, including only allowing loops with
a single exit that occurs in the latch block.  There is certainly some room
for improving the code, but it works and improves performance on some actual
benchmarks.
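
To show why the power-of-2 restriction makes the extra-iteration count cheap
to compute, here is a minimal sketch: when the factor is a power of 2, the
remainder reduces to a single bitwise AND.  The helper names isPowerOfTwo and
extraIterations are hypothetical and are not taken from the patch.

```cpp
#include <cstdint>

// True when factor is a non-zero power of two.
constexpr bool isPowerOfTwo(std::uint32_t factor) {
    return factor != 0 && (factor & (factor - 1)) == 0;
}

// tripCount modulo factor, valid only when factor is a power of two:
// the low bits selected by (factor - 1) are exactly the remainder.
constexpr std::uint32_t extraIterations(std::uint32_t tripCount,
                                        std::uint32_t factor) {
    return tripCount & (factor - 1);
}
```

With a factor of 8 and a run-time trip count of 13, the mask is 7 and the
result is 5, the same as 13 % 8, without a division.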

 

The patch includes changes to the following files:

   lib/Transforms/Scalar/LoopUnrollPass.cpp

   lib/Transforms/Utils/LoopUnroll.cpp

   include/llvm/Transforms/Utils/UnrollLoop.h

 

And new files in the test directory:
test/Transforms/LoopUnroll/runtime-loop[1-3].ll

 

I appreciate any comments, questions, etc. on the code.

 

Thanks,

-- Brendon Cahoon

--

Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

 

 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: unroll-runtime.patch
Type: application/octet-stream
Size: 30517 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111106/676b55c4/attachment.obj>

