[llvm-commits] [PATCH] Loop unrolling for run-time trip counts
Brendon Cahoon
bcahoon at codeaurora.org
Sun Dec 4 19:35:51 PST 2011
Here's the latest patch containing the code to implement loop unrolling for
run-time trip counts. I apologize for taking so long with making the
suggested changes and submitting a new patch. I think I've addressed all of
Andy's comments (much appreciated) below and made the appropriate changes.
The main changes include creating a separate file, called
LoopUnrollRuntime.cpp, to separate the code for unrolling loops with
run-time or compile-time trip counts. Also, I fixed the implementation so
that the loop simplify analysis is preserved.
The only suggestion I didn't include was using "LVMap.swap(VMap)" instead of
copying the map. The ValuetoValueMapTy class doesn't contain a swap method.
Instead, I was able to change the code to reduce the amount of copying that
occurs.
The patch passes make check-all and the I've run the tests in
projects/test-suite with different unroll factors. Also, the ability to
unroll loops with run-time trip counts is turned off by default. It is
enabled with the -unroll-runtime flag.
I'd appreciate any further comments, etc. on the patch. The turnaround time
for any additional changes should be much faster.
Thanks,
Brendon
--
Qualcomm Innovation Center, Inc is a member of Code Aurora Forum
From: Andrew Trick [mailto:atrick at apple.com]
Sent: Tuesday, November 15, 2011 11:59 PM
To: Brendon Cahoon
Cc: llvm-commits at cs.uiuc.edu LLVM
Subject: Re: [llvm-commits] [PATCH] Loop unrolling for run-time trip counts
For the purpose of code review, I'll ignore the issue of
unroll-runtime profitability, which is obviously left for later. I
will say that this is only half the framework because to achieve
data-alignment we would need to emit an epilog as well. Also, I
misread your original message, and assumed you were adding a prolog
*loop*. That would be nice for targets that care about code
size. There are cases when unrolling the main loop can be done without
increasing the size of the loop body. Your design would not always be
well suited for those cases. On the other hand, your design is a much
simpler CFG transform. We have less SSA/LoopInfo updating to worry
about this way.
Never mind my earlier comment about DominatorTree. I forgot we always
recompute it. You're welcome to fix that :)
It does seem like your transform should preserve LoopSimplify. I'm
guessing you just need a unique loop exit block--just split the
exit. Can you please do that?
In general, your implementation looks quite nice.
I am concerned that you're doubling the size of LoopUnroll.cpp for a
feature that most people won't need to look at. I'd be ok with you
creating a new .cpp file. Or at least make it blatantly obvious that
all the new code is specific to runtime-unroll. For example, you can
add function-level comments like "Support for -runtime-unroll", and
code sections like:
//===----------------------------------------------------------------------=
==//
// Implementation of -runtime-unroll.
//===----------------------------------------------------------------------=
==//
<your code>
//===----------------------------------------------------------------------=
==//
// Driver for loop unrolling transformations.
//===----------------------------------------------------------------------=
==//
...
Formatting: You have two 80-col violations, but otherwise looks great.
Typo:
+ // ... loop
+ // loop prologue
---
In EmitRuntimePrologueFixup:
+/// extraiters = tripcount % loopfactor
+/// if (extraiters == 0) jump Loop:
+/// if (extraiters == loopfactor) jump L1
+/// if (extraiters == loopfactor-1) jump L2
You may want to comment that you expect a later pass to convert this
to a switch.
Before you erase PH->terminator, maybe you should assert that it has one
successor pointing to PEnd.
+ std::vector<BasicBlock*> NewBlocks(LoopBlocks.size());
No need to initialize the vector, then immediately clear it. You can use
NewBlocks.reserve(LoopBlocks.size()).
+ for (unsigned i = Count-1; i > 0; --i) {
This is a large, complex loop. To be more self-documenting, use a
meaningful variable like "leftOverIters". Then it will be clear why
you're counting down, and the nested loop won't hide the variable.
+ if (i != Count-1) {
My personal preference, I'd like to see the i == Count-1 case first
since that's what happens first.
---
In CloneLoopBlocks:
+ LPM->cloneBasicBlockSimpleAnalysis(LoopBlocks[j], NewBB, L);
I'm not sure this makes sense given that the new block is in the
parent loop. It may happen to work for the only user, LICM. Since
you're not doing anything to enforce safety in the API, I would add
comments in your code, in the API declaration, and in LICM that 'L' is
the "From" block's loop, not necessarily the "To" block's loop.
+ if (InsertTop != 0) {
Why isn't InsertTop set to PH for the first copy?
+ // Update our running map of newest clones
+ for (ValueToValueMapTy::iterator VI = VMap.begin(), VE = VMap.end();
If possible, avoid copying the map. For example, call LVMap.swap(VMap)
later, after you're done with the current loop iteration, before
RemapInstrucion. You might need more explicit handling of header phis
and latch terminators, but it's probably worth it both in efficiency
and code clarity. I realize the current unroller does this, but it
think the copying wasn't removed only because it does membership
checks on LastValueMap, which you don't need.
In runtime-loop3.ll please use FileCheck instead of grep. We're trying
to minimize dependence on external tools in the unit tests.
-Andy
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111204/c56b396b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unroll-runtime.patch
Type: application/octet-stream
Size: 32758 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111204/c56b396b/attachment.obj>
More information about the llvm-commits
mailing list