[llvm-commits] [PATCH] Loop unrolling for run-time trip counts

Sun Dec 4 19:35:51 PST 2011

Here's the latest patch containing the code to implement loop unrolling for
run-time trip counts.  I apologize for taking so long with making the
suggested changes and submitting a new patch.  I think I've addressed all of
Andy's comments (much appreciated) below and made the appropriate changes.  

The main changes include creating a separate file, called
LoopUnrollRuntime.cpp, to separate the code for unrolling loops with
run-time or compile-time trip counts.  Also, I fixed the implementation so
that the loop simplify analysis is preserved.

The only suggestion I didn't include was using "LVMap.swap(VMap)" instead of
copying the map.  The ValuetoValueMapTy class doesn't contain a swap method.
Instead, I was able to change the code to reduce the amount of copying that
occurs.

The patch passes make check-all and the I've run the tests in
projects/test-suite with different unroll factors.  Also, the ability to
unroll loops with run-time trip counts is turned off by default.  It is
enabled with the -unroll-runtime flag.

I'd appreciate any further comments, etc. on the patch.  The turnaround time
for any additional changes should be much faster.

Thanks,

Brendon

--

Qualcomm Innovation Center, Inc is a member of Code Aurora Forum

From: Andrew Trick [mailto:atrick at apple.com] 
Sent: Tuesday, November 15, 2011 11:59 PM
To: Brendon Cahoon
Cc: llvm-commits at cs.uiuc.edu LLVM
Subject: Re: [llvm-commits] [PATCH] Loop unrolling for run-time trip counts

For the purpose of code review, I'll ignore the issue of

unroll-runtime profitability, which is obviously left for later. I

will say that this is only half the framework because to achieve

data-alignment we would need to emit an epilog as well. Also, I

misread your original message, and assumed you were adding a prolog

*loop*. That would be nice for targets that care about code

size. There are cases when unrolling the main loop can be done without

increasing the size of the loop body. Your design would not always be

well suited for those cases. On the other hand, your design is a much

simpler CFG transform. We have less SSA/LoopInfo updating to worry

about this way.

Never mind my earlier comment about DominatorTree. I forgot we always

recompute it. You're welcome to fix that :)

It does seem like your transform should preserve LoopSimplify. I'm

guessing you just need a unique loop exit block--just split the

exit. Can you please do that?

In general, your implementation looks quite nice.

I am concerned that you're doubling the size of LoopUnroll.cpp for a

feature that most people won't need to look at. I'd be ok with you

creating a new .cpp file. Or at least make it blatantly obvious that

all the new code is specific to runtime-unroll. For example, you can

add function-level comments like "Support for -runtime-unroll", and

code sections like:

//===----------------------------------------------------------------------=
==//

// Implementation of -runtime-unroll.

//===----------------------------------------------------------------------=
==//

<your code>

//===----------------------------------------------------------------------=
==//

// Driver for loop unrolling transformations.

//===----------------------------------------------------------------------=
==//

...

Formatting: You have two 80-col violations, but otherwise looks great.

Typo:

+ // ... loop

+ // loop prologue

---

In EmitRuntimePrologueFixup:

+///    extraiters = tripcount % loopfactor

+///    if (extraiters == 0) jump Loop:

+///    if (extraiters == loopfactor) jump L1

+///    if (extraiters == loopfactor-1) jump L2

You may want to comment that you expect a later pass to convert this

to a switch.

Before you erase PH->terminator, maybe you should assert that it has one

successor pointing to PEnd.

+ std::vector<BasicBlock*> NewBlocks(LoopBlocks.size());

No need to initialize the vector, then immediately clear it. You can use

NewBlocks.reserve(LoopBlocks.size()).

+  for (unsigned i = Count-1; i > 0; --i) {

This is a large, complex loop. To be more self-documenting, use a

meaningful variable like "leftOverIters". Then it will be clear why

you're counting down, and the nested loop won't hide the variable.

+    if (i != Count-1) {

My personal preference, I'd like to see the i == Count-1 case first

since that's what happens first.

---

In CloneLoopBlocks:

+    LPM->cloneBasicBlockSimpleAnalysis(LoopBlocks[j], NewBB, L);

I'm not sure this makes sense given that the new block is in the

parent loop. It may happen to work for the only user, LICM. Since

you're not doing anything to enforce safety in the API, I would add

comments in your code, in the API declaration, and in LICM that 'L' is

the "From" block's loop, not necessarily the "To" block's loop.

+       if (InsertTop != 0) {

Why isn't InsertTop set to PH for the first copy?

+    // Update our running map of newest clones

+    for (ValueToValueMapTy::iterator VI = VMap.begin(), VE = VMap.end();

If possible, avoid copying the map. For example, call LVMap.swap(VMap)

later, after you're done with the current loop iteration, before

RemapInstrucion. You might need more explicit handling of header phis

and latch terminators, but it's probably worth it both in efficiency

and code clarity. I realize the current unroller does this, but it

think the copying wasn't removed only because it does membership

checks on LastValueMap, which you don't need.

In runtime-loop3.ll please use FileCheck instead of grep. We're trying

to minimize dependence on external tools in the unit tests.

-Andy

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111204/c56b396b/attachment.html>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: unroll-runtime.patch
Type: application/octet-stream
Size: 32758 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20111204/c56b396b/attachment.obj>