[llvm] r228265 - Implement new heuristic for complete loop unrolling.

Mon Feb 23 12:03:01 PST 2015

Hi Michael,

First, I think this would be much easier on Phabricator. Could you please upload the patch there (with full context):

A few comments:

 struct FindConstantPointers {

-  bool LoadCanBeConstantFolded;

   bool IndexIsConstant;

-  APInt Step;

-  APInt StartValue;

+  bool haveSeenAR;

   Value *BaseAddress;

+  const SCEVAddRecExpr *AddRec;

   const Loop *L;

   ScalarEvolution &SE;

   FindConstantPointers(const Loop *loop, ScalarEvolution &SE)

Please add comments to these variables explaining what they are, and generally, what the algorithm is.

+      // If AddRec is not-null, then we've already visited another AddRec. We

+      // don't handle multiple AddRecs here, so give up in this case.

+      if (AddRec)

         return IndexIsConstant = false;

Maybe I'm reading this incorrectly, but you're not supporting nested AddRecs at all? Correct me if I'm wrong, this means that:

int arr[] = { 1, 2, 3, 4, 5, 6, 7, 8 };
for (int j = 0; i < 8; ++j)
for (int i = 0; i < 8; ++i)
  x += arr[i];

this will work, but:

int arr[8][] = { { 1, 2, 3, 4, 5, 6, 7, 8 }, { 3, 4, 5, 6, 7, 8, 9, 10 } ,  ... };
for (int j = 0; i < 8; ++j)
for (int i = 0; i < 8; ++i)
  x += arr[i][j];

will not work. I have code that looks like this latter case (comes up when solving multi-dimensional PDEs). I also don't understand the point of the restriction (and you also don't seem to use the 'AddRec' variable anywhere, except for this check). How complicated is it to remove this restriction?

+    for (BasicBlock::iterator I = BB->begin(), E = BB->end(); I != E; ++I) {

Use range-based for loop.

+    for (unsigned TIdx = 0, TSize = TI->getNumSuccessors(); TIdx != TSize;

+         ++TIdx) {

And here too.

+      if (EarlyExitFound)

+        break;

This means that we've found an early exit that will definitely be taken, right? Please add a comment about that.

+  // Visit all GEPs in the loop L, and find those that after complete loop

+  // unrolling would become constants, or BaseAddress+Constant.  For this to

+  // happen, a GEPs SCEV expression should contain at most one AddRec

+  // expression, and the loop corresponding to this expression should be L. The

+  // rest SCEV sub-expressions should be either constants, or ScevUnknown

+  // (which would become the base address).

+  // If the expression contains the base address, then after subtracting it, we

+  // should get AddRec with constant step and start.

+  void cacheSCEVResults() {

This function seems poorly named for what it does. Why not just call it findConstantPointers?

+    // Don't simulate loops with a big or unknown tripcount

+    if (!UnrollMaxIterationsCountToAnalyze || !TripCount ||

+        TripCount > UnrollMaxIterationsCountToAnalyze)

+      return;

In the previous implementation, for loops with larger trip counts, we just estimated the result from the first N iterations. Should we still do that? Or is this threshold so high that, unless most of the iterations vanished completely, doing so would be irrelevant?

Thanks again,
Hal

----- Original Message -----
> From: "Michael Zolotukhin" <mzolotukhin at apple.com>
> To: "Chandler Carruth" <chandlerc at google.com>
> Cc: "Commit Messages and Patches for LLVM" <llvm-commits at cs.uiuc.edu>
> Sent: Friday, February 20, 2015 9:17:01 PM
> Subject: Re: [llvm] r228265 - Implement new heuristic for complete loop	unrolling.
> 
> 
> Hi Chandler,
> 
> 
> I prepared a new version of the patch, do you mind taking a look at
> it?
> 
> 
> To me it looks even simpler than the original implementation, and
> additionally can handle branches (I added two tests for that).
> 
> 
> 
> 
> 
> 
> 
> Thanks,
> Michael
> 
> 
> 
> 
> 
> On Feb 13, 2015, at 11:58 AM, Chandler Carruth < chandlerc at google.com
> > wrote:
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Fri, Feb 13, 2015 at 11:52 AM, Michael Zolotukhin <
> mzolotukhin at apple.com > wrote:
> 
> 
> 
> Thank you for the work and pointing out to the problems! It’s good
> that we found them early.
> 
> 
> I think that we would solve these issues if we do the following:
> 1. When looking for simplified instructions, visit each instruction
> only once. That’ll be achieved by visiting blocks in pre-order.
> 2. When looking for dead-instructions, also visit them only once -
> here we would need to visit blocks in reverse-order.
> 3. SCEV indeed might be too slow for this, but we actually only
> interested in expressions that become constants. That means, we need
> SCEV for expressions that either feed a load, or have constant Start
> and Step SCEV-expressions. We can scan all insns once and cache
> results for such potentially-constant instructions, other
> instructions don’t need SCEV-expression. So, we won’t need to talk
> to SCEV after this point.
> 4. As for using threshold for early-exit - that’s a really good idea,
> and it should integrate here naturally.
> 
> 
> What do you think about this? Am I missing something, or does it
> sound right?
> 
> 
> Generally yes, but I would start without trying to handle dead
> instructions, and following the sketch of how to build it I gave. I
> would then layer dead instructions on as a follow-up. There are
> several different approaches to handling dead instructions, and I
> think it will be useful to see how things look without handling them
> first.
> 
> 
> 
> Note that the outline I gave naturally will skip dead *control flow*,
> because we can simplify the condition to a constant and only
> traverse one of the successors.
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory