[PATCH] hasPath

Fri Jul 26 18:23:52 PDT 2013

  Thanks for the review! I'm going to submit this shortly, but I'll be happy to make further changes in followup patches.

================
Comment at: include/llvm/Analysis/CFG.h:36
@@ +35,3 @@
+void FindFunctionBackedges(const Function &F,
+      SmallVectorImpl<std::pair<const BasicBlock*,const BasicBlock*> > &Result);
+
----------------
Chandler Carruth wrote:
> Want to reformat the code you're moving in this patch? Your call, I know this is a move.
Done, clang-format'd.

================
Comment at: include/llvm/Analysis/CFG.h:53
@@ +52,3 @@
+/// 'To' can not be executed. Conservatively returns true.
+bool isPotentiallyReachable(const Instruction *From, const Instruction *To,
+                            DominatorTree *DT = 0, LoopInfo *LI = 0);
----------------
Chandler Carruth wrote:
> Would it be useful to include a convenience overload that works in terms of BBs?
> 
> Also, why not references to instructions? Can these usefully be null?
It's uncommon to use references to IR in llvm. You're right that they can't be null, but I have an irrational dislike of this change. Sorry.

================
Comment at: include/llvm/Analysis/CFG.h:52
@@ +51,3 @@
+/// Returns false only if we can prove that once 'From' has been executed then
+/// 'To' can not be executed. Conservatively returns true.
+bool isPotentiallyReachable(const Instruction *From, const Instruction *To,
----------------
Chandler Carruth wrote:
> I think it would be useful to give some guidance about the performance (big-O complexity based) that users should expect to help guide callers.
> 
> I would also add a brief one-line comment.
Great idea, I've written a chunk about the cost. I'd rather have something more like the complexity guarantees on something like the C++ standard, but ended up with a description of the algorithm. It'll do.

I tried and failed to write a \brief within 80-columns.

================
Comment at: lib/Analysis/CFG.cpp:117-126
@@ +116,12 @@
+
+// LoopInfo contains a mapping from basic block to the innermost loop. Find
+// the outermost loop in the loop nest that contains BB.
+static const Loop *getOutermostLoop(LoopInfo *LI, const BasicBlock *BB) {
+  const Loop *L = LI->getLoopFor(BB);
+  if (L) {
+    while (const Loop *Parent = L->getParentLoop())
+      L = Parent;
+  }
+  return L;
+}
+
----------------
Chandler Carruth wrote:
> Not for this change, but I would like to see this added to LoopInfo's interface so others can find it rather than re-inventing it.
Agreed.

================
Comment at: lib/Analysis/CFG.cpp:128-134
@@ +127,9 @@
+
+// True if there is a loop which contains both BB1 and BB2.
+static bool loopContainsBoth(LoopInfo *LI,
+                             const BasicBlock *BB1, const BasicBlock *BB2) {
+  const Loop *L1 = getOutermostLoop(LI, BB1);
+  const Loop *L2 = getOutermostLoop(LI, BB2);
+  return L1 != NULL && L1 == L2;
+}
+
----------------
Chandler Carruth wrote:
> This also seems useful in LoopInfo. I would also potentially phrase it there in a more general form as 'loopContains(...)' which accepts 1, 2, or maybe more basic block (or instruction) arguments... But maybe this is the only case that comes up.
This code is unusual in that we always want to look at the outermost loop, while most users of Loop do not. We might want to move this to LoopInfo, but I would need to think hard about that.

================
Comment at: lib/Analysis/CFG.cpp:136-138
@@ +135,5 @@
+
+/// Determine whether there is a path from From to To within a single function.
+/// Returns false only if we can prove that once 'From' has been executed then
+/// 'To' can not be executed. Conservatively returns true.
+bool llvm::isPotentiallyReachable(const Instruction *A, const Instruction *B,
----------------
Chandler Carruth wrote:
> Don't duplicate the doxygen comment...
Done.

================
Comment at: lib/Analysis/CFG.cpp:151-159
@@ +150,11 @@
+    const BasicBlock *BB = A->getParent();
+    // Can't be in a loop if it's the entry block -- the entry block may not
+    // have predecessors.
+    bool HasLoop = BB != &BB->getParent()->getEntryBlock();
+
+    // Can't be in a loop if LoopInfo doesn't know about it.
+    if (LI && HasLoop) {
+      HasLoop = LI->getLoopFor(BB) != 0;
+    }
+    if (HasLoop) return true;
+
----------------
Chandler Carruth wrote:
> I think I would rewrite this whole thing as this to make it a bit more clear what's going on... Not sure how you feel about it though.
> 
>     // We can't handle loops. As a consequence, we can only reason about the entry block
>     // without loop info.
>     if ((LI && LI->getLoopFor(BB) != 0) ||
>         (!LI && BB != &BB->getParent()->getEntryBlock())
>       return true;
> 
> 
Hm, your comment made me realize that this code is suboptimal in the case where LI is NULL.

It's not exactly right that we can't handle loops. Rather, if we're inside a loop then it's possible to reach any instruction in the block from any other instruction in the block just by going around the backedge. Thus the answer for loops is a truthful and correct true, not merely a conservative "true". I've updated the comments.

However, in the case where LI is null, we shouldn't bail out and return true here. Instead we should go into the per-block scanning. It may be that this block has no successors for instance, and we can prove that we don't reach BB. I'll save that for a future patch, the fix is not as simple as my first attempt.

================
Comment at: lib/Analysis/CFG.cpp:161-163
@@ +160,5 @@
+
+    // Linear scan, start at 'A', see whether we hit 'B' or the end first.
+    for (BasicBlock::const_iterator BBI = A, BBE = BB->end(); BBI != BBE; ++BBI)
+      if (&*BBI == B) return true;
+    return false;
----------------
Chandler Carruth wrote:
> mem2reg keeps a cache of instruction indices within a block to make repeated queries for the same basic block fast -- do you envision this being useful for callers of this API?
I don't, I expect that at least one side (From or To) will be varying in parent BB with each call. I do expect to reuse the same To many times, for instance.

================
Comment at: lib/Analysis/CFG.cpp:172
@@ +171,3 @@
+
+  unsigned Limit = 32;
+  SmallSet<const BasicBlock*, 64> EverQueued;
----------------
Chandler Carruth wrote:
> Comment why the limit is 32?
> 
> Make this a constant?
> 
> Make this a command line flag for debugging?
> Comment why the limit is 32?

Done.

> Make this a constant?

CFG.cpp:205:12: error: decrement of read-only variable ‘Limit’

> Make this a command line flag for debugging?

Or make it an argument to the caller? I'll leave it for now as it's easy to change if we want to do either of these things.

================
Comment at: lib/Analysis/CFG.cpp:173-174
@@ +172,4 @@
+  unsigned Limit = 32;
+  SmallSet<const BasicBlock*, 64> EverQueued;
+  SmallVector<const BasicBlock*, 32> WorkList;
+  WorkList.push_back(A->getParent());
----------------
Chandler Carruth wrote:
> I much prefer 'Visited' to 'EverQueued'.
> 
> I also find it more conventional and easier to read 'Worklist' as a single word.
> I much prefer 'Visited' to 'EverQueued'.

Okay, I've switched the style to use a Visited+continue at the top.

> I also find it more conventional and easier to read 'Worklist' as a single word.

Yep. Done.

================
Comment at: lib/Analysis/CFG.cpp:179-182
@@ +178,6 @@
+
+  // When the stop block is unreachable, it's dominated from everywhere,
+  // regardless of whether there's a path between the two blocks.
+  if (DT && !DT->isReachableFromEntry(StopBB))
+    DT = NULL;
+
----------------
Chandler Carruth wrote:
> Do this before setting up the worklist. It makes more sense with the other special cases. Also, s/NULL/0/
Done.

================
Comment at: lib/Analysis/CFG.cpp:193
@@ +192,3 @@
+
+    if (!--Limit) return true;
+
----------------
Chandler Carruth wrote:
> Comment clearly that this is the conservative exit from the loop in the case where we don't know anything.
> 
> Also, I would put the return on its own line for consistency.
Done.

================
Comment at: lib/Analysis/CFG.cpp:200
@@ +199,3 @@
+      SmallVector<BasicBlock*, 32> Exits;
+      Outer->getExitBlocks(Exits);
+      for (unsigned i = 0, e = Exits.size(); i != e; ++i) {
----------------
Chandler Carruth wrote:
> Does this have append semantics? If not, give it append semantics?
> 
> Then you can append onto your worklist and do the equivalent of std::remove to skip the visited blocks.
> 
> If not, at least hoist this vector out of the loop and use clear to reset it. I suspect it can also benefit from a smaller base size than '32'.
It does have append semantics. Since switching to 'visited' form, I can remove Exits and the need for filtering the Worklist entirely (we just filter at the top of the loop).

================
Comment at: lib/Analysis/CFG.cpp:205-211
@@ +204,9 @@
+      }
+    } else {
+      for (succ_const_iterator I = succ_begin(BB), E = succ_end(BB); I != E;
+           ++I) {
+        if (EverQueued.insert(*I))
+          WorkList.push_back(*I);
+      }
+    }
+  } while (!WorkList.empty());
----------------
Chandler Carruth wrote:
> Could we do a similar trick with the domtree as the loopinfo?
> 
> Specifically, could we immediately sink to the leaves of the blocks domtree? That would allow us to skip over arbitrarily long chains of basic blocks rather than walking each one.
> 
> Clearly, adding this could go in a follow-up patch.
I don't think so. Given a partial domtree with nodes J K and L and edges J->K K->L, we can't sink straight from J to L. There's no guarantee that K doesn't have a successor we can ignore (it could have an edge to A which in turn dominates J).

That doesn't mean we couldn't make some use of domtree, but what you're asking for is a dominance frontier, and we would end up implementing the iterated dominance frontier algorithm in here, at which point it's more sensible to tell callers that they should pass in a loopinfo. This is exactly what loopinfo gives us.

================
Comment at: lib/Analysis/CFG.cpp:213
@@ +212,3 @@
+  } while (!WorkList.empty());
+  return false;
+}
----------------
Chandler Carruth wrote:
> Comment that the reason we  say 'false' here is that we have exhausted the possible paths, and thus know that it is in fact unreachable.
Done!

http://llvm-reviews.chandlerc.com/D996