[PATCH] D65164: Define some basic terminology around loops in our documentation

Wed Jul 31 10:10:24 PDT 2019

reames marked 5 inline comments as done.
reames added a comment.

In D65164#1605294 <https://reviews.llvm.org/D65164#1605294>, @Meinersbur wrote:

> I am unhappy about this having been committed already. Now we have official documentation at https://llvm.org/docs/LoopTerminology.html which is partially wrong.

In retrospect, I probably landed this a bit quickly.  I interpreted the comments as incremental refinement, and missed the one about cycles vs SCCs being correctness related.  I believe the sole correctness issue has been resolved.  If you see any others, please point them out.

================
Comment at: docs/LoopTerminology.rst:23
+  dominance requirement and such are not considered loops.  LoopInfo
+  does not include such cycles.
+
----------------
Meinersbur wrote:
> reames wrote:
> > Meinersbur wrote:
> > > jdoerfert wrote:
> > > > Could we add something like:
> > > > > commonly referred to as irreducible control flow, or irreducible loops.
> > > Also, such cycles are called loops in any literature that I know. It's just that LoopInfo does not create a loop object for them since they do not have a dominating header.
> > Can anyone provide me a citation for the definition of irreducible control flow?  I want to link to it when adding this bit.
> > 
> > As for the "everyone else calls them loops" points, I'm happy to mention that.  Have a suggestion for a survey paper or Wikipedia article which defines it?
> > 
> > p.s. Patch welcome to do this.  
> Here are the relevant excerpts from the dragon book, second edition:
> 
> {F9691106}
> {F9691110}
> {F9691109}
> 
> It avoids the bare term "loop", and so avoids having to define it, and defines only "natural loops". As given by the definition and as seen in the example, a natural loop is defined per backedge. However, LLVM follows the book's suggestion and treats natural loops that share a header as a single loop.
> 
> Another book I looked up (Michael Wolfe: High Performance Compilers For Parallel Computing) uses a graph's strongly connected components to search for cycles, side-stepping the problem of irreducible loops. However, I wouldn't use it since it ignores loop nesting.
Thanks for the citation here.  Reading through the excerpts, I want to draw your attention specifically to the second paragraph of the third image.  Per that wording, the definitions used in the book allow multiple natural loops to share a single header block (if one is a subloop of the other).  As per previous discussion in this thread and in https://reviews.llvm.org/D65299, our loop definition does not.  

Do you spot the inconsistency in the definitions?  I think it comes down to ours requiring a maximal SCC w/a backedge, while the book's "natural loop" term does not.  Do you agree?  If so, I'll try to draft something which compares and contrasts the definitions.
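
To make the difference concrete, here is a minimal sketch (not LLVM code) of the per-backedge natural loop construction next to a per-header merge. The CFG, the block names, and the helpers `natural_loop` and `merge_by_header` are hypothetical and exist only for illustration; the back edges are assumed to be known already.

```python
from collections import defaultdict

def natural_loop(preds, tail, header):
    """Blocks of the natural loop of the back edge tail -> header.

    The loop is the header plus every block that can reach `tail`
    without passing through the header.
    """
    loop = {header}
    worklist = [tail]
    while worklist:
        block = worklist.pop()
        if block not in loop:
            loop.add(block)
            worklist.extend(preds.get(block, ()))
    return loop

def merge_by_header(preds, back_edges):
    """Union the natural loops that share a header into one loop per header."""
    merged = defaultdict(set)
    for tail, header in back_edges:
        merged[header] |= natural_loop(preds, tail, header)
    return dict(merged)

# Hypothetical CFG: entry->h, h->a, a->h, a->b, b->h, b->exit
preds = {"h": ["entry", "a", "b"], "a": ["h"], "b": ["a"], "exit": ["b"]}
back_edges = [("a", "h"), ("b", "h")]  # assumed given, both target header h

per_edge = {edge: natural_loop(preds, *edge) for edge in back_edges}
merged = merge_by_header(preds, back_edges)
```

Per back edge, this yields two natural loops with header `h`, namely {h, a} nested inside {h, a, b}. Merging by header leaves a single loop {h, a, b}, which is the behaviour our definition is meant to describe.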

================
Comment at: docs/LoopTerminology.rst:25-26
+
+* Loops can contain non-loop cycles and non-loop cycles may contain
+  loops.
+
----------------
Meinersbur wrote:
> reames wrote:
> > Meinersbur wrote:
> > > The concept of a backedge should have been introduced for this. A cycle inside a loop can either be
> > > 
> > >  * A backedge
> > >  * A backedge of a nested loop
> > >  * An irreducible loop
> > All of our definitions are inherently circular.  I tried to adjust this a bit, what do you think of the wording I landed with?
> The definition of a backedge is independent of the definition of a loop. Here is the definition from the dragon book:
> 
> > A back edge is an edge a → b whose head b dominates its tail a.
> 
> 
While in principle I agree, I'm not sure defining it that way actually clarifies the document.  In particular, we're almost always talking about the backedge *of a particular loop*, and thus the standalone definition is somewhat confusing.  Feel free to suggest a patch w/a wording change if you want to wordsmith this.
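
For reference, the standalone definition quoted above can be checked mechanically. The following is a minimal sketch under stated assumptions, not how LoopInfo is implemented: it uses the textbook iterative data-flow computation of dominator sets, assumes every block is reachable from the entry, and the two small CFGs and the helper names are hypothetical.

```python
def dominators(succs, entry):
    """Dominator sets via iteration: dom[b] = {b} | intersection of dom[p] over preds p.

    Assumes every block is reachable from `entry`.
    """
    blocks = set(succs) | {s for targets in succs.values() for s in targets}
    preds = {b: set() for b in blocks}
    for src, targets in succs.items():
        for dst in targets:
            preds[dst].add(src)
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom

def back_edges(succs, entry):
    """Edges a -> b whose head b dominates its tail a."""
    dom = dominators(succs, entry)
    return sorted((a, b) for a, targets in succs.items()
                  for b in targets if b in dom[a])

# Hypothetical reducible CFG: a single loop with header h.
reducible = {"entry": ["h"], "h": ["body", "exit"], "body": ["h"]}
# Hypothetical irreducible CFG: the cycle a <-> b can be entered at a or at b.
irreducible = {"entry": ["a", "b"], "a": ["b"], "b": ["a"]}

print(back_edges(reducible, "entry"))    # [('body', 'h')]
print(back_edges(irreducible, "entry"))  # []
```

The second CFG contains a cycle but no back edge, because neither `a` nor `b` dominates the other. That is the kind of cycle for which LoopInfo creates no loop object.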

================
Comment at: docs/LoopTerminology.rst:35
+  outside the loop.  A loop is allowed to be statically infinite, so
+  there need not be any exiting edges.
+
----------------
Meinersbur wrote:
> reames wrote:
> > jdoerfert wrote:
> > > > * The loop header identifies a loop.
> > > > * Two loops are either disjoint or one is properly contained in the other.
> > > > * LoopInfo organizes loops in a tree structure with an artificial top-level loop in each function that contains all loops not contained in other loops.
> > The first point is incorrect.  A single loop header can be the header of multiple nested loops.
> > 
> > I incorporated the second.
> > 
> > We should probably add a LoopInfo section; this didn't really feel like it belonged in the definition of the loops.  
> > The first point is incorrect. A single loop header can be the header of multiple nested loops.
> 
> I think this has already been resolved.
Should now be fixed; let me know if the version after https://reviews.llvm.org/D65299 still needs polishing.

================
Comment at: docs/LoopTerminology.rst:83-84
+
+Iteration Count - The number of times the header has executed before
+some interesting event happens.  Commonly used w/o qualification to
+refer to the iteration count at which the loop exits.  Will always be
----------------
Meinersbur wrote:
> reames wrote:
> > kbarton wrote:
> > > Meinersbur wrote:
> > > > Non-rotated loops often have headers that only check the loop condition. The header executed, but if the condition evaluated to false, arguably nothing interesting happened, in particular the source language's loop body did not execute.
> > > Same comment as above: "The number of times the header will execute before some interesting event happens."
> > @Meinersbur - neither point is relevant for the llvm definition of a loop, but maybe there should be a "mapping C to LLVM" section?
> "some interesting event" is just not a good way to make things clearer. Please avoid.
Concrete suggestions on how to word this?  I'm happy to improve, but it's the least awkward wording I could find.  

We need to be able to say things such as:
1) On iteration N, this branch switches from always taken to always untaken.
2) On iteration N, we exit through exit block X.
3) On iteration N, the loop will exit.
4) After a maximum of N iterations, the loop will exit.
5) Function F may throw, but only after iteration N.

(All are real examples from previous discussions.)
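
To illustrate the header-based counting (and the non-rotated case raised above), here is a small simulation, offered as a sketch rather than as anything LLVM does. The function `run_non_rotated` is hypothetical; it models `for (i = 0; i < n; i++) body;` as a header block that tests the condition and a body block that also acts as the latch.

```python
def run_non_rotated(n):
    """Count header and body executions for a non-rotated counted loop."""
    header_runs = 0
    body_runs = 0
    i = 0
    while True:
        header_runs += 1      # header block: evaluate i < n
        if not (i < n):
            break             # exiting edge is taken from the header
        body_runs += 1        # body block (also the latch in this sketch)
        i += 1
    return header_runs, body_runs

print(run_non_rotated(3))  # (4, 3)
print(run_non_rotated(0))  # (1, 0)
```

With `n = 3` the header runs four times and the body three, so "the loop exits on iteration N" counted in header executions is one more than the number of times the source-level body ran.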

================
Comment at: llvm/trunk/docs/LoopTerminology.rst:15-16
+
+First, let's start with the basics.  In LLVM, a Loop is a cycle within
+the control flow graph (CFG) where there exists one block (the loop
+header block) which dominates all other blocks within the cycle.
----------------
Meinersbur wrote:
> Mathematically, a cycle is a path in a graph where the start and end nodes are the same. That is, any loop contains infinitely many cycles, e.g. by repeating the same path. I'd avoid defining loops using cycles, or else define what "cycle" means as used here.
Should now be fixed; let me know if the version after https://reviews.llvm.org/D65299 still needs polishing.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65164/new/

https://reviews.llvm.org/D65164