[PATCH] D153058: [clang][CFG] Support construction of a weak topological ordering of the CFG.

Mon Jul 17 11:22:18 PDT 2023

ymandel added a comment.

Thanks for the very helpful feedback! 
I think I've addressed all of the suggestions/concerns. I suspect there's yet more room for improvement in the algorithm, but I think we're converging on "good enough".

================
Comment at: clang/include/clang/Analysis/Analyses/IntervalPartition.h:38
+// interval.
 struct CFGInterval {
+  CFGInterval() = default;
----------------
sammccall wrote:
> Summarizing an offline discussion, IIUC...
> - we build a tree of intervals that represents the whole sequence of interval graphs
> - for the current worklist algorithm, we use only the total order and process the first invalid block, the tree is not needed
> - the algorithm/data structure could be significantly simplified (no variants, templates, recursion...) if it didn't build the tree, just the interval graphs iteratively until a single node is reached
> - however a different, non-worklist algorithm could use the tree information to determine which blocks to visit next. (Given `a (b c d) e`, after `d` we try only `b` or `e`). This avoids expensive "is invalid" checks
> - so the motivation for preserving the full interval tree is future-proofing for using a different algorithm
> 
> I think it's worth documenting this motivation, and also considering not paying this complexity cost for a feature we aren't adding yet.
Sam, I've radically simplified the implementation to only build exactly what's needed -- the ordering. PTAL and let me know what further comments/documentation would help.

================
Comment at: clang/include/clang/Analysis/Analyses/IntervalPartition.h:56
+
+  template <typename Node> struct NodeData {
+    // The block from which the interval was constructed. It is either the
----------------
sammccall wrote:
> xazax.hun wrote:
> > Correct me if I'm wrong but I have the impression that `CFGInterval` might either contain only `CfgBlock`s or other `CfgInterval`s, but these might never be mixed. The current representation does allow such mixing. In case do not need that, I think it might be cleaner to make `CfgInterval` itself templated and remove the `std::variant`.
> (Very optional idea, in case you like it)
> 
> This data structure is complicated: a tree of Intervals with CFGBlocks at the root, and at every level the Intervals have pred/succ edges that form part of one of the intermediate flow graphs.
> 
> It seems nice to get rid of the tree if possible (ultimately we're using these for ordering) and the nested pred/succs - we seem to only need these for the top level interval (and in the end, we have one interval and pred/succs are empty.
> 
> What if we made this a flat structure, preserving the bracket structure as some kind of back-edge?
> 
> ```
> 1  (  2  (  3  4  )  )  5
> becomes
> 1     2     3  4 !3 !2  5
> ```
> 
> AIUI, the dataflow algorithm can basically treat this as a program, we move left to right and `!X` means "if (X is invalidated) goto X".
> 
> This would give us something like:
> ```
> using Entry = PointerIntPair<CFGBlock*, 1>; // either an actual block, or a conditional jump
> struct Interval { vector<Entry> Sequence; SmallDenseSet<Interval*> Preds, Succs; }
> using IntervalGraph = vector<Interval>;
> ```
> 
> I also suspect that this would let us avoid the core algorithms (buildInterval, addIntervalNode) being templated, simply wrap each CFGBlock in a trivial Interval to start with instead. I guess this is OK performance-wise, but the reason not to do it is ending up with those nodes polluting your final data structure?
> 
> 
> Anyway, all this is up to you - for me keeping more data/structure around than we're going to use is confusing, but it does seem closer to the textbook.
I went with this proposal and stripped it down to only track what we need. I didn't even preserve the backedge structure, because we can't use that yet. If and when we get there, I think you're approach or something similar, where we avoid the tree structure, is a great idea.

================
Comment at: clang/include/clang/Analysis/Analyses/IntervalPartition.h:56-64
+  template <typename Node> struct NodeData {
+    // The block from which the interval was constructed. It is either the
+    // graph's entry block or has at least one predecessor outside the interval.
+    const Node *Header;
+    std::vector<const Node *> Nodes;
+  };
+  using IntervalData = std::variant<NodeData<CFGBlock>, NodeData<CFGInterval>>;
----------------
ymandel wrote:
> sammccall wrote:
> > xazax.hun wrote:
> > > Correct me if I'm wrong but I have the impression that `CFGInterval` might either contain only `CfgBlock`s or other `CfgInterval`s, but these might never be mixed. The current representation does allow such mixing. In case do not need that, I think it might be cleaner to make `CfgInterval` itself templated and remove the `std::variant`.
> > (Very optional idea, in case you like it)
> > 
> > This data structure is complicated: a tree of Intervals with CFGBlocks at the root, and at every level the Intervals have pred/succ edges that form part of one of the intermediate flow graphs.
> > 
> > It seems nice to get rid of the tree if possible (ultimately we're using these for ordering) and the nested pred/succs - we seem to only need these for the top level interval (and in the end, we have one interval and pred/succs are empty.
> > 
> > What if we made this a flat structure, preserving the bracket structure as some kind of back-edge?
> > 
> > ```
> > 1  (  2  (  3  4  )  )  5
> > becomes
> > 1     2     3  4 !3 !2  5
> > ```
> > 
> > AIUI, the dataflow algorithm can basically treat this as a program, we move left to right and `!X` means "if (X is invalidated) goto X".
> > 
> > This would give us something like:
> > ```
> > using Entry = PointerIntPair<CFGBlock*, 1>; // either an actual block, or a conditional jump
> > struct Interval { vector<Entry> Sequence; SmallDenseSet<Interval*> Preds, Succs; }
> > using IntervalGraph = vector<Interval>;
> > ```
> > 
> > I also suspect that this would let us avoid the core algorithms (buildInterval, addIntervalNode) being templated, simply wrap each CFGBlock in a trivial Interval to start with instead. I guess this is OK performance-wise, but the reason not to do it is ending up with those nodes polluting your final data structure?
> > 
> > 
> > Anyway, all this is up to you - for me keeping more data/structure around than we're going to use is confusing, but it does seem closer to the textbook.
> I went with this proposal and stripped it down to only track what we need. I didn't even preserve the backedge structure, because we can't use that yet. If and when we get there, I think you're approach or something similar, where we avoid the tree structure, is a great idea.
True, but the `Header` field gets in the way, since it needs to have a recursive type (it's type is always one level down from the interval's type).  So, somewhere you'll need a variant and a recursive type. The current version avoids this by dropping `Header` entirely, since it isn't needed in the output.

================
Comment at: clang/include/clang/Analysis/Analyses/IntervalPartition.h:101-103
+/// groups topologically. As a result, the blocks in a series of loops are
+/// ordered such that all nodes in loop `i` are earlier in the order than nodes
+/// in loop `j`. This ordering, when combined with widening, bounds the number
----------------
xazax.hun wrote:
> I wonder if we still want to somehow enforce that within a loop/interval the we visit nodes in the RPO order.
Starting with RPO should make this algorithm cheaper (and maybe simpler!).  But, RPO construction isn't free. So, I think it is a matter of cost -- compare generating the RPO first and then using that here vs. just using this directly.  I can put a note in the implementation as FIXME. WDYT?

================
Comment at: clang/include/clang/Analysis/Analyses/IntervalPartition.h:123
+/// This WTO construction is described in Section 4.2 of [Bourdoncle1993].
+std::optional<WeakTopologicalOrdering> getIntervalWTO(const CFG &Cfg);
+
----------------
sammccall wrote:
> Naming is confusing here: `getWTO` is the internal function whose API involves intervals, and `getIntervalWTO` be the main function whose API doesn't. I think literally swapping them might be clearer!
> 
> in general it's hard to tell which parts of this API people are actually meant to use.
> Is it reasonable to wrap everything up to `WeakTopologicalOrdering` in `namespace internal` or so? Or are there places this will be used outside tests?
Agreed.  The only reason to  expose is for tests.

================
Comment at: clang/lib/Analysis/IntervalPartition.cpp:55
+  for (const Node *S : Header->succs())
     if (S != nullptr)
+      if (auto SID = getID(S); !Partitioned.test(SID)) {
----------------
sammccall wrote:
> I'm confused, here and below why are successors allowed to be nullptr?
That's how the CFG encodes unreachable nodes. `succs` doesn't actually return CFGBlocks, but an intermediate type which encodes reachability info. When you (implicitly) cast it to CFGBlock*, unreachable nodes become a nullptr.

================
Comment at: clang/lib/Analysis/IntervalPartition.cpp:121
+    Index.emplace(N, ID);
+  Graph.Intervals.emplace_back(ID, Header, std::move(Data.Nodes));
 }
----------------
xazax.hun wrote:
> ymandel wrote:
> > xazax.hun wrote:
> > > It would probably take for me some time to understand what is going on here, but I think this part might worth a comment. In particular, I have the impression that `Graph.Intervals` is the owner of all the intervals. But we use pointers to refer to these intervals. What would guarantee that those pointers are not being invalidated by this `emplace_back` operations?
> > Excellent question, not least because I *wasn't* careful the first around and indeed ran up against this as a memory bug. :) That's the reason I added IDs, so that I could _avoid_ taking the addresses of elements until after we finish growing the vector. See lines 148-165: after we finish building the intervals, we go through and take addresses.
> > 
> > I can add comments to this effect, but perhaps the safe option is to use a std::dequeue. What do you think?
> Oh, thanks! I don't have a strong preference between a comment or a deque. 
I ended up changing to a deque after almost making the same mistake again. vector is just not safe when you want to be taking addresses. :) I've dropped the use of IDs and added comments. Let me know if this clear enough.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153058/new/

https://reviews.llvm.org/D153058