[PATCH] D131646: [clang][dataflow] Restructure loops to call widen on back edges

Fri Aug 19 13:09:03 PDT 2022

li.zhe.hua added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:168-169
+// back edge block. (That is, all paths from the entry block to the back edge
+// block must go through `Block`.) It also means that there are only two
+// predecessors; the other is along the path from the entry block to `Block`.
+static const CFGBlock *findBackEdge(const CFGBlock *Block) {
----------------
sgatev wrote:
> li.zhe.hua wrote:
> > xazax.hun wrote:
> > > Is this also true when we have multiple `continue` statements in the loop?
> > Yes. The end of the loop, and each of the `continue` statements, target the back edge block. They all get funneled through that back edge to `Block`, such that `Block` only has two predecessors. However, I haven't verified this in the CFG code, only by not finding a counterexample.
> Does that hold for back edges stemming from `goto` statements?
Ah, `goto` does throw a wrench in all of this. Specifically, it is problematic only for `do ... while` loops, where the labeled statement is the first statement of the `do` block.

I took out the sentence, took out the `assert`, and added a `FIXME` for `prepareBackEdges`.

================
Comment at: clang/lib/Analysis/FlowSensitive/TypeErasedDataflowAnalysis.cpp:228-234
+  // If we are at the start of a loop, we will have two precessors, but we don't
+  // want to join these two predecessors. Instead, we want to take the back edge
+  // block (i.e. the result of the previous iteration) and use that directly as
+  // the input block.
+  //
+  // For the first iteration of loop, the "zeroth" iteration state is set up by
+  // `prepareBackEdges`.
----------------
xazax.hun wrote:
> xazax.hun wrote:
> > li.zhe.hua wrote:
> > > xazax.hun wrote:
> > > > Could you elaborate on this? Let's consider this loop:
> > > > 
> > > > ```
> > > >  Pred
> > > >    |
> > > >    v
> > > >  LoopHeader <---BackEdge
> > > > ```
> > > > 
> > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > > 
> > > > I would expect the analysis to always compute `join(PredState, BackEdgeState)`, and I would expect the widening to happen between the previous iteration of `BackEdgeState` and the current iteration of `BackEdgeState`. So, I wonder if we already invoke the widening operator along back edges, wouldn't the regular logic work just fine? Do I miss something?
> > > > 
> > > > Do we ignore the state coming from `Pred` on purpose? Is that sound?
> > > 
> > > We don't, and this is what the comment
> > > 
> > > ```
> > >   // For the first iteration of loop, the "zeroth" iteration state is set up by
> > >   // `prepareBackEdges`.
> > > ```
> > > failed to explain. After transferring `PredState`, we copy `PredState` into `BackEdgeState`, which is done in `prepareBackEdges`.
> > > 
> > > > I would expect the analysis to always compute `join(PredState, BackEdgeState)`
> > > 
> > > I'm not sure I see that we should always join `PredState` and `BackEdgeState`. Execution doesn't go from `Pred` into the Nth iteration of the loop, it only goes from `Pred` into the first iteration of the loop, e.g. the predecessor for the 4th iteration of the loop is only the back-edge from the 3rd iteration of the loop, not `Pred`.
> > > 
> > > Let me know if this makes sense.
> > > I'm not sure I see that we should always join `PredState` and `BackEdgeState`. Execution doesn't go from `Pred` into the Nth iteration of the loop, it only goes from `Pred` into the first iteration of the loop, e.g. the predecessor for the 4th iteration of the loop is only the back-edge from the 3rd iteration of the loop, not `Pred`.
> > > 
> > > Let me know if this makes sense.
> > 
> > The analysis state of the dataflow analysis supposed to overestimate all of the paths. Consider the following loop and imagine we want to calculate intervals for integer variables:
> > 
> > ```
> > int i = 0;
> > while (...) {
> >   [[program point A]];
> >   ++i;
> > }
> > ```
> > 
> > During the analysis of this loop, the value `i ==0` flows to `[[program point A]]`. This is the motivation to join the state from the back edge and from PredState. As far as I understand, you want to achieve this by copying PredState to the BackEdgeState before the first iteration. But this also means that we will use the widening operator to combine PredState with the state after N iterations instead of the regular join operation. I am not entirely sure whether these two approaches always produce the same results. 
> > 
> > 
> A contrived example (just came up with this in a couple minutes so feel free to double check if it is OK):
> Consider:
> ```
> int i = 0;
> while (...) {
>   if (i == 0)
>     i = -2;
>   ++i;
> }
> ```
> 
> Some states:
> ```
> PredState = i : [0, 0]
> BackEdgeState (after first iteration) = i : [-1, -1]
> ```
> 
> And the results of join vs widen:
> ```
> PredState.join(BackEdgeState) = i : [-1, 0]
> PredState.widen(BackEdge) = i : [-inf, 0]
> ```
> 
> The extra widening with the PredState can result in extra imprecision. 
So, my understanding is that widening introduces imprecision as a trade-off for convergence. Yes, in the contrived example, joining converges after a few iterations, but in other cases it never converges.

---

Looking at the Rival and Yi book, there are no intervening joins with the predecessor for the first loop as we analyze the loop. This is roughly where I drew from in terms of this implementation.

```
analysis(iter{p}, a) = { R <- a;
                         repeat
                             T <- R;
                             R <- widen(R, analysis(p, R));
                         until inclusion(R, T)
                         return T;
```

================
Comment at: clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp:270
+
+TEST(DataflowAnalysisTest, ConvergesOnWidenAnalysis) {
+  std::string Code = R"(
----------------
sgatev wrote:
> There's already a set of "widening" tests - http://google3/third_party/llvm/llvm-project/clang/unittests/Analysis/FlowSensitive/TypeErasedDataflowAnalysisTest.cpp;l=527-712;rcl=462638952
> 
> What do you think about refactoring those so that we have tests that exercise the framework with both `join` and `widen`?
This commit implements widening only for the lattice. Those tests use the `NoopLattice`, so their behavior is not expected to change with this commit. Splitting `merge` into a `join` and `widen` is out-of-scope for this commit, and is what I am hoping to work on next.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D131646/new/

https://reviews.llvm.org/D131646