[clang] [analyzer] Don't assume third iteration in loops (PR #119388)

Fri Dec 13 07:14:25 PST 2024

NagyDonat wrote:

Thanks for the review, I'll fix the minor remarks soon.

> I wonder how you found out that loop widening may conflict with this heuristic, were any tests broken that raised your awareness?

Yes, some (IIRC three) testcases fail in `loop-widening.c`.

> I have one concern with `didEagerlyAssumeBifurcateAt`, that it is bound and overwritten at the most recent eager split - which to my knowledge may not be the loop condition if we have something like this:
> 
> ```c++
> while (coin()) {
>   if (coin()) { // #inner_coin
>     ++num;
>   }
> }
> ```
> 
> Each time the `inner_coin` would spoil the `LastEagerlyAssumeBifurcationAt` trait between the loop iterations, right?
> 
> Storing `const Expr *` pointers is brittle for recursive functions too, where the same expression could appear in different LocationContexts. I wonder if we could make an example where it would confuse this heuristic.
> 
> I'm also not entirely convinced about nested loops - if this heuristic would kick in. This is basically a variant of the `coin` case I showed an example.
> 
> I don't have a counter example just yet, but I'm not sure I can spare the time to craft one.

These situations do not cause problems for my goals, because I only intend to use `LastEagerlyAssumeBifurcationAt` as a very short-term storage between the node where the eager assumption happens and the `processBranch` call directly after it within the same iteration. I did not intend to reuse this information in later iterations of the same loop.

However, it's nice that you highlighted this question, because now I realized that my implementation is buggy (does not do what I wanted). For example in
```cpp
void foo(int x) {
  while (x == 0) {
    // body that does not change x and does not trigger any EagerlyAssume
  }
}
```
EagerlyAssume would succeed at the beginning of the first iteration, and this would set `LastEagerlyAssumeBifurcationAt` to the expression `x == 0`. This would stay in the state during the subsequent iterations, and at the beginning of the third iteration the code that I added would notice its presence and prevent the modeling of the third iteration. (I want to stop continuing in the loop after the second iteration only if the loop condition is ambiguous _at the beginning of that particular iteration_, not after some earlier iteration.)

To fix this problem, it would be sufficient to e.g. ensure that `evalEagerlyAssumeBifurcation` sets `LastEagerlyAssumeBifurcationAt` to `nullptr` when the bifurcation fails (= the value is not ambiguous = there is only one child node). However, there might be better solutions to implement this "did EagerlyAssume split the state _right now_ directly before this `processBranch` callback?" check that I need. (E.g. perhaps we could walk a few steps backwards on the ExplodedGraph -- but I don't know what kinds of nodes should I expect and I'm afraid that what I could write would run into pitfalls in unusual cases.)

`<rant>`From a higher level POV I think that `EagerlyAssume` is an ugly hack which makes the ExplodedGraph harder to understand or reason about. I'm certain that there are (or at least were) some clever corner cases where these eager assumptions provide a better analysis (so I don't hope that we can just disable it), but I'm still annoyed that it exists.`</rant>` 

https://github.com/llvm/llvm-project/pull/119388