[PATCH] D125931: [clang][dataflow] Add support for correlated branches to optional model

Fri May 20 06:16:32 PDT 2022

ymandel added a comment.

In D125931#3525699 <https://reviews.llvm.org/D125931#3525699>, @xazax.hun wrote:

> Actually, I think in most cases we want to be consistent on how to merge bool values. So I wonder whether instead of reimplementing the merge operation in this check we should just call a function that does the work. And the same function should be used within the engine to merge states after `if` statements and so on.

Agreed. I think we made a mistake with bools, to be honest, in supporting correlated conditions, given our lack of distinction between join and widen -- something I didn't appreciate fully at the time of the patch.  I propose:

1. In this patch, we go with a widening operation, but put the relevant logic in the core, so it can be reused for booleans in general.
2. Follow up with a patch that uses the new logic for general booleans.

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:596-598
+      MergedEnv.makeOr(
+          MergedEnv.makeAnd(Env1.getFlowConditionToken(), *HasValueVal1),
+          MergedEnv.makeAnd(Env2.getFlowConditionToken(), *HasValueVal2)));
----------------
xazax.hun wrote:
> ymandel wrote:
> > What's the plan for loops? This will increase the size of the constraint on every iteration, preventing termination.
> I have nothing to add, just a brain dump. In this particular case, the analysis domain has infinite chains, the size of the Boolean formulas can increase indefinitely (at least syntactically). The canonical solution to deal with infinite chains is to do a widening operation such that the widened state over approximates both the state of the old and the new iteration. A very simple way of doing this would be to simply erase all knowledge we accumulated about the Boolean value. On the other hand, I wonder if it is often possible to do something smarter. There might be cases when the value is syntactically larger in the new iteration but actually it has the exact same truth table. The solver might be able to show this. On the other hand, if the loop generates new Boolean symbols in each iteration, we have little hope fore that.
> I have nothing to add, just a brain dump. In this particular case, the analysis domain has infinite chains, the size of the Boolean formulas can increase indefinitely (at least syntactically). The canonical solution to deal with infinite chains is to do a widening operation such that the widened state over approximates both the state of the old and the new iteration. A very simple way of doing this would be to simply erase all knowledge we accumulated about the Boolean value. On the other hand, I wonder if it is often possible to do something smarter. There might be cases when the value is syntactically larger in the new iteration but actually it has the exact same truth table. The solver might be able to show this. On the other hand, if the loop generates new Boolean symbols in each iteration, we have little hope fore that.

Agreed. In our internal implementation, we have something like this -- it consults the solver to see if the two are logically equivalent.  That loses the (subtle) support for correlated conditions. However, until we have more precise support for widening (that is, separate from `merge`), I think this is the only way to go to ensure that loop analysis terminates.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D125931/new/

https://reviews.llvm.org/D125931