[PATCH] D122838: [clang][dataflow] Add support for correlation of boolean (tracked) values

Fri Apr 1 04:28:37 PDT 2022

ymandel added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:79
+    for (BoolValue *Constraint : Env1.getFlowConditionConstraints()) {
+      Expr1 = &Env1.makeAnd(*Expr1, *Constraint);
+    }
----------------
sgatev wrote:
> xazax.hun wrote:
> > Hmm, interesting.
> > I think we view every boolean formula at a certain program point implicitly as `FlowConditionAtThatPoint && Formula`. And the flow condition at a program point should already be a disjunction of its predecessors.
> > 
> > So it would be interpreted as: `(FlowConditionPred1 || FlowConditionPred2) && (FormulaAtPred1 || FormulaAtPred2)`.
> > While this is great, this is not the strongest condition we could derive. 
> > `(FlowConditionPred1 && FormulaAtPred1)  || (FormulaAtPred2 && FlowConditionPred2)` created by this code snippet is stronger which is great.
> > 
> > My main concern is whether we would end up seeing an exponential explosion in the size of these formulas in the number of branches following each other in a sequence.
> > 
> Yeap, I agree this is suboptimal and I believe I'm the one to blame for introducing it downstream.
> 
> I wonder if we can represent the flow condition of each environment using a bool atom and have a mapping of bi-conditionals between flow condition atoms and flow condition constraints. Something like:
> 
> ```
> FC1 <=> C1 ^ C2
> FC2 <=> C2 ^ C3 ^ C4
> FC3 <=> (FC1 v FC2) ^ C5
> ... 
> ```
> 
> We can use that to simplify the formulas here and in `joinConstraints`. The mapping can be stored in `DataflowAnalysisContext`. We can track dependencies between flow conditions (e.g. above `FC3` depends on `FC1` and `FC2`) and modify `flowConditionImplies` to construct a formula that includes the bi-conditionals for all flow condition atoms in the transitive set before invoking the solver.
> 
> I suggest putting the optimization in its own patch. I'd love to look into it right after this patch is submitted if both of you think it makes sense on a high level.
This sounds good to me. That said, I'm not sure how often we'd expect this to be an issue in practice, since, IIUC, this specialized merge only occurs when the value is handled differently in the two branches. So, a series of branches alone won't trigger the exponential behavior.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122838/new/

https://reviews.llvm.org/D122838