[PATCH] D158407: #llvm #flow-analysis Simplify formula at CNF construction time, and short-cut solving of known contradictory formulas.

Thu Aug 24 09:27:01 PDT 2023

sammccall added a comment.

This simplification makes sense, but if we're adding the layer, we're missing an opportunity to apply it completely.
(Intuitively, I don't see any reason that two passes is "enough")
To do this, we'd want to simplify existing formulas when we learn something new.

Curious whether you think this is worth doing and why - I don't think I'm opposed to the current version if it solves the practical problem, but I am concerned we might be adding simplification layers that solve some test cases but are defeated by relatively simple changes like reordering inputs.

---

For a complete version, I think the algorithm we'd want is something like:

  clauses_so_far = {}
  clauses_to_add = [...]

  for clause in clauses_to_add:
    clause = simplify(clause)
    if clause is trivially true or clause in clauses_so_far:
      continue # adds nothing
    if clause is {lit}:
      # reprocess clauses containing the variable we just resolved
      affected_clauses = [c in clauses_so_far if lit in c or negate(lit) in c]
      clauses_so_far -= affected_clauses
      clauses_to_add += affected_clauses
    clauses_so_far.add(clause)
    if clause is trivially false:
      break

  def simplify(clause):
    if clause.any(lit => {lit} in clauses_so_far):
      return trivially true 
    return clause.filter(lit => not {negate(lit)} in clauses_so_far)    

So the builder data structure (clauses_so_far) needs to support:

- test for clause presence
- remove and return clauses containing a literal

I can't come up with anything particularly clever, but throwing some hashtables at it always works...

(A more powerful version would recognize that AvB can simplify AvBvC and so do removal by subset rather than a single literal, but that seems even further beyond scope)

================
Comment at: clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp:179

+/// Applies simplifications while building up a BooleanFormula.
+struct CNFFormulaBuilder {
----------------
It would be useful to briefly describe what kind of simplifications. (this drives e.g. the fact we use it twice).

AIUI:
```
It tracks variables that must be true/false, based on trivial clauses seen so far.
Such variables can be dropped from subsequently-added clauses, which
may render such clauses trivial and give us more known variables.
```

================
Comment at: clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp:195
+  void addClause(Literal L1, Literal L2 = NullLit, Literal L3 = NullLit) {
+    if (Formula.KnownContradictory)
+      return;
----------------
you're already testing this in the higher-level loop, so checking on every call to addClause doesn't seem to actually save anything significant.

IMO it makes responsibilities less clear, so I'd prefer to drop it here and addClauseLiterals

================
Comment at: clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp:203
+
+    llvm::SmallVector<Literal, 3> literals;
+    literals.push_back(L1);
----------------
nit: capitalize variable names here & elsewhere

================
Comment at: clang/lib/Analysis/FlowSensitive/WatchedLiteralsSolver.cpp:426
+  // considered for the simplification of earlier clauses. Do a final
+  // pass to find more opportunities for simplification.
+  CNFFormula FinalCNF(NextVar - 1, std::move(CNF.Atomics));
----------------
the issue is that info only propagates forward (earlier to later clauses, right?)

so by running this again, and sorting units first, we allow simplifications that propagate info backwards *once*, but we still don't have all simplifications.

```
D
Av!B
Bv!C
Cv!D

// first simplification pass

Av!B
Bv!C
C // hoist new unit

// second simplification pass

Av!B
B // hoist new unit

// third simplification pass
A
```

I think this is worth being explicit about: we're going to find some more simplifications, but we won't find them all, because running this to fixed point is too inefficient.

Is 2 experimentally determined to be the right number of passes? a guess? or am I misunderstanding :-)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D158407/new/

https://reviews.llvm.org/D158407