[PATCH] D102696: [Analyzer] Find constraints that are directly attached to a BinOp

Fri May 28 07:12:04 PDT 2021

vsavchenko added a comment.

Oh, wow. Great plan! I'd like to participate. 😊
I have a few comments:

> Alright, I see your point. I agree that solving the problem of "$a +$b +$c
> constrained and then $a + $c got constrained" requires canonicalization.

That was actually not an example for canonicalization, but rather that `ParentMap` will exponentially explode due to the number of combinations of sub-expressions if we want to make it useful for all cases.
If we do want to do something with situations like this `ParentMap` should be re-thought.

> 1. Update `setConstraint` to simplify existing constraints (and adding the
>
> simplified constraint) when a new constraint is added. In this step we'd just
> simply iterate over all existing constraints and would try to simplfy them with
> `simplifySVal`

I like this one!  So, to be more specific: If we add a constraint `Sym -> C`, where `C` is a constant, we can try to simplify other constrained symbols, where `Sym` is a sub-expression.
I want to add that now (because of me 😅) we do not store constraints as a map `Sym -> RangeSet`, it's a bit trickier.
We have a few mappings: `Sym -> EquivalenceClass` (member-to-class relationship), `EquivalenceClass -> RangeSet`, `EquivalenceClass -> {Sym}` (class-to-members relationship), and `EquivalenceClass -> {EquivalenceClass}` (to track disequalities). 
It means that we will need to replace `Sym` with its simplified version in `Sym -> EquivalenceClass` and `EquivalenceClass -> {Sym}`.
Not that it is impossible or something, but it makes it a bit harder as it has more moving pieces.

> 2. Add the capability to simplify more complex constraints, where there are 3
>
> symbols in the tree. In this change I suggest the extension and overhaul of
> `simplifySVal`. This change would make the following cases to pass (notice the
> 3rd check in each test-case).

This sounds great, but we we do need to think about performance, so we should discuss how do we plan to implement such an extension.

> 3. Add canonical trees to the solver. The way we should do this is to build
>
> "flat" sub-trees of associative operands. E.g. `b + a + c` becomes `+(a, b, c)`,
> i.e. a tree with a root and 3 children [1]. The ordering of the children
> could be done by their addresses (`Stmt *`) and we could simply store them in an
> array. Probably we'll need a new mapping: SymbolRef -> CanonicalTree.

I think that flattening is a good idea (and it is widespread in other solvers), but "flat" sub-tree (aka non-binary tree).
However, it is useful ONLY if we can efficiently support `isSubsetOf` predicate (specifically) for cases like `+(a, b, c)` and `+(a, c)`.

I don't think that `Sym -> Canonical` is a good idea.  The main goal of canonicalization is to compact potentially exponential number of equivalent ways of spelling the same symbolic expression and have a single representation for it.
So, it means that we expect that `Canonicalization(Sym)` will be called for multiple `Sym`s producing identical results.  This way `Sym -> Canonical` is simply a cache for `Canonicalization` (i.e. maybe shouldn't even be associated with a state).

> Then, we check each existing constraints' canonical tree to check
> whether the newly constrained expression is a sub-expression of that.

I'm worried about performance implications of this.

> 4. Extend 3) with `-` and `/`. `lhs - rhs` becomes a `lhs + (-rhs)` and `lhs /rhs`
>
> is substituted wit `lhs * (1/rhs)`.

Is it true for bit-vectors though?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102696/new/

https://reviews.llvm.org/D102696