[PATCH] D103317: [Analyzer][engine][solver] Simplify complex constraints

Valeriy Savchenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 2 07:46:38 PDT 2021


vsavchenko added a comment.

In D103317#2793557 <https://reviews.llvm.org/D103317#2793557>, @ASDenysPetrov wrote:

> Returning to the discussion raised in D102696 <https://reviews.llvm.org/D102696>, I'd like to share my vision.
> I think we can use much easier approach to use valid constraints at any point of time.
> The main idea is lazy-reasoning of the ranges.
> This approach:
>
> - **doesn't** need to use **canonicalization**;
> - **doesn't** need to **update all** the constraints on **each** `setConstraint` call (using brute-force or any other structure traversals);
> - **add** an additional bindings for **every** simple operand on **each** //assignment// or //initialization//.
> - **remove** an invalid bindings for **every** simple operand on **each** //assignment// or //initialization//.
> - **recursively** get the **range** when and only when needed (lazy-reasoning).
>
> Example:
>
>   a = b + 1;
>   c = a - 5;
>   if (b != 42) return 0;
>   if(c == 38) return 1;
>   return 0;
>
> `a = b + 1;`
>
> 1. LHS `a` lookup. LHS not found.
> 2. Add binding `$a = $b + 1`
> 3. Traverse through RHS `b + 1`.
> 4. `b` lookup. `b` not found.
> 5. Built a new expression `a - 1` for `b`.
> 6. Add binding for `b`: `$b = $a - 1`
> 7. `1` is a constant. Skip.
>
> `c = a - 5;`
>
> 1. LHS `c` lookup. LHS not found.
> 2. Add binding `$c = $a - 5`
> 3. Traverse through RHS `a - 5`.
> 4. `a` lookup. `a` found.
> 5. If LHS not found and `a` found, then skip.
> 6. `5` is a constant. Skip.
>
> `if (b != 42) return;`
>
> 1. Get the range of `$b`. Direct range not found. Add `$b` to the list of visited symbols.
> 2. Substitute bindings. From `$b != 42` to `$a - 1 == 42`.
> 3. Traverse through `$a - 1`.
> 4. Get the range of `$a`. Direct range not found. Add `$a` to the list of visited symbols.
> 5. `1` is a constant. Skip.
> 6. Substitute bindings. From `$a - 1 == 42` to `$b + 1 - 1 == 42`.
> 7. Traverse through `$b + 1 - 1`.
> 8. `$b` is in the list of already visited symbols. No way to find a range.
> 9. Create a range from the type [MIN, MAX].
> 10. Perform comparison and produce a constraint for `b` [42, 42].
>
> `if(c == 38) return;`
>
> 1. Get the range of `$c`. Direct range not found. Add `$c` to the list of visited symbols.
> 2. Substitute bindings. From `$c == 38` to `$a - 5 == 38`.
> 3. Traverse through `$a - 5`.
> 4. Get the range of `$a`. Direct range not found. Add `$a` to the list of visited symbols.
> 5. `5` is a constant. Skip.
> 6. Substitute bindings. From `$a - 5 == 38` to `$b + 1 - 5 == 38`.
> 7. Traverse through `$b + 1 - 5`.
> 8. Get the range of `$b`. Direct range **found** [42, 42].
> 9. `1` is a constant. Skip.
> 10. Calculate(simplificate the expression etc.) a new range for `$c` [38, 38].
> 11. Condition is //true//.
>
> `return 1`
>
> P.S. I've just started thinking of the way to solve this problem. For now it's just a sketch of where I'm going.

Great to know that you are also thinking about this problem!
The more the merrier!

I have a couple of notes about your suggestion.
First is minor, `a = b + 1` and `c = a + 5` won't actually produce any constraints. Starting from those statements if you request a symbol for `a` you will get `b + 1` and `b + 6` for `c`.
Of course, you can change your code so that we make an assumption that `a == b + 1` and `c == a + 5` and those would be constraints.  That's why I said that it's minor.

Now, for the biggest concern.  You cover in a very detailed manner all the traversals and caches for visited symbols, but have **Substitute bindings** as a sub-step , which is HUGE and absolutely non-trivial.
In your examples, you have a linear chain of related symbolic expressions, i.e. something like this `x -> y + 2`, `y -> z + 3`, `z -> 4` so that we have `x -> 9` after three substitutions. But the problem it is generally not one-to-one relationship, so `x -> y1 + 1`, `x -> y2 + 2`, ... , `x -> yN + N`.  Imagine that each `yK` is involved with another `M` constraints. So, you would need to check `x -> y1 -> z11`, if it's not working - check `x -> y1 -> z12` and so on.  When you run out of options for `z1K`, you need to backtrack and try `x -> y2 -> z21`, etc.  As you can see, this is BRUTE FORCE.

And it couldn't be easier theoretically. So, instead of trying that many options every time we need to judge about, we take a bit simpler shortcut.  It's far from being complete, but honestly we're not pursuing completeness.  For my example above, whenever we find out that `zKM -> 5`, we simplify everything that involves `zKM` and get `yK -> 10` and get `x -> 15`.  It is far from being perfect, I don't like that we need to go through all symbols and try to simplify every single symbol, and it would be good to optimize this scenario (mapping symbols to all symbolic expressions that contain them, for example), but essentially it is cheaper in the long run.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103317/new/

https://reviews.llvm.org/D103317



More information about the cfe-commits mailing list