[PATCH] D103317: [Analyzer][engine][solver] Simplify complex constraints

Mon Jun 7 12:29:01 PDT 2021

vsavchenko added a comment.

In D103317#2803238 <https://reviews.llvm.org/D103317#2803238>, @ASDenysPetrov wrote:

> @vsavchenko
>
> I appologize for my immaturity and for the audacity of my infantile assumptions. I admit any constructive criticism. Thank you for referencing the theory. I will thoroughly study the domain in all.

No need to apologize!  I do like your enthusiasm, great things come from it.  But as I said, I don't want you to waste your own time pursuing things that won't work.

> As for the **assignments** and why I brought them up, that's because I just tried to look at the problem from another angle. Consider two snippets:

OK, let me annotate the code, so we can hopefully close this topic once and for all.

>   void f(int a, int b, int c, int x, int y){

At this point we have 5 memory regions `a`, `b`, `c`, `x` and `y`.  They represent the memory cells allocated for the corresponding parameters on stack.  When we call function, these actually have some values, but we don't know them due to the limitations of the analysis.  For this reason, we say that they contain **symbolic values**. So, let's simply enumerate that so that there is no misunderstanding. Values: #1, #2, #3, #4, and #5. These are not real numbers, we enumerate some unknown values.  As I said, we introduced them as initial values for parameters, so our store (mapping from memory to values) looks likes this: `a -> #1, b -> #2, c -> #3, x -> #4, y -> #5`.  We don't know anything about these values at this point.

>   a = b;

What do we know at this point? Do we get more information about the value that was stored in `a` before? No, we overwrote this value entirely, and what changed here is the mapping: `a -> #2, b -> #2, c -> #3, x -> #4, y -> #5` because `b` contained value `#2`.  Symbolic value `#1` is gone, it's dead.  No-one will ever see or use it again.

>   x = y;

Using the same logic: `a -> #2, b -> #2, c -> #3, x -> #5, y -> #5`.

>   c = a;

And again: `a -> #2, b -> #2, c -> #2, x -> #5, y -> #5`.

>     // What can you say about a, x, c?
>     // `a` equals to `b` and 'c'
>     // `x` equals to `y`
>     // `c` equals to `a` and `b`
>   }

Yes, they are, but we don't need constraints for that.  We know which values are associated with each memory region and we can compare them.  That's all, this simple.  That's why assignments and initializations DON'T MATTER for the solver, these are simply STORE of some symbolic value into memory.  Solver cares only about symbolic values, memory is irrelevant.

> and
>
>   void f(int a, int b, int c, int x, int y){

As before: `a -> #1, b -> #2, c -> #3, x -> #4, y -> #5`.

>   if(a == b) {

Now, the value that we imagine to be in memory region `a` (namely `#1`) is NOT GONE!  It is still there, it is being used.  And what we know after this `if`, is that on one of the paths after it `#1 == #2` and on the other path `#1 != #2`.  These are just values that we know nothing about, but we started to gain some knowledge.
So, the store is the same (the values stored in those memory cell are the same as they were before): `a -> #1, b -> #2, c -> #3, x -> #4, y -> #5`, but now we add constraint `#1 == #2`.

>   if(x == y) {

The same here, store is the same, but constraints are now: `#1 == #2` and `#4 == #5`.

>   if(c == a) {

`#1 == #2 == #3` and `#4 == #5`

>           // What can you say about a, x, c?
>           // `a` equals to `b` and 'c'
>           // `x` equals to `y`
>           // `c` equals to `a` and `b`
>         }
>       }
>     }
>   }

We can actually figure out that `#2 == #3` because of equality theory. We only knew `#1 == #2` and that `#3 == #1`, but equality is a transitive binary relation and we reflect that notion in our equivalence classes data-structure.

> They both generates the same relationship between the variables. **Assignment** seemed to me as just a special case in a //true// branch of `==` operator and can be substituted with it in calculations. So, I decided to make some assumptions in this way. Digging more, it opened a lot of gaps and lacks for me.

Relationship between variables are the same, but neither solver nor the analyzer reason about things in terms of variables.  It's memory and values that the memory carries.  In the first example, we have only 2 values stored in 5 memory locations.  In the second example, it's 5 values stored in 5 memory locations.  On some of the paths we happen to know how those values are related (which is not true for other paths), that's constraints.

I hope now it's clear.

> Next time I will investigate deeper to present a finalized solution and not wasting your time explaining me the basics.

We probably should not have discussions like these because they are not directly about the patch on review.  If you want to continue this discussion, let's bring it to **[cfe-dev]**.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103317/new/

https://reviews.llvm.org/D103317