[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Denys Petrov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 6 10:13:31 PDT 2020


ASDenysPetrov marked 3 inline comments as done.
ASDenysPetrov added a comment.

> You can either add -analyzer-stats to your cc1 invocation

It helps, thanks.



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
xazax.hun wrote:
> xazax.hun wrote:
> > ASDenysPetrov wrote:
> > > xazax.hun wrote:
> > > > I have really hard time processing how to interpret `AnyX2`.
> > > > 
> > > > For example in the code below:
> > > > ```
> > > >    if (x >= y)
> > > >      if (x != y)
> > > >        if (x <= y)
> > > >          return false
> > > > ```
> > > > 
> > > > ```
> > > >    if (x >= y)
> > > >      if (x == y)
> > > >        if (x <= y)
> > > >          return true
> > > > ```
> > > > 
> > > > We would get different results for `<=`. So I do not really get how I should read the `AnyX2` column.
> > > I'm sorry for the obscure explanation. I was really thinking about how to explain it more clear.
> > > Let's consider this code:
> > > ```
> > > if (x >= y)
> > >   if (x != y)
> > >     if (x <= y)
> > > ```
> > > If 'x >= y' is //true//, then following `x <= y` is  `Any` ( can be //true// or //false// ). //See table in the summary.//
> > > If 'x != y' is //true//, then following `x <= y` is  `Any` ( can be //true// or //false// ).
> > > If 'x >= y && x != y' is //true//, then following `x <= y` is `False` only (it means that we previously met two Any states, what I've called as **AnyX2**).
> > > 
> > > 
> > > 
> > > 
> > Could you tell me how the reasoning would be different for 
> > ```
> > if (x >= y)
> >   if (x == y)
> >     if (x <= y)
> > ```
> > ?
> Oh never mind, after `x == y` the `x <= y` would be `true`. But still, I find this logic hard to explain, understand, and prove correct.
> I'd like to see at least some informal reasoning why will it always work. Also, if someone would like to extend this table in the future (e.g. with operator <=>), how to make it in such a way that does not break this logic.
When I was creating a table it was just 6x6 without AnyX2 column. It should tell of what RangeSet we shall return (true, false or empty). It covers cases with only **one** preceding condition.
Then I found that **two** certain sequential conditions can also tell us what happen to the next condition. And those two conditions happened to be described as `Any` in the table. Thus I deduced one more column for this purpose.
In other words:
```
if (x >= y)     // 3. Aha, we found `x >= y`, which is also `Any` for `x <= y` in the table
  if (x != y)   // 2. Aha, we found `x == y`, which is `Any` for `x <= y` in the table
    if (x <= y) // 1. Reasoning a RangeSet about this `x <= y`
    // 4. OK, we found two `Any` states. Let's look at the value in `AnyX2` column for `x <= y`.
    // 5. The value is `False`, so return `FalseRangeSet`, assuming CSA builds false branch only.
```
You can substitute any other row from the table and corresponding Any-comparisons.
Honestly, I don’t know how else to explain this. :)
>I find this logic hard to explain, understand, and prove correct.
If you come up with it, It becomes really easy and clear. You'll see. But I agree that AnyX2 can be changed to something more obvious.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:622
+  auto IndexOP = IndexFromOp(OP);
+  auto LHS = SSE->getLHS();
+  auto RHS = SSE->getRHS();
----------------
xazax.hun wrote:
> ASDenysPetrov wrote:
> > xazax.hun wrote:
> > > Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this function if one of them is not a simple symbol?
> > I think it makes sense, because `(x+3*y) > z` and `z < (x+3*y)` perfectly fits in the logic.
> Makes sense,  thanks. Maybe it would be worth to rephrase the comment to note that `x` and `y` can also stand for subexpressions, not only for actual symbols.
Yes! Great! 


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:635
+
+    // Let's find an expression e.g. (x < y).
+    OP = OpFromIndex(i);
----------------
xazax.hun wrote:
> ASDenysPetrov wrote:
> > xazax.hun wrote:
> > > I am not really familiar with how constraints are represented but I vaguely recall the analyzer sometimes normalizing some expressions like conversion `A == B` to `A - B == 0`. I am just wondering if this API to look this expression up is not the right abstraction as it might be better to handle such normalizations in a unified, central way.
> > > 
> > > Also, note that this method does not handle transitivity. I wonder if maintaining set of ranges is the right representation for this information at all. The ordering between the symbols will define a lattice. Representing that lattice directly instead of using ranges might be more efficient.
> > >conversion A == B to A - B == 0. 
> > All my samples `x == y` worked as `x == y`. I havn't seen such transformations.
> > >Representing that lattice directly instead of using ranges might be more efficient.
> > How do you imagine that lattice? Could you detail it?
> A lattice is basically a DAG, where the nodes are the values (symbols, or subexpressions), and the edges are ordering. So it defines an ordering. E.g., A is smaller than B, if and only if there is a path from A to B. I am not sure whether this is the right representation, but it would take care of transitivity. 
I will inspect this idea. Thanks.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933





More information about the cfe-commits mailing list