[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Wed May 6 07:30:18 PDT 2020

ASDenysPetrov marked 5 inline comments as done.
ASDenysPetrov added a comment.

@xazax.hun, many thanks for looking at the patch.

> Did you do some measurements on some projects how the run-time changes?

I have never run them before. How I can do this? What project to use as a sample?

> [1]: https://arxiv.org/pdf/cs/0703084.pdf

I will definitely read it, thank you.

> Also, it would be useful to see how statistics like coverage patterns, number of exploded nodes etc changes.

You can check F11839257: before.html <https://reviews.llvm.org/F11839257> and F11839255: after.html <https://reviews.llvm.org/F11839255> exploded graphs, to see the number of nodes.

> See the output of `-analyzer-stats` option for details.

I didn't find this option in `clang --help`. I've never use it. Please, explain how to use it.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
xazax.hun wrote:
> I have really hard time processing how to interpret `AnyX2`.
> 
> For example in the code below:
> ```
>    if (x >= y)
>      if (x != y)
>        if (x <= y)
>          return false
> ```
> 
> ```
>    if (x >= y)
>      if (x == y)
>        if (x <= y)
>          return true
> ```
> 
> We would get different results for `<=`. So I do not really get how I should read the `AnyX2` column.
I'm sorry for the obscure explanation. I was really thinking about how to explain it more clear.
Let's consider this code:
```
if (x >= y)
  if (x != y)
    if (x <= y)
```
If 'x >= y' is //true//, then following `x <= y` is  `Any` ( can be //true// or //false// ). //See table in the summary.//
If 'x != y' is //true//, then following `x <= y` is  `Any` ( can be //true// or //false// ).
If 'x >= y && x != y' is //true//, then following `x <= y` is `False` only (it means that we previously met two Any states, what I've called as **AnyX2**).

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:622
+  auto IndexOP = IndexFromOp(OP);
+  auto LHS = SSE->getLHS();
+  auto RHS = SSE->getRHS();
----------------
xazax.hun wrote:
> Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this function if one of them is not a simple symbol?
I think it makes sense, because `(x+3*y) > z` and `z < (x+3*y)` perfectly fits in the logic.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:629-630
+  auto &One = BV.getValue(1, T);
+  const RangeSet TrueRangeSet(F, One, One);
+  const RangeSet FalseRangeSet(F, Zero, Zero);
+  int AnyStates = 0;
----------------
xazax.hun wrote:
> ASDenysPetrov wrote:
> > Folk, is this a good idea to explicitly create bool ranges as a return value?
> > As for me, comparisons like `>`, `<`, etc. can only produce bool-based ranges, otherwise it would be weird.
> I think modeling booleans with ranges is OK. This is what the analyzer is doing at the moment. But I guess the question is about whether boolean results of comparisons is a good way to store the relationships between the symbols. I am not convinced about that see my inline comment below.
Once I met a case when logic operator `<` retruns me int-based ranges {0,0} on the //false// branch and {1, 429496725} (instead of bool-based {1,1}) on the //true// branch. It confused me, so the question arised.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:635
+
+    // Let's find an expression e.g. (x < y).
+    OP = OpFromIndex(i);
----------------
xazax.hun wrote:
> I am not really familiar with how constraints are represented but I vaguely recall the analyzer sometimes normalizing some expressions like conversion `A == B` to `A - B == 0`. I am just wondering if this API to look this expression up is not the right abstraction as it might be better to handle such normalizations in a unified, central way.
> 
> Also, note that this method does not handle transitivity. I wonder if maintaining set of ranges is the right representation for this information at all. The ordering between the symbols will define a lattice. Representing that lattice directly instead of using ranges might be more efficient.
>conversion A == B to A - B == 0. 
All my samples `x == y` worked as `x == y`. I havn't seen such transformations.
>Representing that lattice directly instead of using ranges might be more efficient.
How do you imagine that lattice? Could you detail it?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933