[PATCH] D78933: [analyzer] RangeConstraintManager optimizations in comparison expressions

Wed May 6 05:20:32 PDT 2020

xazax.hun added inline comments.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:588
+  // AnyX2 means that two expressions marked as `Any` are met in code,
+  // and there is a special column for that, for example:
+  // if (x >= y)
----------------
I have really hard time processing how to interpret `AnyX2`.

For example in the code below:
```
   if (x >= y)
     if (x != y)
       if (x <= y)
         return false
```

```
   if (x >= y)
     if (x == y)
       if (x <= y)
         return true
```

We would get different results for `<=`. So I do not really get how I should read the `AnyX2` column.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:622
+  auto IndexOP = IndexFromOp(OP);
+  auto LHS = SSE->getLHS();
+  auto RHS = SSE->getRHS();
----------------
Could `LHS` and `RHS` be other expressions? Does it make sense to continue executing this function if one of them is not a simple symbol?

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:629-630
+  auto &One = BV.getValue(1, T);
+  const RangeSet TrueRangeSet(F, One, One);
+  const RangeSet FalseRangeSet(F, Zero, Zero);
+  int AnyStates = 0;
----------------
ASDenysPetrov wrote:
> Folk, is this a good idea to explicitly create bool ranges as a return value?
> As for me, comparisons like `>`, `<`, etc. can only produce bool-based ranges, otherwise it would be weird.
I think modeling booleans with ranges is OK. This is what the analyzer is doing at the moment. But I guess the question is about whether boolean results of comparisons is a good way to store the relationships between the symbols. I am not convinced about that see my inline comment below.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:635
+
+    // Let's find an expression e.g. (x < y).
+    OP = OpFromIndex(i);
----------------
I am not really familiar with how constraints are represented but I vaguely recall the analyzer sometimes normalizing some expressions like conversion `A == B` to `A - B == 0`. I am just wondering if this API to look this expression up is not the right abstraction as it might be better to handle such normalizations in a unified, central way.

Also, note that this method does not handle transitivity. I wonder if maintaining set of ranges is the right representation for this information at all. The ordering between the symbols will define a lattice. Representing that lattice directly instead of using ranges might be more efficient.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D78933/new/

https://reviews.llvm.org/D78933