[PATCH] D77792: [analyzer] Extend constraint manager to be able to compare simple SymSymExprs

Artem Dergachev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Sep 21 17:50:10 PDT 2020


NoQ added inline comments.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:103-110
+  enum class CompareResult {
+    unknown,
+    identical,
+    less,
+    less_equal,
+    greater,
+    greater_equal
----------------
baloghadamsoftware wrote:
> ASDenysPetrov wrote:
> > Can we use `Optional<BinaryOperatorKind>` instead, to reduce similar enums? Or do you want to separate the meanings in this way?
> Good question. @NoQ?
The meaning is obviously completely different even when the names actually match. `BO_LT` is not a "result"; it's the operation itself. It gets even worse for other opcodes (what kind of comparison result is `BO_PtrMemI`???).

The idea to re-use the enum is noble but we definitely need to find some other enum.
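To make the distinction concrete, here is a minimal standalone sketch (the `Range` type, `compareRanges`, and the bounds logic are invented for illustration, not taken from the patch): the *result* of such a query is naturally one of a handful of dedicated answers, while `BinaryOperatorKind` only names the operation being asked about.

```cpp
// Standalone illustration only; `Range` and `compareRanges` are stand-ins,
// not the analyzer's actual types.
#include <cstdint>

struct Range {
  int64_t From, To; // inclusive bounds
};

enum class CompareResult {
  unknown,
  identical,
  less,
  less_equal,
  greater,
  greater_equal
};

// Classify how every value of LHS relates to every value of RHS.
// The *input* of such a query could be a BinaryOperatorKind ("what is being
// asked"), but the *output* is one of these six answers.
CompareResult compareRanges(const Range &LHS, const Range &RHS) {
  if (LHS.From == LHS.To && RHS.From == RHS.To && LHS.From == RHS.From)
    return CompareResult::identical;
  if (LHS.To < RHS.From)
    return CompareResult::less;
  if (LHS.To <= RHS.From)
    return CompareResult::less_equal;
  if (LHS.From > RHS.To)
    return CompareResult::greater;
  if (LHS.From >= RHS.To)
    return CompareResult::greater_equal;
  return CompareResult::unknown; // the ranges overlap
}
```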


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:814
 
+Optional<ProgramStateRef> RangeConstraintManager::tryAssumeSymSymOp(
+    ProgramStateRef State, BinaryOperator::Opcode Op, SymbolRef LHSSym,
----------------
baloghadamsoftware wrote:
> NoQ wrote:
> > I believe you don't need to return an optional here. The method simply acknowledges any assumptions it could make in the existing state and returns the updated state. Therefore, if it wasn't able to record any assumptions, it returns the existing state. Because the only reasonable behavior the caller could implement when the current implementation returns `None` is to roll back to the existing state anyway.
> Actually, `tryAssumeSymSymOp()` does not assume anything and therefore it does not return a new state. What it actually returns is a ternary logical value: the assumption is either valid, invalid, or we cannot reason about it. Maybe `Optional<bool>` would be better here, and we should also choose a less misleading name, because it does not "try to assume" anything, but tries to reason about the existing assumption.
I dislike this design because it turns the code into spaghetti in which every facility in the constraint manager has to be aware of every other facility and agree on who's responsible for what. It's too easy to miss something, and missing something is worse than doing double work.

Like, should different facilities in the constraint manager try to record as *much* information as possible in the program state ("fat constraints") or as *little* as possible ("thin constraints")? E.g., in this example, if we know that the range for `$x` is strictly below the range for `$y`, should we also add the opaque constraint `$x < $y` to the program state, or should we instead teach every facility to infer that `$x < $y` by looking at their ranges?
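As a toy illustration of where the information ends up in each design (everything below is invented for the example; it is not analyzer code):

```cpp
// Toy model only: State, the map keys and the helper names are made up to
// contrast the two designs; they are not the analyzer's data structures.
#include <map>
#include <string>
#include <utility>

struct State {
  std::map<std::string, std::pair<int, int>> Ranges; // "$x" -> [from, to]
  std::map<std::string, bool> SymSymConstraints;     // "$x < $y" -> true
};

// "Thin": only the ranges are refined; every other facility has to re-derive
// `$x < $y` from the ranges whenever it needs that fact.
State assumeThin(State S) {
  S.Ranges["$x"] = {1, 2};
  S.Ranges["$y"] = {3, 4};
  return S;
}

// "Fat": additionally record the opaque constraint, so a facility that only
// understands SymSym constraints can see it without consulting the ranges.
State assumeFat(State S) {
  S = assumeThin(std::move(S));
  S.SymSymConstraints["$x < $y"] = true;
  return S;
}
```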

I feel we should go with fat constraints.

1. Con: They're fat. They eat more memory, cause more immutable tree rebalances, etc.
2. Con: It's easy to accidentally make two similar states look different. E.g., `{ $x: [1, 2], $y: [3, 4] }` and `{ $x: [1, 2], $y: [3, 4], $x < $y: true }` describe the same state, but they won't be deduplicated, and if they appear on different paths at the same program point, these paths won't get merged.
3. Pro: They minimize the number of false positives by giving every facility in the constraint manager as much data as possible to conclude that the state is infeasible.
4. Pro: As I said above, they're easier to implement and to get right. With thin constraints, every facility has to actively let the other facilities know that they don't need to handle the assumption anymore because this facility has "consumed" it. In our example this means returning an `Optional<ProgramStateRef>` that is empty if the constraint wasn't consumed and needs to be handled by another facility (namely, `assumeSymUnsupported`); the two interface shapes are sketched at the end of this comment. With fat constraints you simply add your information and don't care whether other facilities ever read it.

Con 1 should be dismissed as a premature optimization: we can always add some thinness to fat constraints if we have proof that their fatness is an actual problem. Con 2 is a real but fairly minor issue; path merges are fairly rare anyway, and mergeability is a nice goal to pursue, but not at the cost of having false positives. So I think the pros outweigh the cons here.
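For reference, the two interface shapes under discussion look roughly like this; the first declaration mirrors the patch, while the second one and its name are hypothetical, and surrounding headers/context are omitted:

```cpp
// "Thin" protocol: an empty Optional means "this assumption was not consumed
// here; somebody else (e.g. assumeSymUnsupported) still has to handle it".
Optional<ProgramStateRef> tryAssumeSymSymOp(ProgramStateRef State,
                                            BinaryOperator::Opcode Op,
                                            SymbolRef LHSSym, SymbolRef RHSSym);

// "Fat" alternative (hypothetical name): always return a state; if nothing new
// can be recorded, return the input state unchanged, so callers never need to
// know which facility consumed the assumption.
ProgramStateRef assumeSymSymOp(ProgramStateRef State,
                               BinaryOperator::Opcode Op,
                               SymbolRef LHSSym, SymbolRef RHSSym);
```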


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D77792/new/

https://reviews.llvm.org/D77792


