[PATCH] D53076: [analyzer] Enhance ConditionBRVisitor to write out more information

Thu Nov 29 18:44:16 PST 2018

NoQ added a comment.

There still seem to be a couple of regressions with `Assuming...` pieces, but other than that, i believe that an awesome piece of progress has been made here. I really like how it looks now. I agree with George that dropping "Knowing" from the message looks fancier.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:86-96
+  const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr);
+  const auto *BExpr = dyn_cast_or_null<BinaryOperator>(CondVarExpr);
+  
+  if (BExpr && !DRE)
+    DRE = dyn_cast_or_null<DeclRefExpr>(BExpr->getLHS()->IgnoreParenImpCasts());
+  
+  if (BExpr && !DRE)
----------------
The usual style in LLVM for this sort of stuff is

```
if (const auto *DRE = dyn_cast_or_null<DeclRefExpr>(CondVarExpr)) {
  // Do stuff with DRE.
}
```

`CondVarExpr` definitely cannot successfully cast to both `DeclRefExpr` and `BinaryOperator` at the same time, so this sort of structure is much easier to understand. Also checks after `&&` are definitely redundant.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:116
+
+  const ConstraintRangeTy Ranges = State->get<ConstraintRange>();
+  for (auto I = Ranges.begin(); I != Ranges.end(); ++I) {
----------------
Hmm, why are you even able to access it here? The `get<>` key should only be visible in a single translation unit in which it is defined, otherwise it's buggy because it relies on addresses of static globals to work.

Actually, could you separate the work of displaying the exact range boundary (this whole `RangeSet` business) into a separate diff? I think it's a separate change that requires separate discussion.

Also instead of accessing the constraint manager's private trait directly, you should use the `getRange()` method. It adds all the necessary implicit assumptions that aren't represented explicitly as items in the map.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:127
+
+static Optional<int> getConcreteIntegerValue(const Expr *CondVarExpr,
+                                             const ExplodedNode *N) {
----------------
Why `int`?

That's not what `getExtValue()` returns (so you can accidentally print a truncated value), and generally the size of the `int` type on the host machine shouldn't affect analysis results.

I think you should print the `APSInt` directly instead of converting it into a raw integer.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:131-133
+  // If there is no value return.
+  if (!State->getStore())
+    return {};
----------------
Even if the Store is currently null, every live variable in the program inevitably has a value, so i don't think we should bail out here.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:140-142
+  // If symbolic range of the SVal is found that means the value is unknown.
+  if (getRangeSet(CondVarExpr, N))
+    return {};
----------------
There are cornercases where the range consists of exactly one point, so the value is technically concrete but wasn't yet collapsed into a concrete value by the engine.

In any case, as far as i understand, this code is there only for optimization, and i don't think it's actually making things faster because the lookup is already expensive.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:145
+  // The declaration of the value may rely on a pointer so take its l-value.
+  const auto DeclSVal = State->getSVal(State->getLValue(CondVarVD, LCtx));
+
----------------
`auto` shouldn't be used here and in a few other places according to [[ https://llvm.org/docs/CodingStandards.html#use-auto-type-deduction-to-make-code-more-readable | the coding guidelines ]].

Essentially, it should only be used for cast results, where the type is already written out.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:147
+
+  // The value may inline.
+  const auto InlineSVal = State->getSVal(CondVarExpr, LCtx);
----------------
What do you mean by "inline"? There are no function calls going on here.

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:1885-1886
   ProgramPoint progPoint = N->getLocation();
-  ProgramStateRef CurrentState = N->getState();
-  ProgramStateRef PrevState = N->getFirstPred()->getState();
-
-  // Compare the GDMs of the state, because that is where constraints
-  // are managed.  Note that ensure that we only look at nodes that
-  // were generated by the analyzer engine proper, not checkers.
-  if (CurrentState->getGDM().getRoot() ==
-      PrevState->getGDM().getRoot())
+  const ExplodedNode *PredNode = N->getFirstPred();
+  const ExplodedNode *PredPredNode = PredNode->getFirstPred();
+
----------------
I'm worried that checker transitions can screw up the order of nodes here.

Every time a checker does `addTransition()` or `generateNonFatalErrorNode()`, it injects an intermediate node, and there can be indefinitely many of those between the nodes you're //actually// looking for.

You should probably rely on program points instead, i.e. "this is the node in which the terminator condition was evaluated", "this is the node in which we jumped from one CFG block to another".

================
Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:2244
+    if (CurrentRanges)
+      Out << CurrentRanges->begin()->To();
+    else
----------------
Did you mean `->end()->To()`? We need a test in which the `RangeSet` contains more than one range.

================
Comment at: test/Analysis/diagnostics/macros.cpp:33
 void testDoubleMacro(double d) {
-  if (d == DBL_MAX) { // expected-note {{Taking true branch}}
+  if (d == DBL_MAX) { // expected-note {{Assuming 'd' is equal to DBL_MAX}}
+                      // expected-note at -1 {{Taking true branch}}
----------------
Charusso wrote:
> NoQ wrote:
> > This one's good. Static Analyzer doesn't understand floats, so this branch is indeed non-trivial. There should indeed be an assuming... piece here.
> A little bit misunderstandable, because we do not know anything about doubles. May the generic message fit better?
Yup, the generic message should be perfectly fine, but i also don't see any problems with the non-generic message, as it is "technically the truth".

================
Comment at: test/Analysis/diagnostics/undef-value-param.m:56
     SCDynamicStoreRef ref = anotherCreateRef(&err, x);
-    if (err) { 
-               //expected-note at -1{{Assuming 'err' is not equal to 0}}
-               //expected-note at -2{{Taking true branch}}
-        CFRelease(ref);
-        ref = 0; // expected-note{{nil object reference stored to 'ref'}}
+    if (err) { //expected-note{{Taking true branch}}
+      CFRelease(ref);
----------------
Charusso wrote:
> I am not sure why but the range here [1, (2^64)-1]. There is would not be any number like 0?
That's suspicious, could you have a look at when exactly does this range appear? There should be an `Assuming...` piece here. I suspect that you might be looking at the state after the split, which already has the range, and not looking at the state before the split, which doesn't have the range.

================
Comment at: test/Analysis/new-ctor-malloc.cpp:11
   void *x = malloc(size); // expected-note {{Memory is allocated}}
-  if (!x) // expected-note    {{Assuming 'x' is non-null}}
-          // expected-note at -1 {{Taking false branch}}
+  if (!x) // expected-note {{Taking false branch}}
     return nullptr;
----------------
Charusso wrote:
> The range information is [1, (2^64)-1] but as I saw `malloc` could return null. Who is lying here?
Yes, `malloc` is nullable. There should be an `Assuming...` piece here. It's probably the same problem that we had with `anotherCreateRef()`.

================
Comment at: test/Analysis/uninit-vals.m:324
   testObj->origin = makeIntPoint2D(1, 2);
-  if (testObj->size > 0) { ; } // expected-note{{Taking false branch}}
+  if (testObj->size > 0) { ; } // expected-note{{Assuming 'testObj->size' is <= 0}}
                                // expected-note at -1{{Taking false branch}}
----------------
Charusso wrote:
> george.karpenkov wrote:
> > That does not seem right: from `calloc` the analyzer should know that the `testObj->size` is actually zero.
> That `Assuming...` piece is rely on the change between the last two diffs: I just print out more `VarDecls` in `patternMatch()`. I thought everything works fine, so I will check that problem if @NoQ will not be faster as he already found some problematic code piece in that test file.
`testObj->size` is an `UnknownVal` at this point (see the parallel comment on the older revision's thread). So there should be an `Assuming...` piece here. Ultimately there shouldn't, but throwing an `Assuming...` piece is the correct behavior for the sub-system that decides whether an `Assuming...` piece should be emitted.

================
Comment at: test/Analysis/uninit-vals.m:167
   testObj->origin = makePoint(0.0, 0.0);
-  if (testObj->size > 0) { ; } // expected-note{{Taking false branch}}
+  if (testObj->size > 0) { ; } // expected-note{{Assuming the condition is false}}
+                               // expected-note at -1{{Taking false branch}}
----------------
Charusso wrote:
> NoQ wrote:
> > These are pretty weird. As far as i understand the test, these should be there. But i'm suddenly unable to debug why were they not shown before, because there's either something wrong with exploded graph dumps or with the exploded graph itself; it appears to be missing an edge right after `size > 0` is assumed. I'll look more into those.
> We gather information with `clang-analyzer-eval` functions so the conditions would be `Knowing...` pieces but the `BugReporter.cpp` does not know about these extra assumptations. Where to teach to do not `Assuming...`, but `Knowing...` these?
Aha, i see.

`testObj->size` evaluates to pure `UnknownVal` because region store cannot deal with two overlapping lazy bindings (which is the whole point of the test).

Therefore no new constraints are added. So we cannot figure out what happens by looking at the constraints.

So it's roughly the same thing that happened in the example with floats.

We should just outright always add an "Assuming..." piece when the condition evaluates to Unknown. The new behavior is correct here.

================
Comment at: test/Analysis/virtualcall.cpp:172
 #endif
       X x(i - 1);
 #if !PUREONLY
----------------
Charusso wrote:
> Because this `X x(i - 1);` we assume these conditions? This one is tricky to be `Knowing...`.
These are the notes for `X x1(1);` in `main`. In this case `i` is a concrete value: exactly 1 on the first iteration and exactly 0 on the second iteration.

The new behavior is incorrect; there should not be an `Assuming...` piece here.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D53076/new/

https://reviews.llvm.org/D53076