[PATCH] D101526: [analyzer][StdLibraryFunctionsChecker] Add NoteTags for applied arg constraints

Wed Jul 21 10:35:17 PDT 2021

NoQ added inline comments.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp:857
+            if (BR.isInteresting(ArgSVal))
+              OS << Msg;
+          }));
----------------
Szelethus wrote:
> steakhal wrote:
> > Ah, there is a slight issue.
> > You should mark some stuff interesting here, to make this interestingness propagate back transitively.
> > 
> > Let's say `ArgSVal` is `x + y` which is considered to be out of range `[42,52]`. We should mark both `x` and `y` interesting because they themselves could have been constrained by the StdLibChecker previously. So, they must be interesting as well.
> > 
> > On the same token, IMO `PathSensitiveBugReport::markInteresting(symbol)` should be //transitive//. So that all `SymbolData` in that symbolic expression tree are considered interesting. What do you think @NoQ?
> > If we were doing this, @martong  - you could simply acquire the assumption you constructed for the given `ValueConstraint`, and mark that interesting. Then all `SymbolData`s on both sides of the logic operator would become implicitly interesting.
> >On the same token, IMO PathSensitiveBugReport::markInteresting(symbol) should be transitive. So that all SymbolData in that symbolic expression tree are considered interesting. What do you think @NoQ?
> 
> Thats how I'd expect this to work. This shouldn't be a burden on the checker developer (certainly not this kind of a checker), but rather be handled by `PathSensitiveBugReport`.
> 
> So I think this is fine as it is.
Interestingness isn't a thing on its own; its meaning is entirely tied to the nature of the specific bug report. An interesting pointer in the null dereference checker is absolutely not the same thing as an interesting mutex pointer in `PthreadLockChecker` or a (de)allocated pointer in `MallocChecker`.

I currently treat interestingness as a "GDM for visitors". It's their own way of communicating with themselves and with each other, a state they keep track of and update as they visit the bug report (initially populated during construction of the bug report). But the meaning of this state is entirely specific to the visitors. It is visitors who give interestingness a meaning (and the visitors are, naturally, also hand-picked during construction of the bug report).

So I think the right question to ask is "what do you want interestingness to mean in your checker?" and build your visitors accordingly.

Your visitors should provide enough information for the user to be able to understand the bug report. When the report says "$x + $y is in range [42, 52] and no values in that range are a valid input to that function", the user asks "why do you think $x + $y is in range [42, 52]?" and we'll have to answer that.

For example, in
```lang=c++
if (x + y >= 42 && x + y <= 52)
  foo(x + y);
```
there's no need to track ranges for $x and $y separately; it is sufficient to point the user to the constraint over $x + $y obtained from the if-statement. On the other hand, in
```lang=c++
if (x >= 44 && x <= 50)
  if (y >= -2 && y <= 2)
    foo(x + y);
```
you'll have to explain both $x and $y in order for the user to understand that $x + $y is indeed in range [42, 52].

There are also other funny edge cases depending on the nature of the arithmetic, such as
```
int z = x * y;
if (x == 0)
  return 1 / z;
```
where in order to explain the division by the zero value $x * $y it is sufficient to explain $x which makes $y redundant. And if they both are zero then we should probably flip a coin?

So I think this is a non-trivial problem. Even on the examples above (let alone other bug types!) it's easy to see that interestingness of $x and $y doesn't always follow from the interestingness of $x + $y (but sometimes it indeed does). I think the answer lies somewhere in the underneathies of the constraint solver: we have to follow its logic in order to find out how the range was inferred (which it probably should annotate with more note tags so that we didn't have to reverse-engineer it in the visitor?) (@vsavchenko you may find this discussion particularly peculiar).

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D101526/new/

https://reviews.llvm.org/D101526