[PATCH] D13126: New static analyzer checker for loss of sign/precision

Artem Dergachev via cfe-commits cfe-commits at lists.llvm.org
Wed Mar 23 04:29:58 PDT 2016


NoQ added inline comments.

================
Comment at: lib/StaticAnalyzer/Checkers/ConversionChecker.cpp:84
@@ +83,3 @@
+// Can E value be greater or equal than Val?
+static bool canBeGreaterEqual(CheckerContext &C, const Expr *E,
+                              unsigned long long Val) {
----------------
danielmarjamaki wrote:
> danielmarjamaki wrote:
> > zaks.anna wrote:
> > > danielmarjamaki wrote:
> > > > zaks.anna wrote:
> > > > > This function returns true if the value "is" greater or equal, not "can be" greater or equal. The latter would be "return StGE".
> > > > > 
> > > > > Also, it's slightly better to return the StGE state and use it to report the bug. This way, our assumption is explicitly recorded in the error state.
> > > > NoQ made the same comment. I disagree.
> > > > 
> > > >     int A = 0;
> > > >     if (X) {
> > > >          A = 1000;
> > > >     }
> > > >     U8 = A;  // <- Imho; A _can_ be 1000
> > > > 
> > > > Imho it's better to say that A _can_ be 1000 unless A is 1000 for all possible execution paths through the code.
> > > > 
> > > > Do you still think "is" is better than "can be"?
> > > The Clang Static Analyzer performs path sensitive analysis of the program. (It does not merge the paths at the "U8 = A" statement!!!) You will only be changing the state along a single execution path of this program. Along that path, A will always be 1000. 
> > > 
> > > When analyzing your example, the analyzer is going to separately analyze 2 paths:
> > > 1st path: A=0; X != 0; A =1000; U8 = A; // Here U8 is definitely 1000.
> > > 2d  path: A=0; X == 0; U8 = A;               // Here U8 is definitely 0.
> > > 
> > > This video contains an intuitive explanation of symbolic execution technique we use: http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4
> > I understand that and I still think that value of A "can be" 1000. Yes in that path the value "is" 1000.
> > 
> > But as far as I see, you and others disagree with me. And therefore I will change to "is".
> For your information in Cppcheck I say that a value is "possible" if some path(s) generates that value. And "always" when all paths generate that value.
> 
> Code example:
> 
>     int f(int x) {
>       int a = 1000;
>       int b = 0;
>       if (x == 500)
>         a = 3;
>       return a + b - x;
>     }
> 
> Debug output (cppcheck --debug-normal file.c):
> 
>     ##Value flow
>     Line 3
>       1000 always 1000
>     Line 4
>       0 always 0
>     Line 5
>       x possible 500
>       == possible 1
>       500 always 500
>     Line 6
>       3 always 3
>     Line 7
>       a possible {1000,3}
>       + possible {1000,3}
>       b always 0
>       x possible 500
> 
> For me personally it is confusing to say that A "is" 1000. That is different to how I normally think of it in Cppcheck.
Consider the following examples:
```
// Example 1.
int a = rand();
use(a);
```
```
// Example 2.
int b;
scanf("%d", &b);
use(b);
```
Here `a` and `b` "can be" 1000 when passed into `use()`, however your function would return `false`, because there is no particular execution path found on which they are "certainly" 1000.

On the other hand, in the following example:
```
// Example 3.
int a = rand();
if (a == 1000) {
}
use(a);
```
at the `use()` of `a` your function would return `false` on the path that passes through one branch and `true` on another path, however the `false` return value would not indicate that `a` "cannot be" 1000 on this line; it simply indicates that `a` is not certainly 1000 on one (but not all) of the paths. 

These examples show that the function returns "false" much more often than its name suggests, hence we propose a different terminology. The good name for the function would be "the value 'certainly is' [greater or equal] on this particular path, though probably not on every path".

Additionally, example 2 is of particular interest: it gives an example of a "tainted" symbol which is explicitly known to take all values that fit its integral type, depending on user (cf. "attacker") input. You might (some day, no rush, i guess) like to extend your checker to consider tainted values as truly "can-be-anything" and throw warnings even without finding a particular path. For such cases, your function would actually be something like "there exists a path on which it 'certainly can be', though maybe on other paths it's not as certain". There's already a basic support for such taint analysis in the analyzer.

In fact, example 1 is also a curious discussion point - even when there's no attacker to substitute your random number generator, symbols produced by `rand()` and such are also explicitly known to take all values between `0` and some kind of `RAND_MAX`, which is a sort of information that may be useful to the analyzer and can be described as a weaker kind of taint without much security implications, but still not something that can be represented as presence or lack of range constraints.


http://reviews.llvm.org/D13126





More information about the cfe-commits mailing list