[PATCH] D13126: New static analyzer checker for loss of sign/precision
Artem Dergachev via cfe-commits
cfe-commits at lists.llvm.org
Wed Mar 23 04:29:58 PDT 2016
NoQ added inline comments.
================
Comment at: lib/StaticAnalyzer/Checkers/ConversionChecker.cpp:84
@@ +83,3 @@
+// Can E value be greater or equal than Val?
+static bool canBeGreaterEqual(CheckerContext &C, const Expr *E,
+ unsigned long long Val) {
----------------
danielmarjamaki wrote:
> danielmarjamaki wrote:
> > zaks.anna wrote:
> > > danielmarjamaki wrote:
> > > > zaks.anna wrote:
> > > > > This function returns true if the value "is" greater or equal, not "can be" greater or equal. The latter would be "return StGE".
> > > > >
> > > > > Also, it's slightly better to return the StGE state and use it to report the bug. This way, our assumption is explicitly recorded in the error state.
> > > > NoQ made the same comment. I disagree.
> > > >
> > > > int A = 0;
> > > > if (X) {
> > > > A = 1000;
> > > > }
> > > > U8 = A; // <- Imho; A _can_ be 1000
> > > >
> > > > Imho it's better to say that A _can_ be 1000 unless A is 1000 for all possible execution paths through the code.
> > > >
> > > > Do you still think "is" is better than "can be"?
> > > The Clang Static Analyzer performs path sensitive analysis of the program. (It does not merge the paths at the "U8 = A" statement!!!) You will only be changing the state along a single execution path of this program. Along that path, A will always be 1000.
> > >
> > > When analyzing your example, the analyzer is going to separately analyze 2 paths:
> > > 1st path: A=0; X != 0; A =1000; U8 = A; // Here U8 is definitely 1000.
> > > 2d path: A=0; X == 0; U8 = A; // Here U8 is definitely 0.
> > >
> > > This video contains an intuitive explanation of symbolic execution technique we use: http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4
> > I understand that and I still think that value of A "can be" 1000. Yes in that path the value "is" 1000.
> >
> > But as far as I see, you and others disagree with me. And therefore I will change to "is".
> For your information in Cppcheck I say that a value is "possible" if some path(s) generates that value. And "always" when all paths generate that value.
>
> Code example:
>
> int f(int x) {
> int a = 1000;
> int b = 0;
> if (x == 500)
> a = 3;
> return a + b - x;
> }
>
> Debug output (cppcheck --debug-normal file.c):
>
> ##Value flow
> Line 3
> 1000 always 1000
> Line 4
> 0 always 0
> Line 5
> x possible 500
> == possible 1
> 500 always 500
> Line 6
> 3 always 3
> Line 7
> a possible {1000,3}
> + possible {1000,3}
> b always 0
> x possible 500
>
> For me personally it is confusing to say that A "is" 1000. That is different to how I normally think of it in Cppcheck.
Consider the following examples:
```
// Example 1.
int a = rand();
use(a);
```
```
// Example 2.
int b;
scanf("%d", &b);
use(b);
```
Here `a` and `b` "can be" 1000 when passed into `use()`, however your function would return `false`, because there is no particular execution path found on which they are "certainly" 1000.
On the other hand, in the following example:
```
// Example 3.
int a = rand();
if (a == 1000) {
}
use(a);
```
at the `use()` of `a` your function would return `false` on the path that passes through one branch and `true` on another path, however the `false` return value would not indicate that `a` "cannot be" 1000 on this line; it simply indicates that `a` is not certainly 1000 on one (but not all) of the paths.
These examples show that the function returns "false" much more often than its name suggests, hence we propose a different terminology. The good name for the function would be "the value 'certainly is' [greater or equal] on this particular path, though probably not on every path".
Additionally, example 2 is of particular interest: it gives an example of a "tainted" symbol which is explicitly known to take all values that fit its integral type, depending on user (cf. "attacker") input. You might (some day, no rush, i guess) like to extend your checker to consider tainted values as truly "can-be-anything" and throw warnings even without finding a particular path. For such cases, your function would actually be something like "there exists a path on which it 'certainly can be', though maybe on other paths it's not as certain". There's already a basic support for such taint analysis in the analyzer.
In fact, example 1 is also a curious discussion point - even when there's no attacker to substitute your random number generator, symbols produced by `rand()` and such are also explicitly known to take all values between `0` and some kind of `RAND_MAX`, which is a sort of information that may be useful to the analyzer and can be described as a weaker kind of taint without much security implications, but still not something that can be represented as presence or lack of range constraints.
http://reviews.llvm.org/D13126
More information about the cfe-commits
mailing list