[cfe-users] Running 'scan-build' in SRD's test cases (NIST)

Sun Mar 9 18:32:45 PDT 2014

Hi Edoardo,

What you are observing is something that is potentially easy to “fix”, but essentially requires additional domain-specific modeling in the static analyzer.

To understand what is going on, consider the following example:

  void foo(int x) {
    1 / x;
  }

In this example we have a potential division-by-zero.  If ‘foo’ is analyzed in isolation, we don’t know anything about the value of ‘x’.  We can do one of two things:

(1) Warn that ‘x’ could potentially be zero.

(2) Do not emit a warning, but register there was a division and that ‘x’ cannot be zero after the division, and keep analyzing.

In both cases the value of ‘x’ is unconstrained, so we can look at #1 and #2 as two potential default behaviors.  In the analyzer we do #2.  Essentially, although the value of ‘x’ is unconstrained, we don’t warn here.  The rationale is that the analyzer would emit a huge number of false positives for integers that are possibly perfectly safe.  We take the same tactic for pointers, who all could hypothetically could be null.

Instead, we take the strategy in the analyzer to flag bugs like null pointers and division-by-zero if we have evidence belief there may be a problem.  For example, if a pointer is checked against null, we have evidence to believe that the pointer could be null.  Similarly, there may be some APIs that we know could return a null pointer, and thus we can possibly track that information and see if a pointer coming from one of these sources is dereferenced.

The same tactic can be taken with integers and divide-by-zero.  Let’s look at one of your examples:

void f_fscanf()
{
    int data;
    fscanf(stdin, "%d", &data); 
    /* POTENTIAL FLAW: Possibly divide by zero */ 
    printIntLine(100 / data);
}

After the call to ‘fscanf’ the analyzer tracks that ‘data’ binds to some symbolic integer value.  That value is considered to be underconstrained, just like in my example above.  In this example however, we could track some additional information that it came from an unsafe API like fscanf.  This information is something *additional* that we would need to model, as we probably would not want to do this for any arbitrary API call.  Modeling this might not be too hard.  For example, the symbolic integer value does have some information associated with it to tell us where it originated; we could possibly use this to consult an API list to determine if the integer value should be treated as unsafe.  We’d then modify the divide-by-zero logic in the analyzer to emit a warning if an under constrained integer came from one of these sources.  We’d probably could conjure up something fairly ad hoc at first for a few specific cases, and then generalize it.  A while ago we experimented with generalized taint analysis in the analyzer, which is something that would also possibly be useful here.

Does that answer your question?  Essentially we need a little additional modeling in the analyzer for these APIs.

Ted

On Mar 9, 2014, at 9:17 AM, Edoardo P. <ed0.88.prez at gmail.com> wrote:

> Ping?
> 
> 
> 2014-02-20 22:25 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
> I forgot to add more info regarding the not detected int divisions:
> 
> the only correctly warned parts are for very simple cases, like this:
> 
> int x = 0; ... int y = 100 / x;
> 
> and the modulo variant. The other ones, which are not detected, assign x with:
> 
> - atoi (used when converting a string returned by socket functions or fgets)
> - fscanf
> - rand
> 
> And all the variants of such functions. The examples for fscanf and rand are simple:
> 
> fscanf variant:
> 
> void f_fscanf()
> {
>     int data;
>     fscanf(stdin, "%d", &data); 
>     /* POTENTIAL FLAW: Possibly divide by zero */ 
>     printIntLine(100 / data);
> }
> 
> rand variant:
> 
> void f_rand()
> {
>     int data; 
>     data = rand(); 
>     /* POTENTIAL FLAW: Possibly divide by zero */ 
>     printIntLine(100 / data);
> }
> 
> in the test suite, rand is replaced with a macro, RAND_32(), defined as:
> 
> from testcasesupport/std_testcase.h:
> /* rand only returns 15 bits, so we xor 3 calls together to get the full result (13 bits overflow, but that is okay) */
> #define RAND32() ((rand()<<30) ^ (rand()<<15) ^ rand())
> 
> (I'd truncate the first rand() to the first 2 bits before shifting it to 30th bit position, just to avoid overflow ...)
> 
> The 'itoa' case is more complex. Imho, the analyzer, before reporting the warning, should check that there can be a possibility that the string argument will be converted to 0. I hope it's possible with the current analyzer architecture...
> 
> Cheers,
> Edward-san
> 
> 2014-02-20 15:26 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
> 
> Hi Lucas:
>  
> I am doing today a script to run all test case in Juliet with Clang and
> generate a report (CSV file), when a finished this i will send you the
> results.
> 
> Please do.
>  
> But a run manually and clang can find only 54 weaknesses in a
> total of 1476 files (into testcases/CWE369_Divide_by_zero, including s01
> and s02 directories).
> 
> 
> I can confirm that. In IRC I discussed with some llvm devs about the need to report the float division by zero (Standard C/C++ says it's undefined behavior, but Annex F implicitly allows it, thanks to IEEE 754). I have an idea: add a new static analyzer option which allows or disallows optional features, including the optional Annexes. That would help a lot imho.
> 
> Cheers,
> Edward-san
>  
> On Tue, 2014-02-18 at 16:19 +0100, Edoardo P. wrote:
> > Hi, Lucas, Jordan:
> >
> >
> > About the division by zero checking, I run this:
> >
> >
> > scan-build --use-analyzer /usr/bin/clang -o buildres/ clang -c -I
> > testcasesupport -DOMITGOOD
> > testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c -o /dev/null
> >
> >
> > and I get the warnng:
> >
> > testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c:30:22: warning: Division by zero
> >     printIntLine(100 / data);
> >                  ~~~~^~~~~~
> >
> >
> > So, Lucas, which file was failing for you?
> >
> >
> > Regarding the experience, here it is what I gathered till now:
> >
> > I created a very huge file_list.txt, containing the source files to
> > compile (I used 'find -name *.c*' in the juliet directory), then
> > filtered away the 'main\.c', 'main_linux\.c' and 'testcasesupport'
> > files (grep -v), which have nothing to check, then I sorted the list
> > by CWE number (I had to do manual sorting because I couldn't manage to
> > sort, for example, CWE15_* and CWE114_* correctly).
> >
> > Since I can't check for win32-only tests (I'm using linux), I filtered
> > them via 'grep -v w32' and 'grep -v wchar_t' (some tests require a
> > 'fopen'-like function with wchar_t string, which seems to be exclusive
> > to win32).
> >
> >
> > Regarding the per-translation-unit analysis, some files are, indeed,
> > separated sources for a program, so I didn't hesitate to filter them
> > with these patterns, according to the manual: "[abcdeBG]\.c" and
> > "good1" (last was associated with a 'bad' file, which was already
> > filtered).
> >
> >
> > With this file_lists.txt, I run the static analyzer only for the false
> > positives, with this command:
> >
> > < file_list.txt xargs -n 1 scan-build --use-analyzer /usr/bin/clang
> > -disable-checker deadcode.DeadStores -o buildres/ clang -c -I
> > testcasesupport -DOMITBAD -o /dev/null > /dev/null 2> warns.txt
> >
> >
> > so, it checks all the files in the file list, saves the results in
> > buildres and reports the warnings in warns.txt file, ignoring the
> > DeadStores warns because they're reported a lot often.
> >
> >
> > Well, there are tons of false positives, caused by the flow variants
> > which involve global and static variables, shadow variables usage,
> > etc.
> >
> >
> > To the devs, I'd like to know which CWE are you interested, from the
> > list I attached on that email:
> > http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-February/035113.html .
> >
> >
> > About the results, if I have more time, I'll post some of them.
> >
> >
> >
> > 2014-02-18 2:05 GMT+01:00 Lucas Kanashiro
> > <kanashiro.duarte at gmail.com>:
> >         Thanks Jordan!
> >
> >         Could you leave me updated on the matter? I am so interested
> >         in this,
> >         and if it is necessary and possible i want to help to solve
> >         the
> >         potential issue.
> >
> >         Edward, can you tell us your experience with Clang and Juliet
> >         Test
> >         Suite?
> >
> >
> >         On Mon, 2014-02-17 at 09:43 -0800, Jordan Rose wrote:
> >         > Hi, Lucas. The analyzer currently runs a
> >         per-translation-unit analysis, so it misses some bugs that
> >         whole-program analysis may be able to catch. I'm guessing
> >         that's the reason it's unable to catch this particular issue.
> >         >
> >         > In general, the analyzer is set for reasonably fast
> >         turnaround (depending on the size of the project, of course),
> >         so it also might not do a fully precise interprocedural
> >         analysis if the state space gets too big. I'd have to see the
> >         particular test case to tell what's going on here.
> >         >
> >         > I did see that Edward (CC'd) wanted to try bringing in the
> >         Juliet Test Suite for the analyzer, but neither I nor Ted (the
> >         lead on the analyzer) have gotten the chance to sit down and
> >         look at what this would actually entail. It's possible he's
> >         encountered similar issues, however.
> >         >
> >         > Jordan
> >         >
> >         >
> >         > On Feb 15, 2014, at 5:58 , Lucas Kanashiro
> >         <kanashiro.duarte at gmail.com> wrote:
> >         >
> >         > > I am trying to running 'scan-build' in Juliet suite
> >         testcase v1.2 (NIST
> >         > > indication) to catch some bugs of 'Division by zero' (CWE
> >         369) and I
> >         > > can't do it, the scan-build can't show me the existing
> >         bugs. Did someone
> >         > > try to do it yet?
> >         > >
> >         > > I have a doubt that scan-build can identify a bug of
> >         division by zero in
> >         > > this case (when parameter denominator is zero):
> >         > >
> >         > > int divide (int denominator) {
> >         > >     return 10/denominator;
> >         > > }
> >         > >
> >         > > Can someone help me? Is this a deficiency of scan-build?
> >         Can scan-build
> >         > > identify the bugs in Juliet suite?
> >         > >
> >         > > Thanks in advance!
> >         > >
> >         > > --
> >         > > Lucas Kanashiro Duarte
> >         > > Engenharia de Software - FGA/UnB
> >         > > kanashiro.duarte at gmail.com
> >         > >
> >         > > _______________________________________________
> >         > > cfe-users mailing list
> >         > > cfe-users at cs.uiuc.edu
> >         > > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
> >         >
> >
> >         --
> >         Lucas Kanashiro Duarte
> >         Engenharia de Software - FGA/UnB
> >         kanashiro.duarte at gmail.com
> >
> >
> >
> >
> >
> > --
> > Mathematics is the language with which God has written the universe.
> > (Galilei)
> 
> --
> Lucas Kanashiro Duarte
> Engenharia de Software - FGA/UnB
> kanashiro.duarte at gmail.com
> 
> 
> 
> 
> -- 
> Mathematics is the language with which God has written the universe. (Galilei)
> 
> 
> 
> -- 
> Mathematics is the language with which God has written the universe. (Galilei)
> 
> 
> 
> -- 
> Mathematics is the language with which God has written the universe. (Galilei)

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-users/attachments/20140309/69ed728a/attachment.html>