[cfe-users] Running 'scan-build' in SRD's test cases (NIST)

Tue Mar 11 07:33:15 PDT 2014

I think it is a good idea improve it by the GSoC 2014 Edward! This can
speed development and put in conform static analyzer with NIST nomenclature.

2014-03-11 8:47 GMT-03:00 Edoardo P. <ed0.88.prez at gmail.com>:

> Hi Ted,
>
> Thank you for your answer.
>
> If I got it right, the parameter variables of a certain function,
> cannot be constrained until inside the TU (I hope we can manage to
> analyze together two or more TU) there's a call to such function
> somewhere else and the arguments are constrained.
> If we get a function call and we don't have the declaration of such
> function or it's inside another TU, we can't tell anything regarding
> the parameters or the global variables (is it possible for a function
> declared in another TU access to a static variable from another TU?),
> which might change, but for the stl functions, we know how they work,
> hence we can choose between:
> 1)"inject" inside the TU their "generic" declarations (I wonder which
> functions are OS-dependent...) and make it parse,
> 2)hardcode their behavior inside clang ...
>
> Obviously, we should add something to the model.
>
> Anyways, since we got in the Juliet Test Suite argument, is it
> possible to open a new bug report with a status of the static analyzer
> with it? It would keep track and could be improved by the GSoC 2014
> student(s) who wants to work on it.
> Also, I suggest to include to the todo things for the GSoC the idea of
> conforming the new Unique Defect ID to the CWE ID. This would be
> helpful when you want to test clang analyzer with the Static Analysis
> Tool Exposition (SATE): http://samate.nist.gov/SATE.html
>
>
> 2014-03-10 2:32 GMT+01:00 Ted Kremenek <kremenek at apple.com>:
> > Hi Edoardo,
> >
> > What you are observing is something that is potentially easy to "fix",
> but
> > essentially requires additional domain-specific modeling in the static
> > analyzer.
> >
> > To understand what is going on, consider the following example:
> >
> >   void foo(int x) {
> >     1 / x;
> >   }
> >
> > In this example we have a potential division-by-zero.  If 'foo' is
> analyzed
> > in isolation, we don't know anything about the value of 'x'.  We can do
> one
> > of two things:
> >
> > (1) Warn that 'x' could potentially be zero.
> >
> > (2) Do not emit a warning, but register there was a division and that 'x'
> > cannot be zero after the division, and keep analyzing.
> >
> > In both cases the value of 'x' is unconstrained, so we can look at #1
> and #2
> > as two potential default behaviors.  In the analyzer we do #2.
>  Essentially,
> > although the value of 'x' is unconstrained, we don't warn here.  The
> > rationale is that the analyzer would emit a huge number of false
> positives
> > for integers that are possibly perfectly safe.  We take the same tactic
> for
> > pointers, who all could hypothetically could be null.
> >
> > Instead, we take the strategy in the analyzer to flag bugs like null
> > pointers and division-by-zero if we have evidence belief there may be a
> > problem.  For example, if a pointer is checked against null, we have
> > evidence to believe that the pointer could be null.  Similarly, there
> may be
> > some APIs that we know could return a null pointer, and thus we can
> possibly
> > track that information and see if a pointer coming from one of these
> sources
> > is dereferenced.
> >
> > The same tactic can be taken with integers and divide-by-zero.  Let's
> look
> > at one of your examples:
> >
> > void f_fscanf()
> > {
> >     int data;
> >     fscanf(stdin, "%d", &data);
> >     /* POTENTIAL FLAW: Possibly divide by zero */
> >     printIntLine(100 / data);
> > }
> >
> > After the call to 'fscanf' the analyzer tracks that 'data' binds to some
> > symbolic integer value.  That value is considered to be underconstrained,
> > just like in my example above.  In this example however, we could track
> some
> > additional information that it came from an unsafe API like fscanf.  This
> > information is something *additional* that we would need to model, as we
> > probably would not want to do this for any arbitrary API call.  Modeling
> > this might not be too hard.  For example, the symbolic integer value does
> > have some information associated with it to tell us where it originated;
> we
> > could possibly use this to consult an API list to determine if the
> integer
> > value should be treated as unsafe.  We'd then modify the divide-by-zero
> > logic in the analyzer to emit a warning if an under constrained integer
> came
> > from one of these sources.  We'd probably could conjure up something
> fairly
> > ad hoc at first for a few specific cases, and then generalize it.  A
> while
> > ago we experimented with generalized taint analysis in the analyzer,
> which
> > is something that would also possibly be useful here.
> >
> > Does that answer your question?  Essentially we need a little additional
> > modeling in the analyzer for these APIs.
> >
> > Ted
> >
> >
> > On Mar 9, 2014, at 9:17 AM, Edoardo P. <ed0.88.prez at gmail.com> wrote:
> >
> > Ping?
> >
> >
> > 2014-02-20 22:25 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
> >>
> >> I forgot to add more info regarding the not detected int divisions:
> >>
> >> the only correctly warned parts are for very simple cases, like this:
> >>
> >> int x = 0; ... int y = 100 / x;
> >>
> >> and the modulo variant. The other ones, which are not detected, assign x
> >> with:
> >>
> >> - atoi (used when converting a string returned by socket functions or
> >> fgets)
> >> - fscanf
> >> - rand
> >>
> >> And all the variants of such functions. The examples for fscanf and rand
> >> are simple:
> >>
> >> fscanf variant:
> >>
> >> void f_fscanf()
> >> {
> >>     int data;
> >>     fscanf(stdin, "%d", &data);
> >>     /* POTENTIAL FLAW: Possibly divide by zero */
> >>     printIntLine(100 / data);
> >> }
> >>
> >> rand variant:
> >>
> >> void f_rand()
> >> {
> >>     int data;
> >>     data = rand();
> >>     /* POTENTIAL FLAW: Possibly divide by zero */
> >>     printIntLine(100 / data);
> >> }
> >>
> >> in the test suite, rand is replaced with a macro, RAND_32(), defined as:
> >>
> >> from testcasesupport/std_testcase.h:
> >> /* rand only returns 15 bits, so we xor 3 calls together to get the full
> >> result (13 bits overflow, but that is okay) */
> >> #define RAND32() ((rand()<<30) ^ (rand()<<15) ^ rand())
> >>
> >> (I'd truncate the first rand() to the first 2 bits before shifting it to
> >> 30th bit position, just to avoid overflow ...)
> >>
> >> The 'itoa' case is more complex. Imho, the analyzer, before reporting
> the
> >> warning, should check that there can be a possibility that the string
> >> argument will be converted to 0. I hope it's possible with the current
> >> analyzer architecture...
> >>
> >> Cheers,
> >> Edward-san
> >>
> >> 2014-02-20 15:26 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
> >>
> >>> Hi Lucas:
> >>>
> >>>>
> >>>> I am doing today a script to run all test case in Juliet with Clang
> and
> >>>> generate a report (CSV file), when a finished this i will send you the
> >>>> results.
> >>>
> >>>
> >>> Please do.
> >>>
> >>>>
> >>>> But a run manually and clang can find only 54 weaknesses in a
> >>>> total of 1476 files (into testcases/CWE369_Divide_by_zero, including
> s01
> >>>> and s02 directories).
> >>>>
> >>>
> >>> I can confirm that. In IRC I discussed with some llvm devs about the
> need
> >>> to report the float division by zero (Standard C/C++ says it's
> undefined
> >>> behavior, but Annex F implicitly allows it, thanks to IEEE 754). I
> have an
> >>> idea: add a new static analyzer option which allows or disallows
> optional
> >>> features, including the optional Annexes. That would help a lot imho.
> >>>
> >>> Cheers,
> >>> Edward-san
> >>>
> >>>>
> >>>> On Tue, 2014-02-18 at 16:19 +0100, Edoardo P. wrote:
> >>>> > Hi, Lucas, Jordan:
> >>>> >
> >>>> >
> >>>> > About the division by zero checking, I run this:
> >>>> >
> >>>> >
> >>>> > scan-build --use-analyzer /usr/bin/clang -o buildres/ clang -c -I
> >>>> > testcasesupport -DOMITGOOD
> >>>> >
> >>>> >
> testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c
> >>>> > -o /dev/null
> >>>> >
> >>>> >
> >>>> > and I get the warnng:
> >>>> >
> >>>> >
> >>>> >
> testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c:30:22:
> >>>> > warning: Division by zero
> >>>> >     printIntLine(100 / data);
> >>>> >                  ~~~~^~~~~~
> >>>> >
> >>>> >
> >>>> > So, Lucas, which file was failing for you?
> >>>> >
> >>>> >
> >>>> > Regarding the experience, here it is what I gathered till now:
> >>>> >
> >>>> > I created a very huge file_list.txt, containing the source files to
> >>>> > compile (I used 'find -name *.c*' in the juliet directory), then
> >>>> > filtered away the 'main\.c', 'main_linux\.c' and 'testcasesupport'
> >>>> > files (grep -v), which have nothing to check, then I sorted the list
> >>>> > by CWE number (I had to do manual sorting because I couldn't manage
> to
> >>>> > sort, for example, CWE15_* and CWE114_* correctly).
> >>>> >
> >>>> > Since I can't check for win32-only tests (I'm using linux), I
> filtered
> >>>> > them via 'grep -v w32' and 'grep -v wchar_t' (some tests require a
> >>>> > 'fopen'-like function with wchar_t string, which seems to be
> exclusive
> >>>> > to win32).
> >>>> >
> >>>> >
> >>>> > Regarding the per-translation-unit analysis, some files are, indeed,
> >>>> > separated sources for a program, so I didn't hesitate to filter them
> >>>> > with these patterns, according to the manual: "[abcdeBG]\.c" and
> >>>> > "good1" (last was associated with a 'bad' file, which was already
> >>>> > filtered).
> >>>> >
> >>>> >
> >>>> > With this file_lists.txt, I run the static analyzer only for the
> false
> >>>> > positives, with this command:
> >>>> >
> >>>> > < file_list.txt xargs -n 1 scan-build --use-analyzer /usr/bin/clang
> >>>> > -disable-checker deadcode.DeadStores -o buildres/ clang -c -I
> >>>> > testcasesupport -DOMITBAD -o /dev/null > /dev/null 2> warns.txt
> >>>> >
> >>>> >
> >>>> > so, it checks all the files in the file list, saves the results in
> >>>> > buildres and reports the warnings in warns.txt file, ignoring the
> >>>> > DeadStores warns because they're reported a lot often.
> >>>> >
> >>>> >
> >>>> > Well, there are tons of false positives, caused by the flow variants
> >>>> > which involve global and static variables, shadow variables usage,
> >>>> > etc.
> >>>> >
> >>>> >
> >>>> > To the devs, I'd like to know which CWE are you interested, from the
> >>>> > list I attached on that email:
> >>>> >
> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-February/035113.html .
> >>>> >
> >>>> >
> >>>> > About the results, if I have more time, I'll post some of them.
> >>>> >
> >>>> >
> >>>> >
> >>>> > 2014-02-18 2:05 GMT+01:00 Lucas Kanashiro
> >>>> > <kanashiro.duarte at gmail.com>:
> >>>> >         Thanks Jordan!
> >>>> >
> >>>> >         Could you leave me updated on the matter? I am so interested
> >>>> >         in this,
> >>>> >         and if it is necessary and possible i want to help to solve
> >>>> >         the
> >>>> >         potential issue.
> >>>> >
> >>>> >         Edward, can you tell us your experience with Clang and
> Juliet
> >>>> >         Test
> >>>> >         Suite?
> >>>> >
> >>>> >
> >>>> >         On Mon, 2014-02-17 at 09:43 -0800, Jordan Rose wrote:
> >>>> >         > Hi, Lucas. The analyzer currently runs a
> >>>> >         per-translation-unit analysis, so it misses some bugs that
> >>>> >         whole-program analysis may be able to catch. I'm guessing
> >>>> >         that's the reason it's unable to catch this particular
> issue.
> >>>> >         >
> >>>> >         > In general, the analyzer is set for reasonably fast
> >>>> >         turnaround (depending on the size of the project, of
> course),
> >>>> >         so it also might not do a fully precise interprocedural
> >>>> >         analysis if the state space gets too big. I'd have to see
> the
> >>>> >         particular test case to tell what's going on here.
> >>>> >         >
> >>>> >         > I did see that Edward (CC'd) wanted to try bringing in the
> >>>> >         Juliet Test Suite for the analyzer, but neither I nor Ted
> (the
> >>>> >         lead on the analyzer) have gotten the chance to sit down and
> >>>> >         look at what this would actually entail. It's possible he's
> >>>> >         encountered similar issues, however.
> >>>> >         >
> >>>> >         > Jordan
> >>>> >         >
> >>>> >         >
> >>>> >         > On Feb 15, 2014, at 5:58 , Lucas Kanashiro
> >>>> >         <kanashiro.duarte at gmail.com> wrote:
> >>>> >         >
> >>>> >         > > I am trying to running 'scan-build' in Juliet suite
> >>>> >         testcase v1.2 (NIST
> >>>> >         > > indication) to catch some bugs of 'Division by zero'
> (CWE
> >>>> >         369) and I
> >>>> >         > > can't do it, the scan-build can't show me the existing
> >>>> >         bugs. Did someone
> >>>> >         > > try to do it yet?
> >>>> >         > >
> >>>> >         > > I have a doubt that scan-build can identify a bug of
> >>>> >         division by zero in
> >>>> >         > > this case (when parameter denominator is zero):
> >>>> >         > >
> >>>> >         > > int divide (int denominator) {
> >>>> >         > >     return 10/denominator;
> >>>> >         > > }
> >>>> >         > >
> >>>> >         > > Can someone help me? Is this a deficiency of scan-build?
> >>>> >         Can scan-build
> >>>> >         > > identify the bugs in Juliet suite?
> >>>> >         > >
> >>>> >         > > Thanks in advance!
> >>>> >         > >
> >>>> >         > > --
> >>>> >         > > Lucas Kanashiro Duarte
> >>>> >         > > Engenharia de Software - FGA/UnB
> >>>> >         > > kanashiro.duarte at gmail.com
> >>>> >         > >
> >>>> >         > > _______________________________________________
> >>>> >         > > cfe-users mailing list
> >>>> >         > > cfe-users at cs.uiuc.edu
> >>>> >         > > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
> >>>> >         >
> >>>> >
> >>>> >         --
> >>>> >         Lucas Kanashiro Duarte
> >>>> >         Engenharia de Software - FGA/UnB
> >>>> >         kanashiro.duarte at gmail.com
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Mathematics is the language with which God has written the universe.
> >>>> > (Galilei)
> >>>>
> >>>> --
> >>>> Lucas Kanashiro Duarte
> >>>> Engenharia de Software - FGA/UnB
> >>>> kanashiro.duarte at gmail.com
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Mathematics is the language with which God has written the universe.
> >>> (Galilei)
> >>
> >>
> >>
> >>
> >> --
> >> Mathematics is the language with which God has written the universe.
> >> (Galilei)
> >
> >
> >
> >
> > --
> > Mathematics is the language with which God has written the universe.
> > (Galilei)
> >
> >
>
>
>
> --
> Mathematics is the language with which God has written the universe.
> (Galilei)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-users/attachments/20140311/1b1e8696/attachment.html>