[cfe-users] Running 'scan-build' in SRD's test cases (NIST)

Thu Mar 13 22:46:45 PDT 2014

A think a more generic way to look at it is that with the absence of implementation knowledge of a called function, the analyzer comes up with an approximation of the semantics of the call.  Values whose addresses are passed to a function are assumed to be “invalidated” and replaced with a possibly new symbolic value.

Interprocedural and global analysis provide more automatic modeling of information for the analyzer to exploit when understanding the effects of a call, but it isn’t a complete panacea.  In the case of fundamental APIs like fscanf (whose implementation is not likely to be available, even if we had whole program analysis), we just need some baked in knowledge that the analyzer can exploit for deeper checking.

On Mar 11, 2014, at 4:47 AM, Edoardo P. <ed0.88.prez at gmail.com> wrote:

> Hi Ted,
> 
> Thank you for your answer.
> 
> If I got it right, the parameter variables of a certain function,
> cannot be constrained until inside the TU (I hope we can manage to
> analyze together two or more TU) there's a call to such function
> somewhere else and the arguments are constrained.
> If we get a function call and we don't have the declaration of such
> function or it's inside another TU, we can't tell anything regarding
> the parameters or the global variables (is it possible for a function
> declared in another TU access to a static variable from another TU?),
> which might change, but for the stl functions, we know how they work,
> hence we can choose between:
> 1)"inject" inside the TU their "generic" declarations (I wonder which
> functions are OS-dependent...) and make it parse,
> 2)hardcode their behavior inside clang ...
> 
> Obviously, we should add something to the model.
> 
> Anyways, since we got in the Juliet Test Suite argument, is it
> possible to open a new bug report with a status of the static analyzer
> with it? It would keep track and could be improved by the GSoC 2014
> student(s) who wants to work on it.
> Also, I suggest to include to the todo things for the GSoC the idea of
> conforming the new Unique Defect ID to the CWE ID. This would be
> helpful when you want to test clang analyzer with the Static Analysis
> Tool Exposition (SATE): http://samate.nist.gov/SATE.html
> 
> 
> 2014-03-10 2:32 GMT+01:00 Ted Kremenek <kremenek at apple.com>:
>> Hi Edoardo,
>> 
>> What you are observing is something that is potentially easy to "fix", but
>> essentially requires additional domain-specific modeling in the static
>> analyzer.
>> 
>> To understand what is going on, consider the following example:
>> 
>>  void foo(int x) {
>>    1 / x;
>>  }
>> 
>> In this example we have a potential division-by-zero.  If 'foo' is analyzed
>> in isolation, we don't know anything about the value of 'x'.  We can do one
>> of two things:
>> 
>> (1) Warn that 'x' could potentially be zero.
>> 
>> (2) Do not emit a warning, but register there was a division and that 'x'
>> cannot be zero after the division, and keep analyzing.
>> 
>> In both cases the value of 'x' is unconstrained, so we can look at #1 and #2
>> as two potential default behaviors.  In the analyzer we do #2.  Essentially,
>> although the value of 'x' is unconstrained, we don't warn here.  The
>> rationale is that the analyzer would emit a huge number of false positives
>> for integers that are possibly perfectly safe.  We take the same tactic for
>> pointers, who all could hypothetically could be null.
>> 
>> Instead, we take the strategy in the analyzer to flag bugs like null
>> pointers and division-by-zero if we have evidence belief there may be a
>> problem.  For example, if a pointer is checked against null, we have
>> evidence to believe that the pointer could be null.  Similarly, there may be
>> some APIs that we know could return a null pointer, and thus we can possibly
>> track that information and see if a pointer coming from one of these sources
>> is dereferenced.
>> 
>> The same tactic can be taken with integers and divide-by-zero.  Let's look
>> at one of your examples:
>> 
>> void f_fscanf()
>> {
>>    int data;
>>    fscanf(stdin, "%d", &data);
>>    /* POTENTIAL FLAW: Possibly divide by zero */
>>    printIntLine(100 / data);
>> }
>> 
>> After the call to 'fscanf' the analyzer tracks that 'data' binds to some
>> symbolic integer value.  That value is considered to be underconstrained,
>> just like in my example above.  In this example however, we could track some
>> additional information that it came from an unsafe API like fscanf.  This
>> information is something *additional* that we would need to model, as we
>> probably would not want to do this for any arbitrary API call.  Modeling
>> this might not be too hard.  For example, the symbolic integer value does
>> have some information associated with it to tell us where it originated; we
>> could possibly use this to consult an API list to determine if the integer
>> value should be treated as unsafe.  We'd then modify the divide-by-zero
>> logic in the analyzer to emit a warning if an under constrained integer came
>> from one of these sources.  We'd probably could conjure up something fairly
>> ad hoc at first for a few specific cases, and then generalize it.  A while
>> ago we experimented with generalized taint analysis in the analyzer, which
>> is something that would also possibly be useful here.
>> 
>> Does that answer your question?  Essentially we need a little additional
>> modeling in the analyzer for these APIs.
>> 
>> Ted
>> 
>> 
>> On Mar 9, 2014, at 9:17 AM, Edoardo P. <ed0.88.prez at gmail.com> wrote:
>> 
>> Ping?
>> 
>> 
>> 2014-02-20 22:25 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
>>> 
>>> I forgot to add more info regarding the not detected int divisions:
>>> 
>>> the only correctly warned parts are for very simple cases, like this:
>>> 
>>> int x = 0; ... int y = 100 / x;
>>> 
>>> and the modulo variant. The other ones, which are not detected, assign x
>>> with:
>>> 
>>> - atoi (used when converting a string returned by socket functions or
>>> fgets)
>>> - fscanf
>>> - rand
>>> 
>>> And all the variants of such functions. The examples for fscanf and rand
>>> are simple:
>>> 
>>> fscanf variant:
>>> 
>>> void f_fscanf()
>>> {
>>>    int data;
>>>    fscanf(stdin, "%d", &data);
>>>    /* POTENTIAL FLAW: Possibly divide by zero */
>>>    printIntLine(100 / data);
>>> }
>>> 
>>> rand variant:
>>> 
>>> void f_rand()
>>> {
>>>    int data;
>>>    data = rand();
>>>    /* POTENTIAL FLAW: Possibly divide by zero */
>>>    printIntLine(100 / data);
>>> }
>>> 
>>> in the test suite, rand is replaced with a macro, RAND_32(), defined as:
>>> 
>>> from testcasesupport/std_testcase.h:
>>> /* rand only returns 15 bits, so we xor 3 calls together to get the full
>>> result (13 bits overflow, but that is okay) */
>>> #define RAND32() ((rand()<<30) ^ (rand()<<15) ^ rand())
>>> 
>>> (I'd truncate the first rand() to the first 2 bits before shifting it to
>>> 30th bit position, just to avoid overflow ...)
>>> 
>>> The 'itoa' case is more complex. Imho, the analyzer, before reporting the
>>> warning, should check that there can be a possibility that the string
>>> argument will be converted to 0. I hope it's possible with the current
>>> analyzer architecture...
>>> 
>>> Cheers,
>>> Edward-san
>>> 
>>> 2014-02-20 15:26 GMT+01:00 Edoardo P. <ed0.88.prez at gmail.com>:
>>> 
>>>> Hi Lucas:
>>>> 
>>>>> 
>>>>> I am doing today a script to run all test case in Juliet with Clang and
>>>>> generate a report (CSV file), when a finished this i will send you the
>>>>> results.
>>>> 
>>>> 
>>>> Please do.
>>>> 
>>>>> 
>>>>> But a run manually and clang can find only 54 weaknesses in a
>>>>> total of 1476 files (into testcases/CWE369_Divide_by_zero, including s01
>>>>> and s02 directories).
>>>>> 
>>>> 
>>>> I can confirm that. In IRC I discussed with some llvm devs about the need
>>>> to report the float division by zero (Standard C/C++ says it's undefined
>>>> behavior, but Annex F implicitly allows it, thanks to IEEE 754). I have an
>>>> idea: add a new static analyzer option which allows or disallows optional
>>>> features, including the optional Annexes. That would help a lot imho.
>>>> 
>>>> Cheers,
>>>> Edward-san
>>>> 
>>>>> 
>>>>> On Tue, 2014-02-18 at 16:19 +0100, Edoardo P. wrote:
>>>>>> Hi, Lucas, Jordan:
>>>>>> 
>>>>>> 
>>>>>> About the division by zero checking, I run this:
>>>>>> 
>>>>>> 
>>>>>> scan-build --use-analyzer /usr/bin/clang -o buildres/ clang -c -I
>>>>>> testcasesupport -DOMITGOOD
>>>>>> 
>>>>>> testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c
>>>>>> -o /dev/null
>>>>>> 
>>>>>> 
>>>>>> and I get the warnng:
>>>>>> 
>>>>>> 
>>>>>> testcases/CWE369_Divide_by_Zero/s02/CWE369_Divide_by_Zero__int_zero_divide_01.c:30:22:
>>>>>> warning: Division by zero
>>>>>>    printIntLine(100 / data);
>>>>>>                 ~~~~^~~~~~
>>>>>> 
>>>>>> 
>>>>>> So, Lucas, which file was failing for you?
>>>>>> 
>>>>>> 
>>>>>> Regarding the experience, here it is what I gathered till now:
>>>>>> 
>>>>>> I created a very huge file_list.txt, containing the source files to
>>>>>> compile (I used 'find -name *.c*' in the juliet directory), then
>>>>>> filtered away the 'main\.c', 'main_linux\.c' and 'testcasesupport'
>>>>>> files (grep -v), which have nothing to check, then I sorted the list
>>>>>> by CWE number (I had to do manual sorting because I couldn't manage to
>>>>>> sort, for example, CWE15_* and CWE114_* correctly).
>>>>>> 
>>>>>> Since I can't check for win32-only tests (I'm using linux), I filtered
>>>>>> them via 'grep -v w32' and 'grep -v wchar_t' (some tests require a
>>>>>> 'fopen'-like function with wchar_t string, which seems to be exclusive
>>>>>> to win32).
>>>>>> 
>>>>>> 
>>>>>> Regarding the per-translation-unit analysis, some files are, indeed,
>>>>>> separated sources for a program, so I didn't hesitate to filter them
>>>>>> with these patterns, according to the manual: "[abcdeBG]\.c" and
>>>>>> "good1" (last was associated with a 'bad' file, which was already
>>>>>> filtered).
>>>>>> 
>>>>>> 
>>>>>> With this file_lists.txt, I run the static analyzer only for the false
>>>>>> positives, with this command:
>>>>>> 
>>>>>> < file_list.txt xargs -n 1 scan-build --use-analyzer /usr/bin/clang
>>>>>> -disable-checker deadcode.DeadStores -o buildres/ clang -c -I
>>>>>> testcasesupport -DOMITBAD -o /dev/null > /dev/null 2> warns.txt
>>>>>> 
>>>>>> 
>>>>>> so, it checks all the files in the file list, saves the results in
>>>>>> buildres and reports the warnings in warns.txt file, ignoring the
>>>>>> DeadStores warns because they're reported a lot often.
>>>>>> 
>>>>>> 
>>>>>> Well, there are tons of false positives, caused by the flow variants
>>>>>> which involve global and static variables, shadow variables usage,
>>>>>> etc.
>>>>>> 
>>>>>> 
>>>>>> To the devs, I'd like to know which CWE are you interested, from the
>>>>>> list I attached on that email:
>>>>>> http://lists.cs.uiuc.edu/pipermail/cfe-dev/2014-February/035113.html .
>>>>>> 
>>>>>> 
>>>>>> About the results, if I have more time, I'll post some of them.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 2014-02-18 2:05 GMT+01:00 Lucas Kanashiro
>>>>>> <kanashiro.duarte at gmail.com>:
>>>>>>        Thanks Jordan!
>>>>>> 
>>>>>>        Could you leave me updated on the matter? I am so interested
>>>>>>        in this,
>>>>>>        and if it is necessary and possible i want to help to solve
>>>>>>        the
>>>>>>        potential issue.
>>>>>> 
>>>>>>        Edward, can you tell us your experience with Clang and Juliet
>>>>>>        Test
>>>>>>        Suite?
>>>>>> 
>>>>>> 
>>>>>>        On Mon, 2014-02-17 at 09:43 -0800, Jordan Rose wrote:
>>>>>>> Hi, Lucas. The analyzer currently runs a
>>>>>>        per-translation-unit analysis, so it misses some bugs that
>>>>>>        whole-program analysis may be able to catch. I'm guessing
>>>>>>        that's the reason it's unable to catch this particular issue.
>>>>>>> 
>>>>>>> In general, the analyzer is set for reasonably fast
>>>>>>        turnaround (depending on the size of the project, of course),
>>>>>>        so it also might not do a fully precise interprocedural
>>>>>>        analysis if the state space gets too big. I'd have to see the
>>>>>>        particular test case to tell what's going on here.
>>>>>>> 
>>>>>>> I did see that Edward (CC'd) wanted to try bringing in the
>>>>>>        Juliet Test Suite for the analyzer, but neither I nor Ted (the
>>>>>>        lead on the analyzer) have gotten the chance to sit down and
>>>>>>        look at what this would actually entail. It's possible he's
>>>>>>        encountered similar issues, however.
>>>>>>> 
>>>>>>> Jordan
>>>>>>> 
>>>>>>> 
>>>>>>> On Feb 15, 2014, at 5:58 , Lucas Kanashiro
>>>>>>        <kanashiro.duarte at gmail.com> wrote:
>>>>>>> 
>>>>>>>> I am trying to running 'scan-build' in Juliet suite
>>>>>>        testcase v1.2 (NIST
>>>>>>>> indication) to catch some bugs of 'Division by zero' (CWE
>>>>>>        369) and I
>>>>>>>> can't do it, the scan-build can't show me the existing
>>>>>>        bugs. Did someone
>>>>>>>> try to do it yet?
>>>>>>>> 
>>>>>>>> I have a doubt that scan-build can identify a bug of
>>>>>>        division by zero in
>>>>>>>> this case (when parameter denominator is zero):
>>>>>>>> 
>>>>>>>> int divide (int denominator) {
>>>>>>>>    return 10/denominator;
>>>>>>>> }
>>>>>>>> 
>>>>>>>> Can someone help me? Is this a deficiency of scan-build?
>>>>>>        Can scan-build
>>>>>>>> identify the bugs in Juliet suite?
>>>>>>>> 
>>>>>>>> Thanks in advance!
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Lucas Kanashiro Duarte
>>>>>>>> Engenharia de Software - FGA/UnB
>>>>>>>> kanashiro.duarte at gmail.com
>>>>>>>> 
>>>>>>>> _______________________________________________
>>>>>>>> cfe-users mailing list
>>>>>>>> cfe-users at cs.uiuc.edu
>>>>>>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-users
>>>>>>> 
>>>>>> 
>>>>>>        --
>>>>>>        Lucas Kanashiro Duarte
>>>>>>        Engenharia de Software - FGA/UnB
>>>>>>        kanashiro.duarte at gmail.com
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> --
>>>>>> Mathematics is the language with which God has written the universe.
>>>>>> (Galilei)
>>>>> 
>>>>> --
>>>>> Lucas Kanashiro Duarte
>>>>> Engenharia de Software - FGA/UnB
>>>>> kanashiro.duarte at gmail.com
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Mathematics is the language with which God has written the universe.
>>>> (Galilei)
>>> 
>>> 
>>> 
>>> 
>>> --
>>> Mathematics is the language with which God has written the universe.
>>> (Galilei)
>> 
>> 
>> 
>> 
>> --
>> Mathematics is the language with which God has written the universe.
>> (Galilei)
>> 
>> 
> 
> 
> 
> -- 
> Mathematics is the language with which God has written the universe. (Galilei)