[cfe-dev] Static Analyzer Rocks Hard

Wed Jun 25 09:49:06 PDT 2008

On Jun 24, 2008, at 12:04 AM, Holger Schurig wrote:

>> The more complete way to catch these bugs (and potentially
>> verify their absence) is to flag dangerous uses of untrusted
>> data: using it as a size parameter to malloc, using it as an
>> array index, and so on.
>
> It would be cool if, e.g. at an checker-level, a variable or
> memory object could have something like the perl "taint" bit.
>
> http://www.webreference.com/programming/perl/taint/
>
> In perl, you untaint via a regexp. In checker, you might untaint
> by checking a variable, e.g. for upper/lower bounds (signed) or
> upper bounds only (unsigned variable).
>
> If you then use the tainted variable to system function (how do
> we define this?), you could get a tainted warning from the
> checker.

This indeed would be a useful check, and it is something I would like  
to have implemented one day as part of the static analyzer.
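
To make the idea concrete, here is a rough sketch in C of the pattern such a check would reason about. Everything in it is made up for illustration: `read_untrusted_index`, `lookup_signed`, `lookup_unsigned`, and the table are hypothetical names, and nothing here is an existing analyzer feature. The point is only that the bounds test is what would "untaint" the value before it reaches the array index.

```c
#include <stddef.h>
#include <stdlib.h>

#define TABLE_SIZE 4

static const int table[TABLE_SIZE] = {10, 11, 12, 13};

/* Hypothetical taint source: the number comes from untrusted text
   (a packet, argv, a config file), so the result counts as tainted. */
long read_untrusted_index(const char *text)
{
    return strtol(text, NULL, 10);
}

/* Signed index: both a lower and an upper bound check are needed
   before the value may be used as an array index (the sink). */
int lookup_signed(long idx, int *out)
{
    if (idx < 0 || idx >= TABLE_SIZE)
        return -1;              /* still tainted: reject */
    *out = table[idx];          /* idx is known to be in range here */
    return 0;
}

/* Unsigned index: an upper bound check alone is sufficient. */
int lookup_unsigned(unsigned long idx, int *out)
{
    if (idx >= TABLE_SIZE)
        return -1;
    *out = table[idx];
    return 0;
}
```

A checker along these lines would presumably warn if `table[idx]` were reached on a path where neither comparison had been performed.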

There has been a variety of work on doing taint analysis on C  
programs, and there are different kinds of taint properties to check.   
The kind of checking you mentioned has been done before for C (in a
research tool) and was demonstrated to be very useful:

   Using Programmer-Written Compiler Extensions to Catch Security Holes
   http://www.stanford.edu/~engler/sp-ieee-02.pdf

Another kind of "taint property" is tracking the use of kernel/user  
pointers in kernel space; this is more of an address-space qualifier  
problem, but it can also be viewed as a form of taint propagation.

There have been a variety of proposals for how to define sources of
tainted data, and which sinks (functions) must not receive tainted data.
One standard approach is to use annotations on function prototypes,
which we could do in the form of attributes. This approach has
actually been used in the Linux kernel to annotate user vs. kernel
pointers.  Of course, simply having an external list of well-known
sources of tainted data that could be fed to the static analyzer would
also be useful.
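
As a sketch of what prototype annotations could look like, the following uses clang's generic `annotate` attribute purely as a carrier. The macro names `TAINT_SOURCE` and `TAINT_SINK`, the annotation strings, and all of the functions are assumptions invented for this example; no checker understands them today. As far as I know, the kernel's `__user` marker works in a similar spirit, expanding to an attribute only when its checker is running and to nothing otherwise.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical annotation macros. Under clang they attach a string to
   the declaration; with other compilers they expand to nothing. */
#if defined(__clang__)
#  define TAINT_SOURCE __attribute__((annotate("taint_source")))
#  define TAINT_SINK   __attribute__((annotate("taint_sink")))
#else
#  define TAINT_SOURCE
#  define TAINT_SINK
#endif

#define PAYLOAD_MAX 32

/* Source: the length is whatever the untrusted request claims. */
TAINT_SOURCE size_t claimed_length(const unsigned char *request);

/* Sink: len is used as a copy size, so it must not be tainted. */
TAINT_SINK void store_payload(unsigned char *dst,
                              const unsigned char *src, size_t len);

size_t claimed_length(const unsigned char *request)
{
    return request[0];          /* first byte is the claimed length */
}

void store_payload(unsigned char *dst, const unsigned char *src, size_t len)
{
    memcpy(dst, src, len);
}

/* The caller checks the tainted length before it reaches the sink.
   Returns the number of bytes stored, or 0 if the request is rejected. */
size_t handle_request(const unsigned char *request, size_t request_size,
                      unsigned char dst[PAYLOAD_MAX])
{
    if (request_size == 0)
        return 0;
    size_t len = claimed_length(request);
    if (len > PAYLOAD_MAX || len > request_size - 1)
        return 0;
    store_payload(dst, request + 1, len);
    return len;
}
```

The external-list alternative would carry the same information (which functions are sources, which are sinks) outside the source code, which helps for headers that cannot be modified.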

Eventually, once a framework for doing inter-procedural analysis is in  
place in clang, we could potentially relax taint attributes across  
procedure boundaries.  A good example of this is MECA (another  
research tool):

   MECA: an Extensible, Expressive System and Language for Statically
   Checking Security Properties
   http://www.stanford.edu/~engler/ccs03-meca.pdf
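
To illustrate why procedure boundaries matter, here is a small hypothetical example (all names invented, and not taken from MECA). Only `parse_untrusted` would be marked as a taint source; the wrapper `slot_from_request` carries no annotation, so an analysis that stops at call boundaries would not know its result is tainted unless every such wrapper were annotated by hand.

```c
#include <stdlib.h>

#define NSLOTS 8

int slots[NSLOTS] = {0, 0, 0, 42};

/* Hypothetical taint source: the only function that would be annotated. */
long parse_untrusted(const char *text)
{
    return strtol(text, NULL, 10);
}

/* Unannotated wrapper. Its return value is just as tainted as the value
   from parse_untrusted, but that is visible only by looking inside it.
   Expects requests of the form "s<number>", e.g. "s3". */
long slot_from_request(const char *request)
{
    if (request[0] != 's')
        return -1;
    return parse_untrusted(request + 1);
}

/* Sink: the array index. The bounds check is what makes this safe. */
int read_slot(const char *request, int *out)
{
    long i = slot_from_request(request);
    if (i < 0 || i >= NSLOTS)
        return -1;
    *out = slots[i];
    return 0;
}
```

With inter-procedural analysis, the taint on the result of `slot_from_request` could be derived from its body rather than declared.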

There are of course many examples of other systems that do taint  
propagation (with potentially more analysis sophistication), but these  
are a couple of good examples.