[cfe-commits] Patch to do static analysis with range sets

Mon Feb 9 18:18:47 PST 2009

On Feb 8, 2009, at 6:57 AM, Ben Laurie wrote:

>> Cute.  Does a Range represent "[a, b]" or "(a, b)".  Please  
>> document this
>> with comments.
>>
>> I also take it that ranges are only over concrete values?  e.g.,  
>> [0, 2].
>> What about ranges where:
>>
>> (1) one of the ends is "symbolic", e.g. [a, 2]
>> (2) one of the ends is Unknown, e.g. [4, unknown].
>>
>> You don't have to model (1) now, but I think you probably will need  
>> to model
>> (2).  Frequently we have a lower bound but no upper bound (although I
>> imagine you could use maxint, minint, etc.).
>
> I already do - for example, != k is [minint, k-1], [k+1, maxint].

The class 'Range' only tracks concrete values (it references two  
APSInt objects).  A symbolic range consists of some number whose value  
is not actually known.  For example:

char foo(char *buf, int i, int length) {
   if (i < length) {
      // the range of i is [min_int, Value(length)-1]
      return buf[i];
   }

   // the range of i is [Value(length), max_int]
   return buf[i];
}

Here "Value(length)" represents the value bound to 'length' at the  
start of the function.  When doing static analysis, we consider its  
value to be symbolic to represent many possible values.  As we analyze  
a function, we gradually build up constraints for symbolic values, and  
we use these constraints to determine the feasibility of different  
paths (and ultimately, bugs).

Your implementation only tracks ranges over concrete values, which I  
think is a good start.  I'm only bringing up the issue of symbolic  
values now because I think that is what will be needed to do the kind  
of deeper security checking (e.g., buffer overflows) that you were  
interested in doing.