[cfe-commits] [PATCH] Set region size in GRRegionVals transfer function

Ted Kremenek kremenek at apple.com
Fri Nov 7 09:01:17 PST 2008


On Nov 7, 2008, at 5:34 AM, Zhongxing Xu wrote:

> Attached is a scratch implementation of array bound checking. It  
> seems that implementing array bound checking is pretty  
> straightforward. No need to make big changes to the framework.
> <oob.patch>

Wow!  Great work.  This is very simple, and a very good start.  I  
think we can apply it now and then iterate.

One thing that I think is worth mentioning is that not all MemRegions  
represent chunks of memory for which we want to enforce strict  
software segmentation.  Consider the following code:

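#include <stddef.h>  /* offsetof */

struct S {
   unsigned length;
   char buf[];
};

unsigned getLength(char* data) {
   /* 'data' points at s->buf; step back to the start of the struct. */
   struct S* s = (struct S*) (data - offsetof(struct S, buf));
   return s->length;
}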

I may not have written this correctly, but I have seen code like this  
before in some large open source C project (I don't remember which).  
I'm not certain whether such code poses a challenge for out-of-region  
checking, since conceptually we might model the memory it touches as  
several regions layered on top of each other.
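For reference, here is a minimal compilable version of that pattern, assuming the header and payload live in one heap allocation and only the payload pointer is handed to callers. `make_payload` and `free_payload` are hypothetical helpers written for illustration. The shape matches the Linux kernel's `container_of` idiom: every byte read stays inside the same allocation, yet `getLength` reads memory before `data[0]`, which a check scoped strictly to the region 'data' points into would report as out of bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct S {
  unsigned length;
  char buf[];               /* flexible array member: payload follows header */
};

/* Same idiom as above: step back from the payload to the enclosing header. */
unsigned getLength(char *data) {
  struct S *s = (struct S *)(data - offsetof(struct S, buf));
  return s->length;
}

/* Hypothetical helper: header and payload share one malloc'd chunk,
   but callers only ever see the payload pointer. */
char *make_payload(const char *text) {
  size_t n = strlen(text);
  struct S *s = malloc(sizeof(struct S) + n + 1);
  if (s == NULL) return NULL;
  s->length = (unsigned)n;
  memcpy(s->buf, text, n + 1);
  return s->buf;
}

/* Hypothetical helper: free the whole chunk by stepping back to the header. */
void free_payload(char *data) {
  assert(data != NULL);
  free(data - offsetof(struct S, buf));
}
```

Calling `getLength` on the pointer returned by `make_payload("hello")` yields 5, because the `length` header sits just before the characters in the same allocation.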

For example, the 'data' argument could be a pointer to a MemRegion  
representing a character array with an extent of $extent(data) (I'm  
using '$' to represent a symbolic value).  That character-array region  
would be a subregion of a region representing a 'struct S' object.  
Through pointer arithmetic, we get a pointer value that refers to the  
front of the 'struct S' object.  The subsequent access 's->length'  
first obtains the FieldRegion for the 'length' field of 's' and then  
performs the load (which works as expected).
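A toy model of this layering, as a sketch only: it assumes a region is a byte range inside its parent, and it substitutes a concrete size for the symbolic $extent(data). `Region`, `Loc`, `step_back_to_header`, `rebase_to_parent`, and `in_region` are hypothetical names and do not correspond to the analyzer's actual MemRegion classes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct S {
  unsigned length;
  char buf[];
};

/* Hypothetical toy region: bytes [offset, offset + extent) of its parent
   (parent == NULL for the root 'struct S' object). */
typedef struct Region {
  const struct Region *parent;
  long offset;
  long extent;
} Region;

/* A location: a region plus a byte offset from that region's start. */
typedef struct {
  const Region *region;
  long offset;
} Loc;

/* Models 'data - offsetof(struct S, buf)': same region, negative offset. */
Loc step_back_to_header(const Region *data) {
  Loc l = { data, -(long)offsetof(struct S, buf) };
  return l;
}

/* Re-express a location in the coordinates of the parent region. */
Loc rebase_to_parent(Loc l) {
  assert(l.region->parent != NULL);  /* a root region has no parent */
  Loc out = { l.region->parent, l.region->offset + l.offset };
  return out;
}

/* True if the 'size'-byte access at l lies inside l.region. */
bool in_region(Loc l, long size) {
  return l.offset >= 0 && l.offset + size <= l.region->extent;
}
```

With 'data' modeled as a child of the struct region at offset `offsetof(struct S, buf)`, stepping back gives a negative offset that is out of bounds for 'data' itself; rebasing it into the parent gives offset 0, which is where the 'length' field lives, so the load is in bounds there.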

I know this may seem like a contrived example, and it might not be  
something we should even care about right now, but I thought I would  
bring it up.

Another use of MemRegions that I thought of was bit-level typing  
(which I'm not saying we should implement right now, or ever).  For  
example, suppose you have an unsigned integer variable whose bits are  
used as flags.  While some programmers may use bit fields for this  
task, others just use shifts and masks.  MemRegions provide a nice  
abstraction to segment out the individual bits of an integer, allowing  
us to potentially perform bit-level typing (http://portal.acm.org/citation.cfm?doid=1181775.1181791)  
or simply have better symbolic reasoning for bit values.  While one  
cannot take the address of a bit, one can take the address of specific  
words, compute the address of related words using pointer arithmetic,  
etc., and everything is fine (no out-of-region access errors).

My meta point here is that we probably need a simple interface in  
StoreManager to determine if accessing beyond the bounds of an extent  
is okay.  Sometimes it is okay to access out of the current region as  
long as it doesn't exceed the bounds of some ancestor region (which  
represents the allocated buffer, for example).  Perhaps that means we  
need to reason about "canonical locations" (similar to how  
SourceManager reasons about logical and physical locations), where  
canonical locations could represent the byte location within a chunk  
of segmented memory.  I honestly don't know.
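One possible shape for such an interface, as a minimal sketch under stated assumptions: regions are byte ranges nested in a parent, a "canonical location" is the pair (root region, byte offset from the root's start), and an access is acceptable if it stays inside the root allocation even when it leaves the region it started from. `Region`, `CanonicalLoc`, `canonical_loc`, `access_within_region`, and `access_ok` are hypothetical names, not StoreManager's real API, and concrete integers stand in for symbolic extents.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical region: bytes [offset, offset + extent) within parent. */
typedef struct Region {
  const struct Region *parent;   /* NULL for the allocated buffer itself */
  long offset;
  long extent;
} Region;

/* Canonical location: the root allocation plus a byte offset into it. */
typedef struct {
  const Region *root;
  long byte;
} CanonicalLoc;

/* Walk up the ancestor chain, accumulating each region's offset. */
CanonicalLoc canonical_loc(const Region *r, long offset) {
  assert(r != NULL);
  while (r->parent != NULL) {
    offset += r->offset;
    r = r->parent;
  }
  CanonicalLoc c = { r, offset };
  return c;
}

/* Strict policy: the access must stay within r itself. */
bool access_within_region(const Region *r, long offset, long size) {
  return offset >= 0 && size >= 0 && offset + size <= r->extent;
}

/* Relaxed policy: leaving r is tolerated as long as the access stays
   inside the root allocation that r is ultimately carved out of. */
bool access_ok(const Region *r, long offset, long size) {
  CanonicalLoc c = canonical_loc(r, offset);
  return access_within_region(c.root, c.byte, size);
}
```

For a field region at byte 4 of a struct region that itself sits at byte 16 of a 64-byte buffer, `canonical_loc(&field, 0)` resolves to byte 20 of the buffer. An access at offset -8 fails `access_within_region` but passes `access_ok` because it lands at byte 12, still inside the allocation; whether the relaxed policy should apply to every region kind is left open here.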

I don't think these details are high on the priority list of things to  
worry about, but I thought I would mention them now before we hardwire  
our implementation of array-bounds checking with too many assumptions.

Ted


