[cfe-commits] [PATCH] Set region size in GRRegionVals transfer function
Ted Kremenek
kremenek at apple.com
Fri Nov 7 09:01:17 PST 2008
On Nov 7, 2008, at 5:34 AM, Zhongxing Xu wrote:
> Attached is a scratch implementation of array bound checking. It
> seems that implementing array bound checking is pretty
> straightforward. No need to make big change to the framework.
> <oob.patch>
Wow! Great work. This is very simple. This is a very good start. I
think we can apply it now and then iterate.
One thing that I think is worth mentioning is that not all MemRegions
represent chunks of memory for which we want to enforce strict
software segmentation. Consider the following code:
struct S {
unsigned length;
char buf[];
};
unsigned getLength(char* data) {
struct S* s = (struct S*) (data - offsetof(struct S, buf));
return s->length;
}
I may not have written this correctly, but I have seen code like this
before in some large open source C project (I don't remember which).
I'm not certain if such code poses a challenge for out-of-region
checking, as conceptually we might represent the above code with
several regions layered on top of each other.
For example, the 'data' argument could be a pointer to a MemRegion
representing a character array with an extent of $extent(data) (I'm
using '$' to represent a symbolic value). This extent would be a
subregion of a region representing a 'struct S' object. Through
pointer arithmetic, we get a pointer value that refers to the front of
the 'struct S' object. The subsequent access through 's->length'
first causes us to get the FieldRegion for 'length' of 's' and then
perform the load (which works as expected).
I know this may seem like a contrived example, and it might not be
something we should even care about right now, but I thought it would
bring it up.
Another use of MemRegions that I thought of was bit-level typing
(which I'm not saying we should implement right now, or ever). For
example, suppose you have an unsigned integer variable whose bits are
used as flags. While some programmers may use bit fields for this
task, others just use shifts and masks. MemRegions provide a nice
abstraction to segment out the individual bits of an integer, allowing
us to potentially perform bit-level typing (http://portal.acm.org/citation.cfm?doid=1181775.1181791
) or simply have better symbolic reasoning for bit values. While one
cannot take the address of a bit, one can take the address of specific
words, compute the address of related words using pointer arithmetic,
etc., and everything is fine (no out-of-region access errors).
My meta point here is that we probably need a simple interface in
StoreManager to determine if accessing beyond the bounds of an extent
is okay. Sometimes it is okay to access out of the current region as
long as it doesn't exceed the bounds of some ancestor region (which
represents the allocated buffer, for example). Perhaps that means we
need to reason about "canonical locations" (similar to how
SourceManager reasons about logical and physical locations), where
canonical locations could represent the byte location within a chunk
of segmented memory. I honestly don't know.
I don't think these details are high on the priority list of things to
worry about, but I thought I would mention them now before we hardwire
our implementation of array-bounds checking with too many assumptions.
Ted
More information about the cfe-commits
mailing list