[cfe-dev] Analyzer: Extract region-walking behavior from CallAndMessageChecker
Ted Kremenek
kremenek at apple.com
Wed Jun 23 16:30:29 PDT 2010
On Jun 23, 2010, at 12:05 AM, Jordy Rose wrote:
> CallAndMessageChecker currently has a simple check to see if any of the
> members of a pass-by-value struct are uninitialized. It handles nested
> structs, but not structs containing arrays.
>
> I propose to extract this region-walking behavior out into a new class,
> RegionWalker. This would then be the basis for the current pass-by-value
> check and a possible new one (see below). My implementation of RegionWalker
> is loosely modeled after RecursiveASTVisitor (see attached patch), and
> supports both structs and arrays, and both nested.
Hi Jordy,
I think in principle the addition of RegionWalker is a nice refactoring, but I'm really concerned about the following:
+ const ElementRegion *ER = MemMgr.getElementRegion(EleTy, IndexVal, R, Ctx);
This call means that we will create a new ElementRegion for accessing every single element of an array. This is really expensive. Not only will this be really slow for large constant-sized arrays, it will cause a bunch of ElementRegion objects to get generated that will stay around for the entire lifetime of the GRExprEngine object.
As is, I don't think this can go in.
My high-level question is what is the purpose of this API? Is to iterate over "direct" bindings, or all the values in memory? The StoreManager does support an API to iterate over bindings, but the behavior will be highly dependent on the implementation of the StoreManager. We really need to design these APIs with algorithmic efficiency in mind. If we need to add more high-level APIs to StoreManager to make the kind of queries we want to do more efficient than I prefer we do that. RegionWalker right now defeats all the laziness of RegionStore by instantiation regions on-the-fly. This is going to be a huge performance killer.
Also, I'm not certain if extending the CallAndMessageChecker to include arrays will be all that useful. Consider an array field that holds an array of bytes. If that buffer holds a null-terminated string, with some of the remaining characters left uninitialized, that isn't a bug unless those values are actually accessed. Indeed I have already seen some false positives from this check when it comes to passing in structs to functions that contain fields that are not initialized but are not accessed. There is only really a bug here if the fields are accessed, so the CallAndMessageChecker may already be too aggressive. Also looking at arrays makes it more aggressive and expensive, and I'm not certain if the returns are worth it.
> The additional check I propose is for passing pointers to undefined values
> in an argument slot marked as "const". Clearly, the argument value is not
> useful, and for a const argument it can't be /set/ by the method either.
> (The exception would be "const void *", which is both the type of
> Objective-C "id" and sometimes an indicator that the pointer won't be
> dereferenced by the callee.)
Objective-C pointers are special-cased in the type system, so I'd restrict this to just C structs for now. It seems like a useful check that we can implement and then see how much useful results we get back (and how many false positives). We have to be careful with C++ classes and inheritance, e.g.:
class A { ... };
class B : public A { int uninit_field; ... };
...
void foo(const A *a);
...
B b;
foo(&b);
In this case, although 'b' contains an uninitialized field, it doesn't matter because 'b' is passed as an 'A*' to foo(). It is possible that foo() does a downcast to B*, but without knowledge about foo() we don't know.
> For the case of an array or struct (such as the source buffer for
> strcpy()), we have to be a little more cautious -- the pointer might be
> useful if /any/ of the fields are initialized. This is where RegionWalker
> would come in.
>
> But that's for a later patch.
>
> Suggestions, comments? Is this overkill? (Especially if the new check
> doesn't seem like a good idea.)
My main concern about CallAndMessageChecker's introspection of uninitialized struct fields (which I think I wrote) is that it is very approximate and I'm not convinced it has much added value. While passing uninitialized scalars into a function is unambiguously a bug, passing in a struct or array with uninitialized fields is not. In such cases, the best way to detect a bug is if we knew the called function accessed the given field or array element that was known to be uninitialized in the caller. Thus to really do this check right we probably need to build function summaries to detect when it is not okay to pass in an uninitialized field value.
More information about the cfe-dev
mailing list