[PATCH] D35216: [analyzer] Escape symbols when creating std::initializer_list.

Wed Aug 30 03:22:01 PDT 2017

NoQ added a comment.

In https://reviews.llvm.org/D35216#856093, @rsmith wrote:

> The `CXXStdInitializerListExpr` node has pretty simple evaluation semantics: it takes a glvalue array expression, and constructs a `std::initializer_list<T>` from it as if by filling in the two members with a pointer to the array and either the size of the array or a pointer to its end. Can we not model those semantics directly here?

Yeah, it's definitely an easy case to perform complete modeling. However, in the analyzer we still lack some infrastructure for C++ library modeling, which was discussed quite a lot above, and i feel we've run out of debating time and still need to fix the false positive. It's important to make our checker APIs more powerful before we jump into modeling myriads of STL classes. The main problem is how to represent the private fields in the `std::initializer_list<T>` object in our symbolic memory model. Here's a quick summary of the above.

We already have the object's fields in the AST, with its AST record layout, however field names and layouts are implementation-dependent, and it is unsafe to try to understand how the specific standard library implementation's layout works and what specific private fields we see in the AST mean. Instead it seems safer to attach metadata to the object, which would contain values of these private fields - this is a common operation for the analyzer (eg. attach string length metadata to a null-terminated C-string).

However, because such metadata should be handled by an analyzer's checker (we cannot possibly model all APIs in the core, moving it into checkers is much more scalable and natural), and therefore such metadata would be private to the checker in one way or another, it makes checkers fully responsible for modeling such metadata. In particular, the very feature lack of which that was causing the false positive (expressing the fact that if the list is passed into an external function (eg. with no visible body available for analysis), objects stored in it are also "escaping" into a function and can be, for example, deallocated here, so other checkers need to be notified to stop tracking them) cannot currently be implemented on the checker side. If we let checkers express this, it'd require an additional checker callback boilerplate that would need to be repeated in every C++ checker, and we already have a lot of boilerplate. Additionally, when constructing the `initializer_list`'s elements, we already need to have the storage for them, which is hard to handle by the core when such storage is checker-specific.

Another solution would be to let the analyzer core help checkers with the simple boilerplate, which is a long-standing project that is to be took up. The only downside of this approach is that it might make checker APIs less omnipotent, because they can no longer maintain their metadata in arbitrary manners, but only in manners that are understandable by the core.

An alternative solution described here was to let the checkers produce "imaginary" fields within our symbolic memory model, with unknown offsets within the object (similar to objective-c instance variables) that would have benefits of no longer being implementation-defined. The difference with maintaining metadata is that most of the boilerplate would be handled by the memory model (RegionStore) rather than the checker, and the core would generally know that the pointers to the array are located within the object. However, this also raises multiple concerns, because such imaginary "ghost" fields may conflict with the real fields in weird manners, they may accidentally leak to user's diagnostics, and what not. And we still have the problem with checker-specific storage.

So these are the excuses :)

https://reviews.llvm.org/D35216