[PATCH] D35216: [analyzer] Escape symbols when creating std::initializer_list.

Mon Jul 10 11:48:34 PDT 2017

NoQ created this revision.

I've seen a few false positives that appear because we construct C++11 `std::initializer_list` objects with brace initializers, and such construction is not properly modeled. For instance, if a new object is constructed on the heap only to be put into a brace-initialized STL container, the object is reported to be leaked.

Approach (0): This can be trivially fixed by this patch, which causes pointers passed into initializer list expressions to immediately escape.

This fix is overly conservative though. So i did a bit of investigation as to how model `std::initializer_list` better.

According to the standard, `std::initializer_list<T>` is an object that has methods `begin()`, `end()`, and `size()`, where `begin()` returns a pointer to continous array of `size()` objects of type `T`, and `end()` is equal to `begin()` plus `size()`. The standard does hint that it should be possible to implement `std::initializer_list<T>` as a pair of pointers, or as a pointer and a size integer, however specific fields that the object would contain are an implementation detail.

Ideally, we should be able to model the initializer list's methods precisely. Or, at least, it should be possible to explain to the analyzer that the list somehow "takes hold" of the values put into it. Initializer lists can also be copied, which is a separate story that i'm not trying to address here.

The obvious approach to modeling `std::initializer_list` in a //checker// would be to construct a `SymbolMetadata` for the memory region of the initializer list object, which would be of type `T*` and represent `begin()`, so we'd trivially model `begin()` as a function that returns this symbol. The array pointed to by that symbol would be `bindLoc()`ed to contain the list's contents (probably as a `CompoundVal` to produce less bindings in the store). Extent of this array would represent `size()` and would be equal to the length of the list as written.

So this sounds good, however apparently it does nothing to address our false positives: when the list escapes, our `RegionStoreManager` is not magically guessing that the metadata symbol attached to it, together with its contents, should also escape. In fact, it's impossible to trigger a pointer escape from within the checker.

Approach (1): If only we enabled `ProgramState::bindLoc(..., notifyChanges=true)` to cause pointer escapes (not only region changes) (which sounds like the right thing to do anyway) such checker would be able to solve the false positives by triggering escapes when binding list elements to the list. However, it'd be as conservative as the current patch's solution. Ideally, we do not want escapes to happen so early. Instead, we'd prefer them to be delayed until the list itself escapes.

So i believe that escaping metadata symbols whenever their base regions escape would be the right thing to do. Currently we didn't think about that because we had neither pointer-type metadatas nor non-pointer escapes.

Approach (2): We could teach the Store to scan itself for bindings to metadata-symbolic-based regions during `scanReachableSymbols()` whenever a region turns out to be reachable. This requires no work on checker side, but it sounds performance-heavy.

Approach (3): We could let checkers maintain the set of active metadata symbols in the program state (ideally somewhere in the Store, which sounds weird but causes the smallest amount of layering violations), so that the core knew what to escape. This puts a stress on the checkers, but with a smart data map it wouldn't be a problem.

Approach (4): We could allow checkers to trigger pointer escapes in arbitrary moments. If we allow doing this within `checkPointerEscape` callback itself, we would be able to express facts like "when this region escapes, that metadata symbol attached to it should also escape". This sounds like an ultimate freedom, with maximum stress on the checkers - still not too much stress when we have smart data maps.

Does anybody like/hate any of these solutions or have anything better in mind? I'm personally liking the solution with `scanReachableSymbols()` - it should be possible to avoid performance overhead, and clarity seems nice.

https://reviews.llvm.org/D35216

Files:
  lib/StaticAnalyzer/Core/ExprEngine.cpp
  test/Analysis/initializer.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D35216.105875.patch
Type: text/x-patch
Size: 3607 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20170710/c4b72717/attachment.bin>