[cfe-dev] [analyzer] C++ and inlining.

Sun Nov 26 03:10:10 PST 2017

I've observed a lot of false positives on WebKit that resulted from our 
inability to inline relatively simple functions in C++, specifically 
constructors and operator new(), due to unimplemented modeling. So i 
wanted to document, and eventually fix, a variety of language feature 
modeling we're currently lacking. Note that WebKit, like LLVM, is not a 
typical C++ codebase; it uses many language constructs that regular 
projects don't use, and uses very little standard library headers.

---

Normally we expect the analyzer to work even without any inlining, by 
conservatively invalidating the program state, which eliminates the 
clearly false assumptions but sometimes causes infeasible paths when the 
invalidation was too aggressive and we start denoting the same value 
with a different symbol after invalidation and assume contradictory 
stuff on the old and the new symbol. The false positives resulting from 
aggressive invalidation are usually treated as less scary than the ones 
suppressed by invalidation, because they can be easily suppressed (with 
assertions or const qualifiers etc.) without loss of coverage. However, 
in C++ there are a lot of implicit function calls which cause massive 
frustration when evaluated conservatively. For example,

   class C {
     bool haveX;
     class D d;

   public:
     C(int *x): haveX(x != 0) {
       if (haveX)
         *x = 1; // null dereference??
     }
   };

In this case, 'haveX' should assume that x is non-null. The code eagerly 
splits states into {x == 0} and {x != 0}, which seems reasonable. 
However, after 'haveX' have been initialized, the default constructor 
for field 'd' kicks in. If only this constructor is not inlined, or any 
of its callees are not inlined, value stored in 'haveX' would be 
invalidated on both paths. In particular, the path on which the original 
'haveX' is false but the invalidated 'haveX' is true have now opened up.

---

Inlining of the constructor itself is disabled in many cases for many 
reasons. In particular, we are currently only trying to inline the 
constructor when the target this-region is a DeclRegion (variable, 
member variable, Objective-C++ instance var), and the destructor is 
non-trivial. This cuts away a lot of cases:

* Constructing into temporaries is disabled when destructor is 
non-trivial. At least, we should be able to inline those when the 
destructor is present at all, so that it would be evaluated 
conservatively. One thing to note here is that our CFG has been recently 
fixed, so we're much closer to fixing this properly nowadays. However, 
CFG alone is not enough to figure out which destructor needs to be 
called; for instance, if a lifetime-extended temporary is initialized 
with an operator?: expression, we'd need path-sensitive information to 
figure out which object to destroy.

* Temporaries are also special because our pass-by-value is not always 
working correctly. In particular, when we inline 'foo(c)', where 
variable 'c' is of 'class C', we first copy 'c' into a temporary region, 
and then trivial-copy it into the stack variable-region of the function 
parameter, while we should be constructing directly into the parameter 
region. We cannot construct directly into the parameter region because 
the stack frame has not yet been constructed, because arguments are not 
yet computed. More reading on the subject, even if a bit outdated, is at 
http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html 
This is a hard problem, but i wasn't noticing many instances of it yet.

* Base object region constructors are disabled when destructor is 
non-trivial. This sounds like an accidental omission.

* Construction into ElementRegion is disabled regardless of destructors. 
In particular, mass array constructions are disabled. There is a special 
AST for this case, which emulates the loop through the array (or return 
value of operator new[]) with a loop counter variable and all. We have 
no support for this whole language construct. Note, however, that 
ElementRegion is much more than array element; it is often used for 
representing casts, and in particular it appears in return values of a 
conservatively evaluated operator new() (i.e. element{SymRegion}) and is 
likely to appear in placement-new() return values for the same reason. 
So we should discriminate between these two cases.

* Constructing into return value of operator new() is disabled 
completely anyway, because there's a modeling problem that causes us to 
be unable to connect the constructor with the correct this-region. The 
CFG part of this problem was fixed by Jordan in 2014 by adding the 
CFGNewAllocator element, so we now actually call the operator and the 
constructor in the correct order, but we still need to pass the operator 
new's return value to the constructor. Note how pointless it is to 
enable it, or even inline a custom operator new, until construction into 
ElementRegion is fixed.

---

Speaking of inlining operator new():

* For correct modeling in the core, it seems that the only thing that 
remains is to actually use the return value of operator new() and 
construct the new object *into* it. Sounds easy.

* I’m also immediately noticing the unimplemented path diagnostics 
within the inlined operator new(), which crash with 
llvm_unreachable(“not yet implemented”) whenever a visitor fires within 
it (eg. to indicate that memory allocated there leaks).

* MallocChecker keeps being surprised with various non-symbolic stuff 
that may come out from operator new(). https://reviews.llvm.org/D37189 
doesn’t cover all cases; in particular, void *operator new(size_t) { 
static int x; return &x; } seems to instantly crash after inlining.

---

 From now on i'd like to experiment with, first of all, disabling the 
DeclRegion bailout when possible. Then i'd try to inline operator new, 
pass its return value to the respective constructor, and inline said 
constructor. I'm not sure if i'd be able to dig into the temporaries and 
pass-by-copy problems soon.

And i believe that when it comes to C++ pure language constructs, i 
listed most of the problems i'm aware of, with the exception of, well, 
*exceptions* (which also need CFG work). Of course we could do better 
modeling the standard library as well.