[cfe-dev] [analyzer] C++ and inlining.

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Tue Dec 5 15:22:11 PST 2017

On 11/26/17 3:10 AM, Artem Dergachev wrote:
> I've observed a lot of false positives on WebKit that resulted from 
> our inability to inline relatively simple functions in C++, 
> specifically constructors and operator new(), due to unimplemented 
> modeling. So i wanted to document, and eventually fix, a variety of 
> language feature modeling we're currently lacking. Note that WebKit, 
> like LLVM, is not a typical C++ codebase; it uses many language 
> constructs that regular projects don't use, and uses very little 
> standard library headers.
> ---
> Normally we expect the analyzer to work even without any inlining, by 
> conservatively invalidating the program state, which eliminates the 
> clearly false assumptions but sometimes causes infeasible paths when 
> the invalidation was too aggressive and we start denoting the same 
> value with a different symbol after invalidation and assume 
> contradictory stuff on the old and the new symbol. The false positives 
> resulting from aggressive invalidation are usually treated as less 
> scary than the ones suppressed by invalidation, because they can be 
> easily suppressed (with assertions or const qualifiers etc.) without 
> loss of coverage. However, in C++ there are a lot of implicit function 
> calls which cause massive frustration when evaluated conservatively. 
> For example,
>   class C {
>     bool haveX;
>     class D d;
>   public:
>     C(int *x): haveX(x != 0) {
>       if (haveX)
>         *x = 1; // null dereference??
>     }
>   };
> In this case, 'haveX' should assume that x is non-null. The code 
> eagerly splits states into {x == 0} and {x != 0}, which seems 
> reasonable. However, after 'haveX' have been initialized, the default 
> constructor for field 'd' kicks in. If only this constructor is not 
> inlined, or any of its callees are not inlined, value stored in 
> 'haveX' would be invalidated on both paths. In particular, the path on 
> which the original 'haveX' is false but the invalidated 'haveX' is 
> true have now opened up.
> ---
> Inlining of the constructor itself is disabled in many cases for many 
> reasons. In particular, we are currently only trying to inline the 
> constructor when the target this-region is a DeclRegion (variable, 
> member variable, Objective-C++ instance var), and the destructor is 
> non-trivial. This cuts away a lot of cases:
> * Constructing into temporaries is disabled when destructor is 
> non-trivial. At least, we should be able to inline those when the 
> destructor is present at all, so that it would be evaluated 
> conservatively. One thing to note here is that our CFG has been 
> recently fixed, so we're much closer to fixing this properly nowadays. 
> However, CFG alone is not enough to figure out which destructor needs 
> to be called; for instance, if a lifetime-extended temporary is 
> initialized with an operator?: expression, we'd need path-sensitive 
> information to figure out which object to destroy.
> * Temporaries are also special because our pass-by-value is not always 
> working correctly. In particular, when we inline 'foo(c)', where 
> variable 'c' is of 'class C', we first copy 'c' into a temporary 
> region, and then trivial-copy it into the stack variable-region of the 
> function parameter, while we should be constructing directly into the 
> parameter region. We cannot construct directly into the parameter 
> region because the stack frame has not yet been constructed, because 
> arguments are not yet computed. More reading on the subject, even if a 
> bit outdated, is at 
> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html 
> This is a hard problem, but i wasn't noticing many instances of it yet.
> * Base object region constructors are disabled when destructor is 
> non-trivial. This sounds like an accidental omission.
> * Construction into ElementRegion is disabled regardless of 
> destructors. In particular, mass array constructions are disabled. 
> There is a special AST for this case, which emulates the loop through 
> the array (or return value of operator new[]) with a loop counter 
> variable and all. We have no support for this whole language 
> construct. Note, however, that ElementRegion is much more than array 
> element; it is often used for representing casts, and in particular it 
> appears in return values of a conservatively evaluated operator new() 
> (i.e. element{SymRegion}) and is likely to appear in placement-new() 
> return values for the same reason. So we should discriminate between 
> these two cases.
> * Constructing into return value of operator new() is disabled 
> completely anyway, because there's a modeling problem that causes us 
> to be unable to connect the constructor with the correct this-region. 
> The CFG part of this problem was fixed by Jordan in 2014 by adding the 
> CFGNewAllocator element, so we now actually call the operator and the 
> constructor in the correct order, but we still need to pass the 
> operator new's return value to the constructor. Note how pointless it 
> is to enable it, or even inline a custom operator new, until 
> construction into ElementRegion is fixed.

One more syntax pattern that we don't support was unveiled in 
https://reviews.llvm.org/D40841 : construction as part of a bigger 
aggregate initialization would be never inlined, neither it is performed 
with a correct this-region. This includes cases like

   struct A { A(); }; // not an aggregate, has constructor.
   struct B { A a; }; // aggregate, no constructor needed.
   B b = {}; // A() is not handled correctly here.

Or, since C++17, also

   struct B: public A {};

works similarly.

> ---
> Speaking of inlining operator new():
> * For correct modeling in the core, it seems that the only thing that 
> remains is to actually use the return value of operator new() and 
> construct the new object *into* it. Sounds easy.
> * I’m also immediately noticing the unimplemented path diagnostics 
> within the inlined operator new(), which crash with 
> llvm_unreachable(“not yet implemented”) whenever a visitor fires 
> within it (eg. to indicate that memory allocated there leaks).
> * MallocChecker keeps being surprised with various non-symbolic stuff 
> that may come out from operator new(). https://reviews.llvm.org/D37189 
> doesn’t cover all cases; in particular, void *operator new(size_t) { 
> static int x; return &x; } seems to instantly crash after inlining.
> ---
> From now on i'd like to experiment with, first of all, disabling the 
> DeclRegion bailout when possible. Then i'd try to inline operator new, 
> pass its return value to the respective constructor, and inline said 
> constructor. I'm not sure if i'd be able to dig into the temporaries 
> and pass-by-copy problems soon.
> And i believe that when it comes to C++ pure language constructs, i 
> listed most of the problems i'm aware of, with the exception of, well, 
> *exceptions* (which also need CFG work). Of course we could do better 
> modeling the standard library as well.

More information about the cfe-dev mailing list