[cfe-dev] [analyzer] C++ and inlining.

Tue Nov 28 07:19:54 PST 2017

On 11/26/17 3:17 PM, Gábor Horváth wrote:
> Hi Artem!
>
> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev 
> <cfe-dev at lists.llvm.org> wrote:
>
>     I've observed a lot of false positives on WebKit that resulted
>     from our inability to inline relatively simple functions in C++,
>     specifically constructors and operator new(), due to unimplemented
>     modeling. So i wanted to document, and eventually fix, the variety
>     of language features whose modeling we're currently lacking. Note
>     that WebKit, like LLVM, is not a typical C++ codebase; it uses many
>     language constructs that regular projects don't use, and uses very
>     few standard library headers.
>
>
> Great news!
>
>
>     ---
>
>     Normally we expect the analyzer to work even without any inlining,
>     by conservatively invalidating the program state. This eliminates
>     the clearly false assumptions, but sometimes causes infeasible
>     paths: when the invalidation is too aggressive, we start denoting
>     the same value with a different symbol after invalidation, and
>     then assume contradictory things about the old and the new
>     symbol. The
>     false positives resulting from aggressive invalidation are usually
>     treated as less scary than the ones suppressed by invalidation,
>     because they can be easily suppressed (with assertions or const
>     qualifiers etc.) without loss of coverage. However, in C++ there
>     are a lot of implicit function calls which cause massive
>     frustration when evaluated conservatively. For example,
>
>       class C {
>         bool haveX;
>         class D d;
>
>       public:
>         C(int *x): haveX(x != 0) {
>           if (haveX)
>             *x = 1; // null dereference??
>         }
>       };
>
>     In this case, checking 'haveX' should imply that x is non-null. The
>     analyzer eagerly splits states into {x == 0} and {x != 0}, which
>     seems reasonable. However, after 'haveX' has been initialized, the
>     default constructor for field 'd' kicks in. If this constructor is
>     not inlined, or any of its callees is not inlined, the value
>     stored in 'haveX' is invalidated on both paths. In particular, the
>     path on which the original 'haveX' is false but the invalidated
>     'haveX' is true has now opened up.
>
>
> I observed the very same pattern on Ericsson internal codebases, so 
> sorting this out is definitely a huge win for many projects (not just 
> LLVM and WebKit).
>
>
>     ---
>
>     Inlining of the constructor itself is disabled in many cases for
>     many reasons. In particular, we are currently only trying to
>     inline the constructor when the target this-region is a DeclRegion
>     (variable, member variable, Objective-C++ instance var), and the
>     destructor is non-trivial. This cuts away a lot of cases:
>
>
> It is not clear to me why the non-trivial destructor is a requirement 
> here. Do you have any info on that?
>
>
>     * Constructing into temporaries is disabled when destructor is
>     non-trivial. At least, we should be able to inline those when the
>     destructor is present at all, so that it would be evaluated
>     conservatively. One thing to note here is that our CFG has been
>     recently fixed, so we're much closer to fixing this properly
>     nowadays. However, CFG alone is not enough to figure out which
>     destructor needs to be called; for instance, if a
>     lifetime-extended temporary is initialized with an operator?:
>     expression, we'd need path-sensitive information to figure out
>     which object to destroy.
>
>     * Temporaries are also special because our pass-by-value is not
>     always working correctly. In particular, when we inline 'foo(c)',
>     where variable 'c' is of 'class C', we first copy 'c' into a
>     temporary region, and then trivial-copy it into the stack
>     variable-region of the function parameter, while we should be
>     constructing directly into the parameter region. We cannot
>     construct directly into the parameter region because the stack
>     frame has not yet been constructed, because arguments are not yet
>     computed. More reading on the subject, even if a bit outdated, is
>     at
>     http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>     This is a hard problem, but i haven't noticed many instances of it
>     yet.
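
For illustration, here is a minimal sketch of the operator?: case from the first bullet above. The names Tracker, events and bind_conditional are made up for this example and do not come from the analyzer or from WebKit. Only the temporary selected by the condition is lifetime-extended, and which one that is becomes known only at run time, so the destructor call at the end of the scope cannot be attributed to a specific temporary from the CFG alone:

```cpp
#include <string>
#include <vector>

// Hypothetical event log, used only for this illustration.
static std::vector<std::string> events;

struct Tracker {
  std::string name;
  explicit Tracker(const char *n) : name(n) {}
  Tracker(const Tracker &other) : name(other.name) {}
  ~Tracker() { events.push_back(name); } // non-trivial destructor
};

// Binds a const reference to the result of operator?:. The reference
// extends the lifetime of whichever temporary the condition selected.
std::vector<std::string> bind_conditional(bool flag) {
  events.clear();
  {
    const Tracker &t = flag ? Tracker("a") : Tracker("b");
    (void)t;
    events.push_back("scope-end");
  } // the lifetime-extended temporary is destroyed here
  return events;
}
```

The last logged event is the name of the temporary that was picked, and it comes after the "scope-end" marker, which is the ordering a path-sensitive model would have to reproduce.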
>
>
> I am wondering what the analyzer does with copy elision? Do we model 
> that somehow?
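
As a point of reference for the pass-by-value discussion above, this is a small sketch of what the language itself does for 'foo(c)' when 'c' is an lvalue: the parameter is copy-initialized directly from the argument, with exactly one copy-constructor call. The counter C::copies and the helper copies_for_lvalue_argument are hypothetical names added for the example; it says nothing about how the analyzer models the call internally.

```cpp
struct C {
  static int copies; // hypothetical counter, only for this illustration
  int value;
  explicit C(int v) : value(v) {}
  C(const C &other) : value(other.value) { ++copies; }
};
int C::copies = 0;

// The parameter 'c' is a separate object copy-initialized from the argument.
int foo(C c) { return c.value; }

// Returns how many copy-constructor calls a single 'foo(c)' performs
// when the argument is an lvalue.
int copies_for_lvalue_argument() {
  C c(5);
  C::copies = 0;
  foo(c);
  return C::copies;
}
```

When the argument is a prvalue instead, as in 'foo(C(5))', the copy may be elided, and since C++17 that elision is guaranteed, so the count there depends on the language mode and compiler flags.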
>
>
>     * Base object region constructors are disabled when destructor is
>     non-trivial. This sounds like an accidental omission.
>
>
> Nice catch.

Actually not so much. This bailout only fires for CK_Complete 
constructor kinds, and base-object constructors aren't "complete" in 
this sense, so they work fine.

This might still affect a complete constructor that follows 
placement-new into a base-object region (i'd be shocked to see code that 
actually does it).

>
>     * Construction into ElementRegion is disabled regardless of
>     destructors. In particular, mass array constructions are disabled.
>     There is a special AST for this case, which emulates the loop
>     through the array (or return value of operator new[]) with a loop
>     counter variable and all. We have no support for this whole
>     language construct. Note, however, that ElementRegion is much more
>     than an array element; it is often used for representing casts, and
>     in particular it appears in return values of a conservatively
>     evaluated operator new() (i.e. element{SymRegion}) and is likely
>     to appear in placement-new() return values for the same reason. So
>     we should discriminate between these two cases.
>
>
> Do you know why casts are represented this way? Is this something that 
> we might want to change in the future? I think one of the reasons why 
> construction into ElementRegion is disabled is that arrays are 
> populated lazily by the analyzer, which does not work well with code 
> that relies on the number of constructor/destructor invocations being 
> the same as the number of elements in the array.
>
>
>     * Constructing into return value of operator new() is disabled
>     completely anyway, because there's a modeling problem that causes
>     us to be unable to connect the constructor with the correct
>     this-region. The CFG part of this problem was fixed by Jordan in
>     2014 by adding the CFGNewAllocator element, so we now actually
>     call the operator and the constructor in the correct order, but we
>     still need to pass the operator new's return value to the
>     constructor. Note how pointless it is to enable it, or even inline
>     a custom operator new, until construction into ElementRegion is fixed.
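
To make the missing connection concrete, here is a sketch of the language-level behaviour that would have to be modeled: for a non-array new-expression, the constructor's 'this' is the pointer that operator new returned. Widget, last_allocation, last_this and constructor_sees_new_result are hypothetical names for this example; the class-specific operator new is only a convenient way to observe the returned pointer.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

struct Widget {
  static void *last_allocation; // what operator new returned
  static void *last_this;       // what the constructor saw as 'this'
  int value;

  Widget() : value(7) { last_this = this; }

  static void *operator new(std::size_t size) {
    void *p = std::malloc(size);
    if (!p)
      throw std::bad_alloc();
    last_allocation = p;
    return p;
  }
  static void operator delete(void *p) noexcept { std::free(p); }
};
void *Widget::last_allocation = nullptr;
void *Widget::last_this = nullptr;

// Returns true if the constructor ran on exactly the memory that
// operator new handed back, and the new-expression yielded that address.
bool constructor_sees_new_result() {
  Widget *w = new Widget;
  bool same = Widget::last_allocation == Widget::last_this &&
              static_cast<void *>(w) == Widget::last_this;
  delete w;
  return same;
}
```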
>
>     ---
>
>     Speaking of inlining operator new():
>
>     * For correct modeling in the core, it seems that the only thing
>     that remains is to actually use the return value of operator new()
>     and construct the new object *into* it. Sounds easy.
>
>     * I'm also immediately noticing the unimplemented path diagnostics
>     within the inlined operator new(), which crash with
>     llvm_unreachable("not yet implemented") whenever a visitor fires
>     within it (e.g. to indicate that memory allocated there leaks).
>
>     * MallocChecker keeps being surprised by the various non-symbolic
>     values that may come out of operator new().
>     https://reviews.llvm.org/D37189 doesn't cover all cases; in
>     particular, void *operator new(size_t) { static int x; return &x; }
>     seems to instantly crash after inlining.
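
A class-scoped variant of the one-liner above, written as a sketch so that it can be compiled without replacing the global operator new. Singleton and new_returns_static_storage are hypothetical names. It only shows the shape of the code, an operator new whose result points into static storage rather than the heap; it does not reproduce the MallocChecker crash itself.

```cpp
#include <cstddef>

struct Singleton {
  int value;
  Singleton() : value(42) {}

  // Hands out the same static buffer on every call; nothing comes
  // from the heap, so the result is not a fresh heap allocation.
  static void *operator new(std::size_t) {
    alignas(Singleton) static unsigned char storage[sizeof(Singleton)];
    return storage;
  }
  static void operator delete(void *) noexcept {} // nothing to free
};

// Returns true if two new-expressions yield the same static address.
bool new_returns_static_storage() {
  Singleton *a = new Singleton;
  Singleton *b = new Singleton; // reuses the storage that 'a' pointed to
  bool same = (a == b) && b->value == 42;
  delete b;
  return same;
}
```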
>
>     ---
>
>     From now on i'd like to experiment with, first of all, disabling
>     the DeclRegion bailout when possible. Then i'd try to inline
>     operator new, pass its return value to the respective constructor,
>     and inline said constructor. I'm not sure if i'd be able to dig
>     into the temporaries and pass-by-copy problems soon.
>
>     And i believe that when it comes to C++ pure language constructs,
>     i listed most of the problems i'm aware of, with the exception of,
>     well, *exceptions* (which also need CFG work). Of course we could
>     do better modeling the standard library as well.
>
>
> Thank you for collecting these problems!