[cfe-dev] [analyzer] C++ and inlining.

Sun Dec 3 12:56:58 PST 2017

Speaking of the code sample from the first e-mail: does anyone has an
explanation why it is the non-inlined constructor being a special
case? If you replace with an arbitrary non-inlined function call
between the haveX initialization and the "if" condition the analysis
seem to work fine. Compare assembly in [1] with the "D d;" line
commented out and active.

[1] https://godbolt.org/g/jmSKT9

Alexey

On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
>
> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>
>> Hi Artem!
>>
>>
>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>>     I've observed a lot of false positives on WebKit that resulted
>>     from our inability to inline relatively simple functions in C++,
>>     specifically constructors and operator new(), due to unimplemented
>>     modeling. So i wanted to document, and eventually fix, a variety
>>     of language feature modeling we're currently lacking. Note that
>>     WebKit, like LLVM, is not a typical C++ codebase; it uses many
>>     language constructs that regular projects don't use, and uses very
>>     little standard library headers.
>>
>>
>> Great news!
>>
>>
>>     ---
>>
>>     Normally we expect the analyzer to work even without any inlining,
>>     by conservatively invalidating the program state, which eliminates
>>     the clearly false assumptions but sometimes causes infeasible
>>     paths when the invalidation was too aggressive and we start
>>     denoting the same value with a different symbol after invalidation
>>     and assume contradictory stuff on the old and the new symbol. The
>>     false positives resulting from aggressive invalidation are usually
>>     treated as less scary than the ones suppressed by invalidation,
>>     because they can be easily suppressed (with assertions or const
>>     qualifiers etc.) without loss of coverage. However, in C++ there
>>     are a lot of implicit function calls which cause massive
>>     frustration when evaluated conservatively. For example,
>>
>>       class C {
>>         bool haveX;
>>         class D d;
>>
>>       public:
>>         C(int *x): haveX(x != 0) {
>>           if (haveX)
>>             *x = 1; // null dereference??
>>         }
>>       };
>>
>>     In this case, 'haveX' should assume that x is non-null. The code
>>     eagerly splits states into {x == 0} and {x != 0}, which seems
>>     reasonable. However, after 'haveX' have been initialized, the
>>     default constructor for field 'd' kicks in. If only this
>>     constructor is not inlined, or any of its callees are not inlined,
>>     value stored in 'haveX' would be invalidated on both paths. In
>>     particular, the path on which the original 'haveX' is false but
>>     the invalidated 'haveX' is true have now opened up.
>>
>>
>> I observed the very same pattern on Ericsson internal codebases, so
>> sorting this out is definitely a huge win for many projects (not just LLVM
>> and WebKit).
>>
>>
>>     ---
>>
>>     Inlining of the constructor itself is disabled in many cases for
>>     many reasons. In particular, we are currently only trying to
>>     inline the constructor when the target this-region is a DeclRegion
>>     (variable, member variable, Objective-C++ instance var), and the
>>     destructor is non-trivial. This cuts away a lot of cases:
>>
>>
>> It is not clear for my why the non-trivial destructor is a requirement
>> here. Do yo have any info on that?
>>
>>
>>     * Constructing into temporaries is disabled when destructor is
>>     non-trivial. At least, we should be able to inline those when the
>>     destructor is present at all, so that it would be evaluated
>>     conservatively. One thing to note here is that our CFG has been
>>     recently fixed, so we're much closer to fixing this properly
>>     nowadays. However, CFG alone is not enough to figure out which
>>     destructor needs to be called; for instance, if a
>>     lifetime-extended temporary is initialized with an operator?:
>>     expression, we'd need path-sensitive information to figure out
>>     which object to destroy.
>>
>>     * Temporaries are also special because our pass-by-value is not
>>     always working correctly. In particular, when we inline 'foo(c)',
>>     where variable 'c' is of 'class C', we first copy 'c' into a
>>     temporary region, and then trivial-copy it into the stack
>>     variable-region of the function parameter, while we should be
>>     constructing directly into the parameter region. We cannot
>>     construct directly into the parameter region because the stack
>>     frame has not yet been constructed, because arguments are not yet
>>     computed. More reading on the subject, even if a bit outdated, is
>>     at
>>
>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>
>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>>     This is a hard problem, but i wasn't noticing many instances of it
>>     yet.
>>
>>
>> I am wondering what the analyzer does with copy elision? Do we model that
>> somehow?
>>
>>
>>     * Base object region constructors are disabled when destructor is
>>     non-trivial. This sounds like an accidental omission.
>>
>>
>> Nice catch.
>
>
> Actually not so much. This bailout only fires for CK_Complete constructor
> kinds, and base-object constructors aren't "complete" in this sense, so they
> work fine.
>
> This might still affect a complete constructor that follows placement-new
> into a base-object region (i'd be shocked to see code that actually does
> it).
>
>>
>>     * Construction into ElementRegion is disabled regardless of
>>     destructors. In particular, mass array constructions are disabled.
>>     There is a special AST for this case, which emulates the loop
>>     through the array (or return value of operator new[]) with a loop
>>     counter variable and all. We have no support for this whole
>>     language construct. Note, however, that ElementRegion is much more
>>     than array element; it is often used for representing casts, and
>>     in particular it appears in return values of a conservatively
>>     evaluated operator new() (i.e. element{SymRegion}) and is likely
>>     to appear in placement-new() return values for the same reason. So
>>     we should discriminate between these two cases.
>>
>>
>> Do you know why casts are represented this way? Is this something that we
>> might want to change in the future? I think one of the reasons why
>> construction into ElementRegion is disabled because arrays are populated
>> lazily by the analyzer. And this does not work well with code that relies on
>> the fact that the number of constructor/destructor invocations is the same
>> as the number of elements in the array.
>>
>>
>>     * Constructing into return value of operator new() is disabled
>>     completely anyway, because there's a modeling problem that causes
>>     us to be unable to connect the constructor with the correct
>>     this-region. The CFG part of this problem was fixed by Jordan in
>>     2014 by adding the CFGNewAllocator element, so we now actually
>>     call the operator and the constructor in the correct order, but we
>>     still need to pass the operator new's return value to the
>>     constructor. Note how pointless it is to enable it, or even inline
>>     a custom operator new, until construction into ElementRegion is fixed.
>>
>>     ---
>>
>>     Speaking of inlining operator new():
>>
>>     * For correct modeling in the core, it seems that the only thing
>>     that remains is to actually use the return value of operator new()
>>     and construct the new object *into* it. Sounds easy.
>>
>>     * I’m also immediately noticing the unimplemented path diagnostics
>>     within the inlined operator new(), which crash with
>>     llvm_unreachable(“not yet implemented”) whenever a visitor fires
>>     within it (eg. to indicate that memory allocated there leaks).
>>
>>     * MallocChecker keeps being surprised with various non-symbolic
>>     stuff that may come out from operator new().
>>     https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>>     doesn’t cover all cases; in particular, void *operator new(size_t)
>>     { static int x; return &x; } seems to instantly crash after inlining.
>>
>>     ---
>>
>>     From now on i'd like to experiment with, first of all, disabling
>>     the DeclRegion bailout when possible. Then i'd try to inline
>>     operator new, pass its return value to the respective constructor,
>>     and inline said constructor. I'm not sure if i'd be able to dig
>>     into the temporaries and pass-by-copy problems soon.
>>
>>     And i believe that when it comes to C++ pure language constructs,
>>     i listed most of the problems i'm aware of, with the exception of,
>>     well, *exceptions* (which also need CFG work). Of course we could
>>     do better modeling the standard library as well.
>>
>>
>> Thank you for collecting these problems!
>>
>>     _______________________________________________
>>     cfe-dev mailing list
>>     cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>     http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>     <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev