[cfe-dev] [analyzer] C++ and inlining.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue Nov 28 07:19:54 PST 2017
On 11/26/17 3:17 PM, Gábor Horváth wrote:
> Hi Artem!
>
> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>
> I've observed a lot of false positives on WebKit that resulted
> from our inability to inline relatively simple functions in C++,
> specifically constructors and operator new(), due to unimplemented
> modeling. So i wanted to document, and eventually fix, a variety
> of language feature modeling we're currently lacking. Note that
> WebKit, like LLVM, is not a typical C++ codebase; it uses many
> language constructs that regular projects don't use, and uses very
> little standard library headers.
>
>
> Great news!
>
>
> ---
>
> Normally we expect the analyzer to work even without any inlining,
> by conservatively invalidating the program state, which eliminates
> the clearly false assumptions but sometimes causes infeasible
> paths when the invalidation was too aggressive and we start
> denoting the same value with a different symbol after invalidation
> and assume contradictory stuff on the old and the new symbol. The
> false positives resulting from aggressive invalidation are usually
> treated as less scary than the ones suppressed by invalidation,
> because they can be easily suppressed (with assertions or const
> qualifiers etc.) without loss of coverage. However, in C++ there
> are a lot of implicit function calls which cause massive
> frustration when evaluated conservatively. For example,
>
> class C {
> bool haveX;
> class D d;
>
> public:
> C(int *x): haveX(x != 0) {
> if (haveX)
> *x = 1; // null dereference??
> }
> };
>
> In this case, 'haveX' should assume that x is non-null. The code
> eagerly splits states into {x == 0} and {x != 0}, which seems
> reasonable. However, after 'haveX' have been initialized, the
> default constructor for field 'd' kicks in. If only this
> constructor is not inlined, or any of its callees are not inlined,
> value stored in 'haveX' would be invalidated on both paths. In
> particular, the path on which the original 'haveX' is false but
> the invalidated 'haveX' is true have now opened up.
>
>
> I observed the very same pattern on Ericsson internal codebases, so
> sorting this out is definitely a huge win for many projects (not just
> LLVM and WebKit).
>
>
> ---
>
> Inlining of the constructor itself is disabled in many cases for
> many reasons. In particular, we are currently only trying to
> inline the constructor when the target this-region is a DeclRegion
> (variable, member variable, Objective-C++ instance var), and the
> destructor is non-trivial. This cuts away a lot of cases:
>
>
> It is not clear for my why the non-trivial destructor is a requirement
> here. Do yo have any info on that?
>
>
> * Constructing into temporaries is disabled when destructor is
> non-trivial. At least, we should be able to inline those when the
> destructor is present at all, so that it would be evaluated
> conservatively. One thing to note here is that our CFG has been
> recently fixed, so we're much closer to fixing this properly
> nowadays. However, CFG alone is not enough to figure out which
> destructor needs to be called; for instance, if a
> lifetime-extended temporary is initialized with an operator?:
> expression, we'd need path-sensitive information to figure out
> which object to destroy.
>
> * Temporaries are also special because our pass-by-value is not
> always working correctly. In particular, when we inline 'foo(c)',
> where variable 'c' is of 'class C', we first copy 'c' into a
> temporary region, and then trivial-copy it into the stack
> variable-region of the function parameter, while we should be
> constructing directly into the parameter region. We cannot
> construct directly into the parameter region because the stack
> frame has not yet been constructed, because arguments are not yet
> computed. More reading on the subject, even if a bit outdated, is
> at
> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
> This is a hard problem, but i wasn't noticing many instances of it
> yet.
>
>
> I am wondering what the analyzer does with copy elision? Do we model
> that somehow?
>
>
> * Base object region constructors are disabled when destructor is
> non-trivial. This sounds like an accidental omission.
>
>
> Nice catch.
Actually not so much. This bailout only fires for CK_Complete
constructor kinds, and base-object constructors aren't "complete" in
this sense, so they work fine.
This might still affect a complete constructor that follows
placement-new into a base-object region (i'd be shocked to see code that
actually does it).
>
> * Construction into ElementRegion is disabled regardless of
> destructors. In particular, mass array constructions are disabled.
> There is a special AST for this case, which emulates the loop
> through the array (or return value of operator new[]) with a loop
> counter variable and all. We have no support for this whole
> language construct. Note, however, that ElementRegion is much more
> than array element; it is often used for representing casts, and
> in particular it appears in return values of a conservatively
> evaluated operator new() (i.e. element{SymRegion}) and is likely
> to appear in placement-new() return values for the same reason. So
> we should discriminate between these two cases.
>
>
> Do you know why casts are represented this way? Is this something that
> we might want to change in the future? I think one of the reasons why
> construction into ElementRegion is disabled because arrays are
> populated lazily by the analyzer. And this does not work well with
> code that relies on the fact that the number of constructor/destructor
> invocations is the same as the number of elements in the array.
>
>
> * Constructing into return value of operator new() is disabled
> completely anyway, because there's a modeling problem that causes
> us to be unable to connect the constructor with the correct
> this-region. The CFG part of this problem was fixed by Jordan in
> 2014 by adding the CFGNewAllocator element, so we now actually
> call the operator and the constructor in the correct order, but we
> still need to pass the operator new's return value to the
> constructor. Note how pointless it is to enable it, or even inline
> a custom operator new, until construction into ElementRegion is fixed.
>
> ---
>
> Speaking of inlining operator new():
>
> * For correct modeling in the core, it seems that the only thing
> that remains is to actually use the return value of operator new()
> and construct the new object *into* it. Sounds easy.
>
> * I’m also immediately noticing the unimplemented path diagnostics
> within the inlined operator new(), which crash with
> llvm_unreachable(“not yet implemented”) whenever a visitor fires
> within it (eg. to indicate that memory allocated there leaks).
>
> * MallocChecker keeps being surprised with various non-symbolic
> stuff that may come out from operator new().
> https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
> doesn’t cover all cases; in particular, void *operator new(size_t)
> { static int x; return &x; } seems to instantly crash after inlining.
>
> ---
>
> From now on i'd like to experiment with, first of all, disabling
> the DeclRegion bailout when possible. Then i'd try to inline
> operator new, pass its return value to the respective constructor,
> and inline said constructor. I'm not sure if i'd be able to dig
> into the temporaries and pass-by-copy problems soon.
>
> And i believe that when it comes to C++ pure language constructs,
> i listed most of the problems i'm aware of, with the exception of,
> well, *exceptions* (which also need CFG work). Of course we could
> do better modeling the standard library as well.
>
>
> Thank you for collecting these problems!
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>
>
More information about the cfe-dev
mailing list