[cfe-dev] [analyzer] C++ and inlining.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue Dec 5 15:22:11 PST 2017
On 11/26/17 3:10 AM, Artem Dergachev wrote:
> I've observed a lot of false positives on WebKit that resulted from
> our inability to inline relatively simple functions in C++,
> specifically constructors and operator new(), due to unimplemented
> modeling. So i wanted to document, and eventually fix, a variety of
> language feature modeling we're currently lacking. Note that WebKit,
> like LLVM, is not a typical C++ codebase; it uses many language
> constructs that regular projects don't use, and uses very little
> standard library headers.
>
> ---
>
> Normally we expect the analyzer to work even without any inlining, by
> conservatively invalidating the program state, which eliminates the
> clearly false assumptions but sometimes causes infeasible paths when
> the invalidation was too aggressive and we start denoting the same
> value with a different symbol after invalidation and assume
> contradictory stuff on the old and the new symbol. The false positives
> resulting from aggressive invalidation are usually treated as less
> scary than the ones suppressed by invalidation, because they can be
> easily suppressed (with assertions or const qualifiers etc.) without
> loss of coverage. However, in C++ there are a lot of implicit function
> calls which cause massive frustration when evaluated conservatively.
> For example,
>
> class C {
> bool haveX;
> class D d;
>
> public:
> C(int *x): haveX(x != 0) {
> if (haveX)
> *x = 1; // null dereference??
> }
> };
>
> In this case, 'haveX' should assume that x is non-null. The code
> eagerly splits states into {x == 0} and {x != 0}, which seems
> reasonable. However, after 'haveX' have been initialized, the default
> constructor for field 'd' kicks in. If only this constructor is not
> inlined, or any of its callees are not inlined, value stored in
> 'haveX' would be invalidated on both paths. In particular, the path on
> which the original 'haveX' is false but the invalidated 'haveX' is
> true have now opened up.
>
> ---
>
> Inlining of the constructor itself is disabled in many cases for many
> reasons. In particular, we are currently only trying to inline the
> constructor when the target this-region is a DeclRegion (variable,
> member variable, Objective-C++ instance var), and the destructor is
> non-trivial. This cuts away a lot of cases:
>
> * Constructing into temporaries is disabled when destructor is
> non-trivial. At least, we should be able to inline those when the
> destructor is present at all, so that it would be evaluated
> conservatively. One thing to note here is that our CFG has been
> recently fixed, so we're much closer to fixing this properly nowadays.
> However, CFG alone is not enough to figure out which destructor needs
> to be called; for instance, if a lifetime-extended temporary is
> initialized with an operator?: expression, we'd need path-sensitive
> information to figure out which object to destroy.
>
> * Temporaries are also special because our pass-by-value is not always
> working correctly. In particular, when we inline 'foo(c)', where
> variable 'c' is of 'class C', we first copy 'c' into a temporary
> region, and then trivial-copy it into the stack variable-region of the
> function parameter, while we should be constructing directly into the
> parameter region. We cannot construct directly into the parameter
> region because the stack frame has not yet been constructed, because
> arguments are not yet computed. More reading on the subject, even if a
> bit outdated, is at
> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
> This is a hard problem, but i wasn't noticing many instances of it yet.
>
> * Base object region constructors are disabled when destructor is
> non-trivial. This sounds like an accidental omission.
>
> * Construction into ElementRegion is disabled regardless of
> destructors. In particular, mass array constructions are disabled.
> There is a special AST for this case, which emulates the loop through
> the array (or return value of operator new[]) with a loop counter
> variable and all. We have no support for this whole language
> construct. Note, however, that ElementRegion is much more than array
> element; it is often used for representing casts, and in particular it
> appears in return values of a conservatively evaluated operator new()
> (i.e. element{SymRegion}) and is likely to appear in placement-new()
> return values for the same reason. So we should discriminate between
> these two cases.
>
> * Constructing into return value of operator new() is disabled
> completely anyway, because there's a modeling problem that causes us
> to be unable to connect the constructor with the correct this-region.
> The CFG part of this problem was fixed by Jordan in 2014 by adding the
> CFGNewAllocator element, so we now actually call the operator and the
> constructor in the correct order, but we still need to pass the
> operator new's return value to the constructor. Note how pointless it
> is to enable it, or even inline a custom operator new, until
> construction into ElementRegion is fixed.
One more syntax pattern that we don't support was unveiled in
https://reviews.llvm.org/D40841 : construction as part of a bigger
aggregate initialization would be never inlined, neither it is performed
with a correct this-region. This includes cases like
struct A { A(); }; // not an aggregate, has constructor.
struct B { A a; }; // aggregate, no constructor needed.
B b = {}; // A() is not handled correctly here.
Or, since C++17, also
struct B: public A {};
works similarly.
> ---
>
> Speaking of inlining operator new():
>
> * For correct modeling in the core, it seems that the only thing that
> remains is to actually use the return value of operator new() and
> construct the new object *into* it. Sounds easy.
>
> * I’m also immediately noticing the unimplemented path diagnostics
> within the inlined operator new(), which crash with
> llvm_unreachable(“not yet implemented”) whenever a visitor fires
> within it (eg. to indicate that memory allocated there leaks).
>
> * MallocChecker keeps being surprised with various non-symbolic stuff
> that may come out from operator new(). https://reviews.llvm.org/D37189
> doesn’t cover all cases; in particular, void *operator new(size_t) {
> static int x; return &x; } seems to instantly crash after inlining.
>
> ---
>
> From now on i'd like to experiment with, first of all, disabling the
> DeclRegion bailout when possible. Then i'd try to inline operator new,
> pass its return value to the respective constructor, and inline said
> constructor. I'm not sure if i'd be able to dig into the temporaries
> and pass-by-copy problems soon.
>
> And i believe that when it comes to C++ pure language constructs, i
> listed most of the problems i'm aware of, with the exception of, well,
> *exceptions* (which also need CFG work). Of course we could do better
> modeling the standard library as well.
More information about the cfe-dev
mailing list