[cfe-dev] [analyzer] C++ and inlining.

Sun Nov 26 04:17:05 PST 2017

Hi Artem!

On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> I've observed a lot of false positives on WebKit that resulted from our
> inability to inline relatively simple functions in C++, specifically
> constructors and operator new(), due to unimplemented modeling. So i wanted
> to document, and eventually fix, a variety of language feature modeling
> we're currently lacking. Note that WebKit, like LLVM, is not a typical C++
> codebase; it uses many language constructs that regular projects don't use,
> and uses very little standard library headers.
>

Great news!

>
> ---
>
> Normally we expect the analyzer to work even without any inlining, by
> conservatively invalidating the program state, which eliminates the clearly
> false assumptions but sometimes causes infeasible paths when the
> invalidation was too aggressive and we start denoting the same value with a
> different symbol after invalidation and assume contradictory stuff on the
> old and the new symbol. The false positives resulting from aggressive
> invalidation are usually treated as less scary than the ones suppressed by
> invalidation, because they can be easily suppressed (with assertions or
> const qualifiers etc.) without loss of coverage. However, in C++ there are
> a lot of implicit function calls which cause massive frustration when
> evaluated conservatively. For example,
>
>   class C {
>     bool haveX;
>     class D d;
>
>   public:
>     C(int *x): haveX(x != 0) {
>       if (haveX)
>         *x = 1; // null dereference??
>     }
>   };
>
> In this case, 'haveX' should assume that x is non-null. The code eagerly
> splits states into {x == 0} and {x != 0}, which seems reasonable. However,
> after 'haveX' have been initialized, the default constructor for field 'd'
> kicks in. If only this constructor is not inlined, or any of its callees
> are not inlined, value stored in 'haveX' would be invalidated on both
> paths. In particular, the path on which the original 'haveX' is false but
> the invalidated 'haveX' is true have now opened up.
>

I observed the very same pattern on Ericsson internal codebases, so sorting
this out is definitely a huge win for many projects (not just LLVM and
WebKit).

>
> ---
>
> Inlining of the constructor itself is disabled in many cases for many
> reasons. In particular, we are currently only trying to inline the
> constructor when the target this-region is a DeclRegion (variable, member
> variable, Objective-C++ instance var), and the destructor is non-trivial.
> This cuts away a lot of cases:
>

It is not clear for my why the non-trivial destructor is a requirement
here. Do yo have any info on that?

>
> * Constructing into temporaries is disabled when destructor is
> non-trivial. At least, we should be able to inline those when the
> destructor is present at all, so that it would be evaluated conservatively.
> One thing to note here is that our CFG has been recently fixed, so we're
> much closer to fixing this properly nowadays. However, CFG alone is not
> enough to figure out which destructor needs to be called; for instance, if
> a lifetime-extended temporary is initialized with an operator?: expression,
> we'd need path-sensitive information to figure out which object to destroy.
>
> * Temporaries are also special because our pass-by-value is not always
> working correctly. In particular, when we inline 'foo(c)', where variable
> 'c' is of 'class C', we first copy 'c' into a temporary region, and then
> trivial-copy it into the stack variable-region of the function parameter,
> while we should be constructing directly into the parameter region. We
> cannot construct directly into the parameter region because the stack frame
> has not yet been constructed, because arguments are not yet computed. More
> reading on the subject, even if a bit outdated, is at
> http://clang-developers.42468.n3.nabble.com/Inlining-tempora
> ry-destructors-in-the-analyzer-td4040851.html This is a hard problem, but
> i wasn't noticing many instances of it yet.
>

I am wondering what the analyzer does with copy elision? Do we model that
somehow?

>
> * Base object region constructors are disabled when destructor is
> non-trivial. This sounds like an accidental omission.
>

Nice catch.

>
> * Construction into ElementRegion is disabled regardless of destructors.
> In particular, mass array constructions are disabled. There is a special
> AST for this case, which emulates the loop through the array (or return
> value of operator new[]) with a loop counter variable and all. We have no
> support for this whole language construct. Note, however, that
> ElementRegion is much more than array element; it is often used for
> representing casts, and in particular it appears in return values of a
> conservatively evaluated operator new() (i.e. element{SymRegion}) and is
> likely to appear in placement-new() return values for the same reason. So
> we should discriminate between these two cases.
>

Do you know why casts are represented this way? Is this something that we
might want to change in the future? I think one of the reasons why
construction into ElementRegion is disabled because arrays are populated
lazily by the analyzer. And this does not work well with code that relies
on the fact that the number of constructor/destructor invocations is the
same as the number of elements in the array.

>
> * Constructing into return value of operator new() is disabled completely
> anyway, because there's a modeling problem that causes us to be unable to
> connect the constructor with the correct this-region. The CFG part of this
> problem was fixed by Jordan in 2014 by adding the CFGNewAllocator element,
> so we now actually call the operator and the constructor in the correct
> order, but we still need to pass the operator new's return value to the
> constructor. Note how pointless it is to enable it, or even inline a custom
> operator new, until construction into ElementRegion is fixed.
>
> ---
>
> Speaking of inlining operator new():
>
> * For correct modeling in the core, it seems that the only thing that
> remains is to actually use the return value of operator new() and construct
> the new object *into* it. Sounds easy.
>
> * I’m also immediately noticing the unimplemented path diagnostics within
> the inlined operator new(), which crash with llvm_unreachable(“not yet
> implemented”) whenever a visitor fires within it (eg. to indicate that
> memory allocated there leaks).
>
> * MallocChecker keeps being surprised with various non-symbolic stuff that
> may come out from operator new(). https://reviews.llvm.org/D37189 doesn’t
> cover all cases; in particular, void *operator new(size_t) { static int x;
> return &x; } seems to instantly crash after inlining.
>
> ---
>
> From now on i'd like to experiment with, first of all, disabling the
> DeclRegion bailout when possible. Then i'd try to inline operator new, pass
> its return value to the respective constructor, and inline said
> constructor. I'm not sure if i'd be able to dig into the temporaries and
> pass-by-copy problems soon.
>
> And i believe that when it comes to C++ pure language constructs, i listed
> most of the problems i'm aware of, with the exception of, well,
> *exceptions* (which also need CFG work). Of course we could do better
> modeling the standard library as well.
>

Thank you for collecting these problems!

> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20171126/cb42bf06/attachment.html>