[cfe-dev] [analyzer] C++ and inlining.
Alexey Salmin via cfe-dev
cfe-dev at lists.llvm.org
Sun Dec 3 12:56:58 PST 2017
Speaking of the code sample from the first e-mail: does anyone has an
explanation why it is the non-inlined constructor being a special
case? If you replace with an arbitrary non-inlined function call
between the haveX initialization and the "if" condition the analysis
seem to work fine. Compare assembly in [1] with the "D d;" line
commented out and active.
[1] https://godbolt.org/g/jmSKT9
Alexey
On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
>
>
> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>
>> Hi Artem!
>>
>>
>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>
>> I've observed a lot of false positives on WebKit that resulted
>> from our inability to inline relatively simple functions in C++,
>> specifically constructors and operator new(), due to unimplemented
>> modeling. So i wanted to document, and eventually fix, a variety
>> of language feature modeling we're currently lacking. Note that
>> WebKit, like LLVM, is not a typical C++ codebase; it uses many
>> language constructs that regular projects don't use, and uses very
>> little standard library headers.
>>
>>
>> Great news!
>>
>>
>> ---
>>
>> Normally we expect the analyzer to work even without any inlining,
>> by conservatively invalidating the program state, which eliminates
>> the clearly false assumptions but sometimes causes infeasible
>> paths when the invalidation was too aggressive and we start
>> denoting the same value with a different symbol after invalidation
>> and assume contradictory stuff on the old and the new symbol. The
>> false positives resulting from aggressive invalidation are usually
>> treated as less scary than the ones suppressed by invalidation,
>> because they can be easily suppressed (with assertions or const
>> qualifiers etc.) without loss of coverage. However, in C++ there
>> are a lot of implicit function calls which cause massive
>> frustration when evaluated conservatively. For example,
>>
>> class C {
>> bool haveX;
>> class D d;
>>
>> public:
>> C(int *x): haveX(x != 0) {
>> if (haveX)
>> *x = 1; // null dereference??
>> }
>> };
>>
>> In this case, 'haveX' should assume that x is non-null. The code
>> eagerly splits states into {x == 0} and {x != 0}, which seems
>> reasonable. However, after 'haveX' have been initialized, the
>> default constructor for field 'd' kicks in. If only this
>> constructor is not inlined, or any of its callees are not inlined,
>> value stored in 'haveX' would be invalidated on both paths. In
>> particular, the path on which the original 'haveX' is false but
>> the invalidated 'haveX' is true have now opened up.
>>
>>
>> I observed the very same pattern on Ericsson internal codebases, so
>> sorting this out is definitely a huge win for many projects (not just LLVM
>> and WebKit).
>>
>>
>> ---
>>
>> Inlining of the constructor itself is disabled in many cases for
>> many reasons. In particular, we are currently only trying to
>> inline the constructor when the target this-region is a DeclRegion
>> (variable, member variable, Objective-C++ instance var), and the
>> destructor is non-trivial. This cuts away a lot of cases:
>>
>>
>> It is not clear for my why the non-trivial destructor is a requirement
>> here. Do yo have any info on that?
>>
>>
>> * Constructing into temporaries is disabled when destructor is
>> non-trivial. At least, we should be able to inline those when the
>> destructor is present at all, so that it would be evaluated
>> conservatively. One thing to note here is that our CFG has been
>> recently fixed, so we're much closer to fixing this properly
>> nowadays. However, CFG alone is not enough to figure out which
>> destructor needs to be called; for instance, if a
>> lifetime-extended temporary is initialized with an operator?:
>> expression, we'd need path-sensitive information to figure out
>> which object to destroy.
>>
>> * Temporaries are also special because our pass-by-value is not
>> always working correctly. In particular, when we inline 'foo(c)',
>> where variable 'c' is of 'class C', we first copy 'c' into a
>> temporary region, and then trivial-copy it into the stack
>> variable-region of the function parameter, while we should be
>> constructing directly into the parameter region. We cannot
>> construct directly into the parameter region because the stack
>> frame has not yet been constructed, because arguments are not yet
>> computed. More reading on the subject, even if a bit outdated, is
>> at
>>
>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>
>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>> This is a hard problem, but i wasn't noticing many instances of it
>> yet.
>>
>>
>> I am wondering what the analyzer does with copy elision? Do we model that
>> somehow?
>>
>>
>> * Base object region constructors are disabled when destructor is
>> non-trivial. This sounds like an accidental omission.
>>
>>
>> Nice catch.
>
>
> Actually not so much. This bailout only fires for CK_Complete constructor
> kinds, and base-object constructors aren't "complete" in this sense, so they
> work fine.
>
> This might still affect a complete constructor that follows placement-new
> into a base-object region (i'd be shocked to see code that actually does
> it).
>
>>
>> * Construction into ElementRegion is disabled regardless of
>> destructors. In particular, mass array constructions are disabled.
>> There is a special AST for this case, which emulates the loop
>> through the array (or return value of operator new[]) with a loop
>> counter variable and all. We have no support for this whole
>> language construct. Note, however, that ElementRegion is much more
>> than array element; it is often used for representing casts, and
>> in particular it appears in return values of a conservatively
>> evaluated operator new() (i.e. element{SymRegion}) and is likely
>> to appear in placement-new() return values for the same reason. So
>> we should discriminate between these two cases.
>>
>>
>> Do you know why casts are represented this way? Is this something that we
>> might want to change in the future? I think one of the reasons why
>> construction into ElementRegion is disabled because arrays are populated
>> lazily by the analyzer. And this does not work well with code that relies on
>> the fact that the number of constructor/destructor invocations is the same
>> as the number of elements in the array.
>>
>>
>> * Constructing into return value of operator new() is disabled
>> completely anyway, because there's a modeling problem that causes
>> us to be unable to connect the constructor with the correct
>> this-region. The CFG part of this problem was fixed by Jordan in
>> 2014 by adding the CFGNewAllocator element, so we now actually
>> call the operator and the constructor in the correct order, but we
>> still need to pass the operator new's return value to the
>> constructor. Note how pointless it is to enable it, or even inline
>> a custom operator new, until construction into ElementRegion is fixed.
>>
>> ---
>>
>> Speaking of inlining operator new():
>>
>> * For correct modeling in the core, it seems that the only thing
>> that remains is to actually use the return value of operator new()
>> and construct the new object *into* it. Sounds easy.
>>
>> * I’m also immediately noticing the unimplemented path diagnostics
>> within the inlined operator new(), which crash with
>> llvm_unreachable(“not yet implemented”) whenever a visitor fires
>> within it (eg. to indicate that memory allocated there leaks).
>>
>> * MallocChecker keeps being surprised with various non-symbolic
>> stuff that may come out from operator new().
>> https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>> doesn’t cover all cases; in particular, void *operator new(size_t)
>> { static int x; return &x; } seems to instantly crash after inlining.
>>
>> ---
>>
>> From now on i'd like to experiment with, first of all, disabling
>> the DeclRegion bailout when possible. Then i'd try to inline
>> operator new, pass its return value to the respective constructor,
>> and inline said constructor. I'm not sure if i'd be able to dig
>> into the temporaries and pass-by-copy problems soon.
>>
>> And i believe that when it comes to C++ pure language constructs,
>> i listed most of the problems i'm aware of, with the exception of,
>> well, *exceptions* (which also need CFG work). Of course we could
>> do better modeling the standard library as well.
>>
>>
>> Thank you for collecting these problems!
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list