[cfe-dev] [analyzer] C++ and inlining.
Alexey Salmin via cfe-dev
cfe-dev at lists.llvm.org
Sun Dec 3 20:52:29 PST 2017
Thanks!
I had the "this" pointer leaking to D::D in mind but thought that the
offsetof trick would involve UB at some point. It seems that as long
there's is an actual boolean where D::D looks for it this is probably
fine, even though I'm having a hard time with proving it from the
standard wording.
Alexey
On Mon, Dec 4, 2017 at 1:16 AM, Artem Dergachev <noqnoqneo at gmail.com> wrote:
> In this particular example this happens because we may potentially have the
> constructor of D defined as
>
> D::D() {
> bool *haveXptr = ((void *)this) - (offsetof(C, d) - offsetof(C, x));
> *haveXptr = !*haveXptr;
> }
>
> which is valid code that changes haveX between initialization and
> assignment.
>
> However, an external function is unlikely to contain the 'this' pointer
> (pointer to variable 'c'). At least in your code generation example, where
> 'c' is a local variable and the pointer to it never escapes to the global
> scope.
>
> Generally, in the analyzer when we don't know if something can happen, we
> often prefer to assume the worst, much like in codegen. In this particular
> example i'd be for toning down the invalidation (as in
> http://lists.llvm.org/pipermail/cfe-dev/2017-November/056103.html), but
> there may be more reasons for invalidating stuff, some of which sound much
> more convincing.
>
>
> On 12/3/17 12:56 PM, Alexey Salmin wrote:
>>
>> Speaking of the code sample from the first e-mail: does anyone has an
>> explanation why it is the non-inlined constructor being a special
>> case? If you replace with an arbitrary non-inlined function call
>> between the haveX initialization and the "if" condition the analysis
>> seem to work fine. Compare assembly in [1] with the "D d;" line
>> commented out and active.
>>
>> [1] https://godbolt.org/g/jmSKT9
>>
>> Alexey
>>
>> On Tue, Nov 28, 2017 at 6:19 PM, Artem Dergachev via cfe-dev
>> <cfe-dev at lists.llvm.org> wrote:
>>>
>>>
>>> On 11/26/17 3:17 PM, Gábor Horváth wrote:
>>>>
>>>> Hi Artem!
>>>>
>>>>
>>>> On 26 November 2017 at 12:10, Artem Dergachev via cfe-dev
>>>> <cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>> wrote:
>>>>
>>>> I've observed a lot of false positives on WebKit that resulted
>>>> from our inability to inline relatively simple functions in C++,
>>>> specifically constructors and operator new(), due to unimplemented
>>>> modeling. So i wanted to document, and eventually fix, a variety
>>>> of language feature modeling we're currently lacking. Note that
>>>> WebKit, like LLVM, is not a typical C++ codebase; it uses many
>>>> language constructs that regular projects don't use, and uses very
>>>> little standard library headers.
>>>>
>>>>
>>>> Great news!
>>>>
>>>>
>>>> ---
>>>>
>>>> Normally we expect the analyzer to work even without any inlining,
>>>> by conservatively invalidating the program state, which eliminates
>>>> the clearly false assumptions but sometimes causes infeasible
>>>> paths when the invalidation was too aggressive and we start
>>>> denoting the same value with a different symbol after invalidation
>>>> and assume contradictory stuff on the old and the new symbol. The
>>>> false positives resulting from aggressive invalidation are usually
>>>> treated as less scary than the ones suppressed by invalidation,
>>>> because they can be easily suppressed (with assertions or const
>>>> qualifiers etc.) without loss of coverage. However, in C++ there
>>>> are a lot of implicit function calls which cause massive
>>>> frustration when evaluated conservatively. For example,
>>>>
>>>> class C {
>>>> bool haveX;
>>>> class D d;
>>>>
>>>> public:
>>>> C(int *x): haveX(x != 0) {
>>>> if (haveX)
>>>> *x = 1; // null dereference??
>>>> }
>>>> };
>>>>
>>>> In this case, 'haveX' should assume that x is non-null. The code
>>>> eagerly splits states into {x == 0} and {x != 0}, which seems
>>>> reasonable. However, after 'haveX' have been initialized, the
>>>> default constructor for field 'd' kicks in. If only this
>>>> constructor is not inlined, or any of its callees are not inlined,
>>>> value stored in 'haveX' would be invalidated on both paths. In
>>>> particular, the path on which the original 'haveX' is false but
>>>> the invalidated 'haveX' is true have now opened up.
>>>>
>>>>
>>>> I observed the very same pattern on Ericsson internal codebases, so
>>>> sorting this out is definitely a huge win for many projects (not just
>>>> LLVM
>>>> and WebKit).
>>>>
>>>>
>>>> ---
>>>>
>>>> Inlining of the constructor itself is disabled in many cases for
>>>> many reasons. In particular, we are currently only trying to
>>>> inline the constructor when the target this-region is a DeclRegion
>>>> (variable, member variable, Objective-C++ instance var), and the
>>>> destructor is non-trivial. This cuts away a lot of cases:
>>>>
>>>>
>>>> It is not clear for my why the non-trivial destructor is a requirement
>>>> here. Do yo have any info on that?
>>>>
>>>>
>>>> * Constructing into temporaries is disabled when destructor is
>>>> non-trivial. At least, we should be able to inline those when the
>>>> destructor is present at all, so that it would be evaluated
>>>> conservatively. One thing to note here is that our CFG has been
>>>> recently fixed, so we're much closer to fixing this properly
>>>> nowadays. However, CFG alone is not enough to figure out which
>>>> destructor needs to be called; for instance, if a
>>>> lifetime-extended temporary is initialized with an operator?:
>>>> expression, we'd need path-sensitive information to figure out
>>>> which object to destroy.
>>>>
>>>> * Temporaries are also special because our pass-by-value is not
>>>> always working correctly. In particular, when we inline 'foo(c)',
>>>> where variable 'c' is of 'class C', we first copy 'c' into a
>>>> temporary region, and then trivial-copy it into the stack
>>>> variable-region of the function parameter, while we should be
>>>> constructing directly into the parameter region. We cannot
>>>> construct directly into the parameter region because the stack
>>>> frame has not yet been constructed, because arguments are not yet
>>>> computed. More reading on the subject, even if a bit outdated, is
>>>> at
>>>>
>>>>
>>>> http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html
>>>>
>>>>
>>>> <http://clang-developers.42468.n3.nabble.com/Inlining-temporary-destructors-in-the-analyzer-td4040851.html>
>>>> This is a hard problem, but i wasn't noticing many instances of it
>>>> yet.
>>>>
>>>>
>>>> I am wondering what the analyzer does with copy elision? Do we model
>>>> that
>>>> somehow?
>>>>
>>>>
>>>> * Base object region constructors are disabled when destructor is
>>>> non-trivial. This sounds like an accidental omission.
>>>>
>>>>
>>>> Nice catch.
>>>
>>>
>>> Actually not so much. This bailout only fires for CK_Complete constructor
>>> kinds, and base-object constructors aren't "complete" in this sense, so
>>> they
>>> work fine.
>>>
>>> This might still affect a complete constructor that follows placement-new
>>> into a base-object region (i'd be shocked to see code that actually does
>>> it).
>>>
>>>> * Construction into ElementRegion is disabled regardless of
>>>> destructors. In particular, mass array constructions are disabled.
>>>> There is a special AST for this case, which emulates the loop
>>>> through the array (or return value of operator new[]) with a loop
>>>> counter variable and all. We have no support for this whole
>>>> language construct. Note, however, that ElementRegion is much more
>>>> than array element; it is often used for representing casts, and
>>>> in particular it appears in return values of a conservatively
>>>> evaluated operator new() (i.e. element{SymRegion}) and is likely
>>>> to appear in placement-new() return values for the same reason. So
>>>> we should discriminate between these two cases.
>>>>
>>>>
>>>> Do you know why casts are represented this way? Is this something that
>>>> we
>>>> might want to change in the future? I think one of the reasons why
>>>> construction into ElementRegion is disabled because arrays are populated
>>>> lazily by the analyzer. And this does not work well with code that
>>>> relies on
>>>> the fact that the number of constructor/destructor invocations is the
>>>> same
>>>> as the number of elements in the array.
>>>>
>>>>
>>>> * Constructing into return value of operator new() is disabled
>>>> completely anyway, because there's a modeling problem that causes
>>>> us to be unable to connect the constructor with the correct
>>>> this-region. The CFG part of this problem was fixed by Jordan in
>>>> 2014 by adding the CFGNewAllocator element, so we now actually
>>>> call the operator and the constructor in the correct order, but we
>>>> still need to pass the operator new's return value to the
>>>> constructor. Note how pointless it is to enable it, or even inline
>>>> a custom operator new, until construction into ElementRegion is
>>>> fixed.
>>>>
>>>> ---
>>>>
>>>> Speaking of inlining operator new():
>>>>
>>>> * For correct modeling in the core, it seems that the only thing
>>>> that remains is to actually use the return value of operator new()
>>>> and construct the new object *into* it. Sounds easy.
>>>>
>>>> * I’m also immediately noticing the unimplemented path diagnostics
>>>> within the inlined operator new(), which crash with
>>>> llvm_unreachable(“not yet implemented”) whenever a visitor fires
>>>> within it (eg. to indicate that memory allocated there leaks).
>>>>
>>>> * MallocChecker keeps being surprised with various non-symbolic
>>>> stuff that may come out from operator new().
>>>> https://reviews.llvm.org/D37189 <https://reviews.llvm.org/D37189>
>>>> doesn’t cover all cases; in particular, void *operator new(size_t)
>>>> { static int x; return &x; } seems to instantly crash after
>>>> inlining.
>>>>
>>>> ---
>>>>
>>>> From now on i'd like to experiment with, first of all, disabling
>>>> the DeclRegion bailout when possible. Then i'd try to inline
>>>> operator new, pass its return value to the respective constructor,
>>>> and inline said constructor. I'm not sure if i'd be able to dig
>>>> into the temporaries and pass-by-copy problems soon.
>>>>
>>>> And i believe that when it comes to C++ pure language constructs,
>>>> i listed most of the problems i'm aware of, with the exception of,
>>>> well, *exceptions* (which also need CFG work). Of course we could
>>>> do better modeling the standard library as well.
>>>>
>>>>
>>>> Thank you for collecting these problems!
>>>>
>>>> _______________________________________________
>>>> cfe-dev mailing list
>>>> cfe-dev at lists.llvm.org <mailto:cfe-dev at lists.llvm.org>
>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>> <http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev>
>>>>
>>>>
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
More information about the cfe-dev
mailing list