[cfe-dev] Inlining temporary destructors in the analyzer

Tue Aug 5 07:10:32 PDT 2014

Ping.

On Fri, Aug 1, 2014 at 1:35 PM, Manuel Klimek <klimek at google.com> wrote:

> On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <jordan_rose at apple.com> wrote:
>
>>
>> On Jul 31, 2014, at 12:54 , Manuel Klimek <klimek at google.com> wrote:
>>
>> On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <jordan_rose at apple.com>
>> wrote:
>>
>>>
>>> On Jul 31, 2014, at 12:46 , Manuel Klimek <klimek at google.com> wrote:
>>>
>>> On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <jordan_rose at apple.com>
>>> wrote:
>>>
>>>>
>>>> On Jul 31, 2014, at 1:20 , Manuel Klimek <klimek at google.com> wrote:
>>>>
>>>> On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <jordan_rose at apple.com>
>>>> wrote:
>>>>
>>>>>
>>>>> On Jul 30, 2014, at 8:14 , Manuel Klimek <klimek at google.com> wrote:
>>>>>
>>>>> Hi Jordan,
>>>>>
>>>>> I made some progress on inlining temporary destructors. My main
>>>>> problem currently is that somehow the initializer values in my constructor
>>>>> get lost if I don't also reference 'this' via some other way.
>>>>>
>>>>> An examples shows that best. Consider:
>>>>> struct A {
>>>>>   A(int* y) : z(y) {}
>>>>>   ~A() { *z = 23; }
>>>>>   int *z;
>>>>> };
>>>>> void f() {
>>>>>   int x = 0;
>>>>>   {(A(&x));}
>>>>> }
>>>>>
>>>>> This leads to a "Dereference of undefined pointer value" warning on *z
>>>>> = 23.
>>>>> Funnily enough this warning goes away (and all values get correctly
>>>>> propagated) if change the constructor to be:
>>>>> A(int* y) : z(y) { (void)this; }
>>>>>
>>>>> Obviously this also works fine if I use a local variable instead of a
>>>>> temporary.
>>>>>
>>>>> Any ideas what might be going wrong here would be greatly appreciated.
>>>>>
>>>>>
>>>>> My guess is that the temporary is an rvalue, and in the analyzer a
>>>>> NonLoc, and that the only reason this even appears to work is because we
>>>>> needed to be able to call *methods* on temporaries and so will create
>>>>> a temporary region for them on demand to serve as the 'this' pointer.
>>>>> However, I don't think we expect that to happen for *constructors,* and
>>>>> so we don't keep the region around after the constructor has finished
>>>>> executing. (That is, when it's just a method being called—and neither the
>>>>> constructor nor destructor are inlined—it doesn't actually matter what you
>>>>> do to the contents of the region, because nothing else can access it.)
>>>>>
>>>>>
>>>> So, funnily enough if I remove this code:
>>>>   if (isTemporaryPRValue(CCE, ThisV))
>>>>   ThisV = state->getSVal(ThisV.castAs<Loc>());
>>>> which puts the temp region into a lazycompoundregion, the memory seems
>>>> to be correctly handled. I have not yet looked what else this breaks though.
>>>>
>>>>
>>>> Okay, analyzer design time. :-)
>>>>
>>>> C++ has a distinction between lvalues and rvalues (actually glvalues
>>>> and prvalues). There are a number of things that go into this distinction,
>>>> but the part the analyzer cares about is that glvalues refer to true
>>>> objects and prvalues do not. (In C++ parlance, "object" refers to anything
>>>> with storage, a type, and lifetime, not just class instances. In the
>>>> analyzer, we call such things "regions".)
>>>>
>>>> If you were to build a very literal C++ compiler, every glvalue would
>>>> be represented by an address. PRValues*, on the other hand, don't really
>>>> need addresses; they only exist to be used transiently.
>>>>
>>>
>>> I'm not sure how "exists transiently" equals "doesn't need an address".
>>> At least in C++, this seems incorrect (?) for objects.
>>>
>>>>
>>> IMHO it's a problem with the standard, and this extra invention of
>>> "modifiable" seems weird to me.
>>>
>>>
>>> ** How do you capitalize "prvalue"?*
>>>>
>>>> The analyzer pretty much always believes the compiler when it comes to
>>>> glvalues vs. prvalues. That means that the value of a glvalue expression is
>>>> *always* a Loc. A prvalue expression, on the other hand, will only be
>>>> a Loc if it's a pointer type*. If you're talking about a struct prvalue,
>>>> what you're really talking about is the bindings in its memory region—it
>>>> has no identity. This is represented by a CompoundVal or LazyCompoundVal.
>>>> This model worked great for C, but it's starting to show strain in C++.
>>>>
>>>> ** or one of the other types mentioned in Loc::isLocType. But not all
>>>> of those can be expression values.*
>>>>
>>>> Besides "regular" temporaries, this is also a problem for prvalue
>>>> function arguments. When the analyzer inlines a function, it just blithely
>>>> binds the arguments to the proper VarRegions created for each ParmVarDecl
>>>> in the current stack frame. That doesn't really fly if we're supposed to be
>>>> calling the copy constructor. Worse, the copy constructor call happens
>>>> *outside* of the function call, which means it happens before we've
>>>> even decided whether or not to inline, which means there are no VarRegions
>>>> yet to construct into. We've/I've been putting off dealing with this one
>>>> for a while; we get around it mostly because we *don't inline
>>>> constructors for temporaries.*
>>>>
>>>> The general temporary problem goes to what Richard was saying: a
>>>> temporary expression is an rvalue, but if we have to run a constructor then
>>>> clearly there *was* a region at some point. The C++11 standard (at
>>>> least the version I have) is not wonderful about classifying temporaries
>>>> ([basic.lval]p1):
>>>>
>>>> - An *lvalue*...designates a function or an object.
>>>>
>>>> - An *xvalue* (an “eXpiring” value) also refers to an object, usually
>>>> near the end of its lifetime.
>>>>
>>>> - A *glvalue* (“generalized” lvalue) is an lvalue or an xvalue.
>>>>
>>>> - An *rvalue...*is an xvalue, a temporary object (12.2) or subobject
>>>> thereof, or a value that is not associated with an object.
>>>>
>>>> - A *prvalue* (“pure” rvalue) is an rvalue that is not an xvalue.
>>>>
>>>>
>>>> So temporary objects are prvalues. Even though they are modifiable
>>>> (defined later in the same section). Even though with construction and
>>>> destruction they end up needing a fixed region, or at least consistency in
>>>> region contents.
>>>>
>>>> I don't have a great answer here. My guess is we'll probably end up
>>>> turning on *some* inlining for temporaries to start, but not all.
>>>>
>>>> Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the
>>>> few cases where the AST wants a prvalue but the analyzer really needs a
>>>> valid Loc, and conjuring a region on the fly. That's important for things
>>>> that need to have referential identity, like the 'this' pointer in a method
>>>> called on a temporary, or (say) a temporary that's been lifetime-extended.
>>>>
>>>
>>> Since this is right at the end of the constructor call, this always
>>> creates and binds the value if we hit the constructor. See my patch - it
>>> seems to work fine :) What am I missing?
>>>
>>>
>>> That breaks the invariant that a non-Loc-type prvalue is not a Loc. If
>>> you tried to assign this directly into another struct, you'd effectively be
>>> assigning an address instead of a set of bindings. IIRC there's some code
>>> in the analyzer that tries to handle this case anyway, so it may not break
>>> down immediately, but I'd rather not lose that assumption.
>>>
>>
>> I'm confused - isn't that already the case currently? Have you looked at
>> http://reviews.llvm.org/D4740? It's very small, and I don't see where I
>> regress an assumption that's not already violated (if I remove the
>> isTemporaryPRValue section loads of tests break, so I assume changing the
>> layout of a temporary section will make significant changes to the analyzer
>> necessary).
>>
>>
>> Sorry, I meant removing the isTemporaryPRValue thing would break
>> invariants. Enabling inlining doesn't break any existing invariants, but it
>> makes things worse than they currently are. This is the test case I have in
>> mind:
>>
>> struct Test {
>>   bool flag;
>>   ~Test {
>>     if (flag)
>>       clang_analyzer_checkReached(true);
>>   }
>> }
>>
>> void use(Test arg) {
>>  arg.flag = true;
>> }
>>
>> void topLevel() {
>> Test t;
>> use(t);
>> }
>>
>> The call 'test(t)' involves a copy constructor, which the analyzer treats
>> as a temporary construction. Maybe that's the problem—this is really a
>> different sort of construction. In any case, though, who's responsible for
>> destroying 'arg'? I think it has to be the caller, because the callee may
>> not be inlined...but maybe I'm wrong about that. Still, the value of 'this'
>> won't match between construction at the call site and use within the 'use'
>> method.
>>
>
> That makes sense, but it would seem to me like that needs to be fixed when
> passing a parameter by value - that seems to be incorrectly modeled then.
>
> I think it'd be fine to model this as a copy constructor (Richard might
> know more), but I think in that case we'd need to call the destructors for
> the temporary outside *and* the destructor for the parameter inside. That
> is, instead of having 2 destructor calls (one for 't' and one for the
> temporary), we'd have three: One for 't', one for the temporary (both with
> flag == false) and one for the parameter, executed inside the function
> (with flag == true).
> On the other hand the more correct thing to do is probably to somehow use
> the memory region of the temporary we hand in for 'arg'. I have no idea
> where in the static analyzer this is handled though - I dug through
> Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the
> regions really happens...
>
>
>> Does this break assumptions about any other temporaries? I'm not sure. I
>> just don't want to end up in a situation where we're inlining constructors
>> and destructors, but they don't have a consistent view of the object.
>>
>> Jordan
>>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140805/7bbebf27/attachment.html>