[cfe-dev] Inlining temporary destructors in the analyzer

Manuel Klimek klimek at google.com
Fri Aug 1 04:35:57 PDT 2014

On Fri, Aug 1, 2014 at 1:02 AM, Jordan Rose <jordan_rose at apple.com> wrote:

> On Jul 31, 2014, at 12:54 , Manuel Klimek <klimek at google.com> wrote:
> On Thu, Jul 31, 2014 at 9:51 PM, Jordan Rose <jordan_rose at apple.com>
> wrote:
>> On Jul 31, 2014, at 12:46 , Manuel Klimek <klimek at google.com> wrote:
>> On Thu, Jul 31, 2014 at 9:36 PM, Jordan Rose <jordan_rose at apple.com>
>> wrote:
>>> On Jul 31, 2014, at 1:20 , Manuel Klimek <klimek at google.com> wrote:
>>> On Thu, Jul 31, 2014 at 5:19 AM, Jordan Rose <jordan_rose at apple.com>
>>> wrote:
>>>> On Jul 30, 2014, at 8:14 , Manuel Klimek <klimek at google.com> wrote:
>>>> Hi Jordan,
>>>> I made some progress on inlining temporary destructors. My main problem
>>>> currently is that somehow the initializer values in my constructor get lost
>>>> if I don't also reference 'this' via some other way.
>>>> An examples shows that best. Consider:
>>>> struct A {
>>>>   A(int* y) : z(y) {}
>>>>   ~A() { *z = 23; }
>>>>   int *z;
>>>> };
>>>> void f() {
>>>>   int x = 0;
>>>>   {(A(&x));}
>>>> }
>>>> This leads to a "Dereference of undefined pointer value" warning on *z
>>>> = 23.
>>>> Funnily enough this warning goes away (and all values get correctly
>>>> propagated) if change the constructor to be:
>>>> A(int* y) : z(y) { (void)this; }
>>>> Obviously this also works fine if I use a local variable instead of a
>>>> temporary.
>>>> Any ideas what might be going wrong here would be greatly appreciated.
>>>> My guess is that the temporary is an rvalue, and in the analyzer a
>>>> NonLoc, and that the only reason this even appears to work is because we
>>>> needed to be able to call *methods* on temporaries and so will create
>>>> a temporary region for them on demand to serve as the 'this' pointer.
>>>> However, I don't think we expect that to happen for *constructors,* and
>>>> so we don't keep the region around after the constructor has finished
>>>> executing. (That is, when it's just a method being called—and neither the
>>>> constructor nor destructor are inlined—it doesn't actually matter what you
>>>> do to the contents of the region, because nothing else can access it.)
>>> So, funnily enough if I remove this code:
>>>   if (isTemporaryPRValue(CCE, ThisV))
>>>   ThisV = state->getSVal(ThisV.castAs<Loc>());
>>> which puts the temp region into a lazycompoundregion, the memory seems
>>> to be correctly handled. I have not yet looked what else this breaks though.
>>> Okay, analyzer design time. :-)
>>> C++ has a distinction between lvalues and rvalues (actually glvalues and
>>> prvalues). There are a number of things that go into this distinction, but
>>> the part the analyzer cares about is that glvalues refer to true objects
>>> and prvalues do not. (In C++ parlance, "object" refers to anything with
>>> storage, a type, and lifetime, not just class instances. In the analyzer,
>>> we call such things "regions".)
>>> If you were to build a very literal C++ compiler, every glvalue would be
>>> represented by an address. PRValues*, on the other hand, don't really need
>>> addresses; they only exist to be used transiently.
>> I'm not sure how "exists transiently" equals "doesn't need an address".
>> At least in C++, this seems incorrect (?) for objects.
>> IMHO it's a problem with the standard, and this extra invention of
>> "modifiable" seems weird to me.
>> ** How do you capitalize "prvalue"?*
>>> The analyzer pretty much always believes the compiler when it comes to
>>> glvalues vs. prvalues. That means that the value of a glvalue expression is
>>> *always* a Loc. A prvalue expression, on the other hand, will only be a
>>> Loc if it's a pointer type*. If you're talking about a struct prvalue, what
>>> you're really talking about is the bindings in its memory region—it has no
>>> identity. This is represented by a CompoundVal or LazyCompoundVal. This
>>> model worked great for C, but it's starting to show strain in C++.
>>> ** or one of the other types mentioned in Loc::isLocType. But not all of
>>> those can be expression values.*
>>> Besides "regular" temporaries, this is also a problem for prvalue
>>> function arguments. When the analyzer inlines a function, it just blithely
>>> binds the arguments to the proper VarRegions created for each ParmVarDecl
>>> in the current stack frame. That doesn't really fly if we're supposed to be
>>> calling the copy constructor. Worse, the copy constructor call happens
>>> *outside* of the function call, which means it happens before we've
>>> even decided whether or not to inline, which means there are no VarRegions
>>> yet to construct into. We've/I've been putting off dealing with this one
>>> for a while; we get around it mostly because we *don't inline
>>> constructors for temporaries.*
>>> The general temporary problem goes to what Richard was saying: a
>>> temporary expression is an rvalue, but if we have to run a constructor then
>>> clearly there *was* a region at some point. The C++11 standard (at
>>> least the version I have) is not wonderful about classifying temporaries
>>> ([basic.lval]p1):
>>> - An *lvalue*...designates a function or an object.
>>> - An *xvalue* (an “eXpiring” value) also refers to an object, usually
>>> near the end of its lifetime.
>>> - A *glvalue* (“generalized” lvalue) is an lvalue or an xvalue.
>>> - An *rvalue...*is an xvalue, a temporary object (12.2) or subobject
>>> thereof, or a value that is not associated with an object.
>>> - A *prvalue* (“pure” rvalue) is an rvalue that is not an xvalue.
>>> So temporary objects are prvalues. Even though they are modifiable
>>> (defined later in the same section). Even though with construction and
>>> destruction they end up needing a fixed region, or at least consistency in
>>> region contents.
>>> I don't have a great answer here. My guess is we'll probably end up
>>> turning on *some* inlining for temporaries to start, but not all.
>>> Oh, right, so what's that 'isTemporaryPRValue' doing? That's taking the
>>> few cases where the AST wants a prvalue but the analyzer really needs a
>>> valid Loc, and conjuring a region on the fly. That's important for things
>>> that need to have referential identity, like the 'this' pointer in a method
>>> called on a temporary, or (say) a temporary that's been lifetime-extended.
>> Since this is right at the end of the constructor call, this always
>> creates and binds the value if we hit the constructor. See my patch - it
>> seems to work fine :) What am I missing?
>> That breaks the invariant that a non-Loc-type prvalue is not a Loc. If
>> you tried to assign this directly into another struct, you'd effectively be
>> assigning an address instead of a set of bindings. IIRC there's some code
>> in the analyzer that tries to handle this case anyway, so it may not break
>> down immediately, but I'd rather not lose that assumption.
> I'm confused - isn't that already the case currently? Have you looked at
> http://reviews.llvm.org/D4740? It's very small, and I don't see where I
> regress an assumption that's not already violated (if I remove the
> isTemporaryPRValue section loads of tests break, so I assume changing the
> layout of a temporary section will make significant changes to the analyzer
> necessary).
> Sorry, I meant removing the isTemporaryPRValue thing would break
> invariants. Enabling inlining doesn't break any existing invariants, but it
> makes things worse than they currently are. This is the test case I have in
> mind:
> struct Test {
>   bool flag;
>   ~Test {
>     if (flag)
>       clang_analyzer_checkReached(true);
>   }
> }
> void use(Test arg) {
>  arg.flag = true;
> }
> void topLevel() {
> Test t;
> use(t);
> }
> The call 'test(t)' involves a copy constructor, which the analyzer treats
> as a temporary construction. Maybe that's the problem—this is really a
> different sort of construction. In any case, though, who's responsible for
> destroying 'arg'? I think it has to be the caller, because the callee may
> not be inlined...but maybe I'm wrong about that. Still, the value of 'this'
> won't match between construction at the call site and use within the 'use'
> method.

That makes sense, but it would seem to me like that needs to be fixed when
passing a parameter by value - that seems to be incorrectly modeled then.

I think it'd be fine to model this as a copy constructor (Richard might
know more), but I think in that case we'd need to call the destructors for
the temporary outside *and* the destructor for the parameter inside. That
is, instead of having 2 destructor calls (one for 't' and one for the
temporary), we'd have three: One for 't', one for the temporary (both with
flag == false) and one for the parameter, executed inside the function
(with flag == true).
On the other hand the more correct thing to do is probably to somehow use
the memory region of the temporary we hand in for 'arg'. I have no idea
where in the static analyzer this is handled though - I dug through
Store.cpp and RegionStore.cpp but couldn't figure out where the copy of the
regions really happens...

> Does this break assumptions about any other temporaries? I'm not sure. I
> just don't want to end up in a situation where we're inlining constructors
> and destructors, but they don't have a consistent view of the object.
> Jordan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20140801/9a5a051c/attachment.html>

More information about the cfe-dev mailing list