[cfe-dev] [analyzer] Temporaries.

Tue Feb 6 13:41:22 PST 2018

A bit of an update.

== Temporary destructors ==

Adding some initial support for temporary destructors seems pretty easy 
and straightforward at this point, given the awesome work done by Manuel 
Klimek on our CFG a few years ago.

1. We already have the awesome CFGTemporaryDtor elements, which have the 
backreference to the CXXBindTemporaryExpr for their temporaries.

2. In simple cases CXXBindTemporaryExprs have an immediate constructor 
within them, and therefore we can provide the CXXBindTemporaryExprs as 
the construction context's trigger statements, and therefore have a 
reliable CXXTempObjectRegion for constructors.

3. Then we already track the CXXBindTemporaryExprs for the active 
temporaries in the program state. We can attach any constructor 
information to them, such as the target region, if we need to (probably 
we can reconstruct it by looking at the expression and the location 
context, not sure if we want to).

4. So when we reach the CFGTemporaryDtor element, we can just lookup all 
the info we need, and perform the destruction properly.

5. There's a bit of a liveness problem, because it seems that our 
liveness analysis tends to throw away the temporary too early. I can 
easily hack this problem away by marking all regions that correspond to 
active temporaries as live. I'll see if there's a better solution.

== CXXDefaultArgExpr problems ==

There's a known problem with those. Consider:

   void foo(const C &c = C()) {
   }

   void bar() {
     foo();
     foo();
   }

Each call of foo() contains a CXXDefaultArgExpr for c. The default 
argument value C() is constructed before we enter foo() and destroyed 
after we leave foo(). However, c's initializer, "= C()", is *not part of 
the AST of bar()*. In particular, when foo() is called twice, the 
initializer for the two calls is the same, only CXXDefaultArgExprs are 
different. This screws a lot of invariants in the analyzer: program 
points start coinciding (causing the analysis to loop and cache out), 
Environment doesn't expect the same expression in the same location 
context have two different values (suppose calls are nested into each 
other), analysis taking wrong branches, and so on.

Luckily, default-arg expressions that aren't zero integers or null 
pointers are pretty rare. Still, we'd need to eventually think how to 
avoid any really bad practical problems with them.

On 25/01/2018 9:08 AM, Artem Dergachev wrote:
> Handling C++ temporary object construction and destruction seems to be 
> the biggest cause of false positives on C++ code at the moment. I'd be 
> looking into this, even though for now i don't see the whole scale of 
> problem.
>
> == CFG, destructors, and ProgramPoints ==
>
> We should probably enable `-analyzer-config cfg-temporary-dtors=true` 
> by default soon. It is a fairly low-impact change because it only 
> alters the CFG but the analyzer rarely actually uses the new nodes. 
> Destructors for the most part are still evaluated conservatively, with 
> improper object regions. So it causes almost no changes in the 
> analyzer's positives for now, but it definitely opens up room for 
> further improvements.
>
> I'm observing a couple of problems with this mode at the moment, 
> namely the rich CFG destructor element hierarchy is not currently 
> complemented by an equally rich ProgramPoint hierarchy. This causes 
> the analysis to merge nodes which are not equivalent, for example two 
> implicit destructors of the same type (with the same function 
> declaration) may sometimes cause the ExplodedGraph to coil up and stop 
> analysis (lost coverage) because of having the same program state at 
> the erroneously-same program point. Because situations when we can 
> guarantee a change in the program state are pretty rare, we'd have to 
> produce more program point kinds to handle this correctly.
>
> CallEvent hierarchy is also very much affected in a similar manner - 
> because apparently we're constructing program points by looking at 
> CallEvents, so they'd need to carry all the information that's needed 
> to construct the pre-call/post-call program point.
>
> == Construction contexts ==
>
> We are correctly modeling "this" object region during 
> construction/destruction of variables with automatic storage duration, 
> fields and base classes, and on operator new() since recently, as long 
> as these aren't arrays of objects. It was not yet implemented for 
> other cases such as temporaries, initializer lists, fields or C++17 
> base classes within aggregates, and pass-by-value from/to functions 
> (the latter being a slightly different problem than temporaries).
>
> First of all, by "not yet implemented" i mean that instead of 
> constructing into (destroying) the correct object (in the correct 
> memory region), we come up with a "temporary region", which looks 
> exactly like a region of a valid C++ temporary but is only used for 
> communicating that it is not the right region. Then we look at the 
> region, see that it is a temporary, and avoid inlining constructors, 
> because it would make little sense when the region is not correct. 
> However, if we are to model temporaries, we need another way of 
> communicating our failure to find the correct region, which is being 
> addressed by https://reviews.llvm.org/D42457
>
> Also in the cases when the correct region is used, it is being 
> computed in a funky way: in order to figure out where do we construct 
> the object, we walk forward in the CFG (or from child to parent in the 
> AST) to find the next/parent statement that would accomodate the newly 
> constructed object. This means that the CFG, while perfectly telling 
> us what to do in what order (which we, unlike CodeGen, cannot easily 
> take from AST because we cannot afford recursive evaluation of 
> statements, mainly because of inlining), discards too much context to 
> be helpful in understanding how to do it.
>
> I tried to add the information about such "trigger statements" for 
> constructors into CFG and it is extremely helpful and easy to both use 
> and extend. This assumes adding a new sub-class of CFGElement for 
> constructors, which maintain a "construction context" - for now it's 
> just a trigger statement. For instance, in
>
>   class C { ... };
>   void foo() {
>     C c;
>   }
>
> ...the trigger for constructor C() would be DeclStmt `C c`, and once 
> we know this we can immediately figure out that the construction 
> target is the region of variable `c`. Construction trigger is not 
> necessarily a statement - it may be a CXXCtorInitializer, which is an 
> AST node kind of its own. Additionally, when constructing aggregates 
> via initializer lists, we may have the same statement trigger multiple 
> constructors, eg. in
>
>   class C { public: C(int); ~C(); };
>   struct S { C c1, c2, c3; };
>   void foo() {
>     S s { 1, 2, 3 };
>   }
>
> ... we have three different constructors (actually six different 
> constructors if we include elidable copy constructors) for c1, c2, c3 
> (and lack of constructor for `s` because of the aggregate thing). It 
> would be more natural to declare that the specific field or index 
> would be a part of the CFG's construction context, as well as the 
> intermediate InitListExpr, so even in these simple cases the 
> construction context may get quite bulky. And this example is quite 
> popular on actual code - probably the second worst cause of false 
> positives after temporaries.
>
> For now i have no specific plan on what would construction context for 
> temporaries contain in the general case. I might not be able to get 
> the data structures right from the start. In any case, it might be 
> necessary to perform additional memory allocations for these CFG 
> elements (for analyzer's CFG only, so it wouldn't affect compilation 
> time or warnings).
>
> I believe that getting the correct target region in as many cases as 
> possible would be the main part of my work for the nearest future. And 
> i want to move most this work to CFG, while letting the analyzer pick 
> up the trigger statement from it and trust it as blindly as possible.
>
> == Workflow ==
>
> When working on operator new, i tried hard to maintain a reasonable 
> history, making around 15 local rebases. It was not awesome because it 
> was hard for the reviewers to understand the context of the new 
> changes, and changes could have easily kicked in during rebases. A few 
> lessons learned here would be to commit more aggressively, i.e. 
> avoiding stockpiling a large history of patches (essentially a large 
> branch), which in turn would be possible through trying harder to 
> avoid unrelated hard-to-test core changes (as long as it doesn't 
> require weird workarounds) that aren't covered by a flag (such as 
> those evalCast fixes), in order to make sure reviewing take less 
> responsibility. It's fine if some parts would later be fixed (assuming 
> they would indeed be fixed), if it means making the turnaround faster 
> and the tail of patches shorter - that's also the attitude i'd try to 
> maintain when reviewing stuff myself.