[cfe-dev] [analyzer] Temporaries.

Thu Jan 25 09:08:11 PST 2018

Handling C++ temporary object construction and destruction seems to be 
the biggest cause of false positives on C++ code at the moment. I'd be 
looking into this, even though for now i don't see the whole scale of 
problem.

== CFG, destructors, and ProgramPoints ==

We should probably enable `-analyzer-config cfg-temporary-dtors=true` by 
default soon. It is a fairly low-impact change because it only alters 
the CFG but the analyzer rarely actually uses the new nodes. Destructors 
for the most part are still evaluated conservatively, with improper 
object regions. So it causes almost no changes in the analyzer's 
positives for now, but it definitely opens up room for further improvements.

I'm observing a couple of problems with this mode at the moment, namely 
the rich CFG destructor element hierarchy is not currently complemented 
by an equally rich ProgramPoint hierarchy. This causes the analysis to 
merge nodes which are not equivalent, for example two implicit 
destructors of the same type (with the same function declaration) may 
sometimes cause the ExplodedGraph to coil up and stop analysis (lost 
coverage) because of having the same program state at the 
erroneously-same program point. Because situations when we can guarantee 
a change in the program state are pretty rare, we'd have to produce more 
program point kinds to handle this correctly.

CallEvent hierarchy is also very much affected in a similar manner - 
because apparently we're constructing program points by looking at 
CallEvents, so they'd need to carry all the information that's needed to 
construct the pre-call/post-call program point.

== Construction contexts ==

We are correctly modeling "this" object region during 
construction/destruction of variables with automatic storage duration, 
fields and base classes, and on operator new() since recently, as long 
as these aren't arrays of objects. It was not yet implemented for other 
cases such as temporaries, initializer lists, fields or C++17 base 
classes within aggregates, and pass-by-value from/to functions (the 
latter being a slightly different problem than temporaries).

First of all, by "not yet implemented" i mean that instead of 
constructing into (destroying) the correct object (in the correct memory 
region), we come up with a "temporary region", which looks exactly like 
a region of a valid C++ temporary but is only used for communicating 
that it is not the right region. Then we look at the region, see that it 
is a temporary, and avoid inlining constructors, because it would make 
little sense when the region is not correct. However, if we are to model 
temporaries, we need another way of communicating our failure to find 
the correct region, which is being addressed by 
https://reviews.llvm.org/D42457

Also in the cases when the correct region is used, it is being computed 
in a funky way: in order to figure out where do we construct the object, 
we walk forward in the CFG (or from child to parent in the AST) to find 
the next/parent statement that would accomodate the newly constructed 
object. This means that the CFG, while perfectly telling us what to do 
in what order (which we, unlike CodeGen, cannot easily take from AST 
because we cannot afford recursive evaluation of statements, mainly 
because of inlining), discards too much context to be helpful in 
understanding how to do it.

I tried to add the information about such "trigger statements" for 
constructors into CFG and it is extremely helpful and easy to both use 
and extend. This assumes adding a new sub-class of CFGElement for 
constructors, which maintain a "construction context" - for now it's 
just a trigger statement. For instance, in

   class C { ... };
   void foo() {
     C c;
   }

...the trigger for constructor C() would be DeclStmt `C c`, and once we 
know this we can immediately figure out that the construction target is 
the region of variable `c`. Construction trigger is not necessarily a 
statement - it may be a CXXCtorInitializer, which is an AST node kind of 
its own. Additionally, when constructing aggregates via initializer 
lists, we may have the same statement trigger multiple constructors, eg. in

   class C { public: C(int); ~C(); };
   struct S { C c1, c2, c3; };
   void foo() {
     S s { 1, 2, 3 };
   }

... we have three different constructors (actually six different 
constructors if we include elidable copy constructors) for c1, c2, c3 
(and lack of constructor for `s` because of the aggregate thing). It 
would be more natural to declare that the specific field or index would 
be a part of the CFG's construction context, as well as the intermediate 
InitListExpr, so even in these simple cases the construction context may 
get quite bulky. And this example is quite popular on actual code - 
probably the second worst cause of false positives after temporaries.

For now i have no specific plan on what would construction context for 
temporaries contain in the general case. I might not be able to get the 
data structures right from the start. In any case, it might be necessary 
to perform additional memory allocations for these CFG elements (for 
analyzer's CFG only, so it wouldn't affect compilation time or warnings).

I believe that getting the correct target region in as many cases as 
possible would be the main part of my work for the nearest future. And i 
want to move most this work to CFG, while letting the analyzer pick up 
the trigger statement from it and trust it as blindly as possible.

== Workflow ==

When working on operator new, i tried hard to maintain a reasonable 
history, making around 15 local rebases. It was not awesome because it 
was hard for the reviewers to understand the context of the new changes, 
and changes could have easily kicked in during rebases. A few lessons 
learned here would be to commit more aggressively, i.e. avoiding 
stockpiling a large history of patches (essentially a large branch), 
which in turn would be possible through trying harder to avoid unrelated 
hard-to-test core changes (as long as it doesn't require weird 
workarounds) that aren't covered by a flag (such as those evalCast 
fixes), in order to make sure reviewing take less responsibility. It's 
fine if some parts would later be fixed (assuming they would indeed be 
fixed), if it means making the turnaround faster and the tail of patches 
shorter - that's also the attitude i'd try to maintain when reviewing 
stuff myself.