[cfe-dev] [analyzer] Temporaries.

Tue May 22 17:19:02 PDT 2018

A bit of an update. Previous messages for easier navigation through this
thread:

http://lists.llvm.org/pipermail/cfe-dev/2018-January/056691.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056813.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056898.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056909.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056929.html
http://lists.llvm.org/pipermail/cfe-dev/2018-March/057255.html
http://lists.llvm.org/pipermail/cfe-dev/2018-March/057357.html
http://lists.llvm.org/pipermail/cfe-dev/2018-April/057641.html

== Story so far ==

Our AST for C++ constructors is rather unusual: the this-argument of the 
construct-expression is not its sub-expression but is instead scattered 
around its parent expressions. For example, constructor within 'A a(1, 
2, 3)' has AST

DeclStmt
`- VarDecl 'a'
     `- CXXConstructExpr
       |- 1
       |- 2
       `- 3

from which it can be seen that it constructs variable 'a', but there's 
no information about variable 'a' within CXXConstructExpr itself or its 
children. This lead to problems within the analyzer that was unable to 
figure out what object is constructed by looking at the CFG statements 
that go in order of execution (first CXXConstructExpr, then DeclStmt). 
Additionally, this leads to problems with liveness analysis, because the 
constructor ends up putting stuff into a memory region that isn't yet live.

In order to deal with that, i've augmented the CFGElement for the 
construct-expression to provide a ConstructionContext object that 
captures all necessary parent statements. Whenever CFG provides a 
ConstructionContext, it is easy for the analyzer to work correctly. This 
helped us support many new cases of object construction that weren't 
previously supported, including temporary object construction and 
construction into operator new(). This helped us suppress huge amounts 
of false positives, noticeably we can now produce reasonable results on 
WebKit which is a relatively exotic C++ codebase. Still, many more cases 
need to be handled.

== Plans ==

I'll probably focus on adding support for copy elision first. A few 
mails ago i explained how important this feature is, and it is probably 
also the biggest self-contained chunk of work that remains in this realm 
so far. I'll start with a bit of refactoring because some technical debt 
remains - see below :)

== Simple variables and liveness ==

So far i paid relatively little attention to how the simple case

   A a(1, 2, 3);

works. It kidna worked, so i expected it to not need fixing apart from 
the simple refactoring to switch from CFG lookahead to the 
ConstructionContext. However when i looked at 
https://bugs.llvm.org/show_bug.cgi?id=37270 it turned out to be much 
worse than i expected.

The CFG for that would look like this:

   CXXConstructExpr (prvalue expression of type A, construction context 
points to the DeclStmt)
   ... (sometimes some irrelevant implicit stuff in between)
   DeclStmt 'a'

Now let's see. Assume there's a construction context.

(a) The way i expected it to have worked:

   1. CXXConstructExpr is modeled and produces bindings to 'a'. It 
evaluates to LazyCompoundVal of 'a'.
   2. DeclStmt doesn't need to be modeled because bindings are already 
there.

(b) The way it might as well have worked:

   1. CXXConstructExpr is modeled and produces bindings to 'a'. It 
evaluates to LazyCompoundVal of 'a'.
   2. DeclStmt is modeled by binding the value of its initializer, which 
is a LazyCompoundVal of 'a', to 'a'.

(c) The way it actually turned out to work:

   1. CXXConstructExpr is modeled and produces bindings to 'a'. It 
evaluates to MemRegionVal '&a' (wat?).
   2. DeclStmt is modeled by binding the LazyCompoundVal of 'a' that is 
loaded from its initializer '&a' to 'a' (like (b).2 just with a load).

On (c).1 things get really weird because CXXConstructExpr is a prvalue 
expression of object type, and therefore must be modeled as a NonLoc. 
The load on (c).2 doesn't really correspond to any meaningful AST or 
semantics here.

In order to regain some sanity, i tried to see what was wrong with (a) 
and (b). Approach (b) is actually relatively easy to implement, however 
it has a downside of producing a default LazyCompoundVal binding in the 
Store instead of many direct bindings normally produced by an inlined 
constructor. Apart from the performance overhead of looking through an 
extra LazyCompoundVal every time we load from the variable later, we'll 
have analysis precision suffer because RegionStore is bad at modeling 
binding extents. There is a variety of tests in our test suite that 
would degrade because of that.

So it's indeed useful to try to preserve the bindings as they were bound 
by the inlined constructor.

Now, would (a) actually work? NO!!! Because, well, all of a sudden we 
find out that now bindings to 'a' are dead in sense of liveness analysis 
and are removed from the Store. And the fact that they're still present 
within the LazyCompoundVal won't help us on (a).2 because we promised to 
not bind the LazyCompoundVal. Yet it worked just fine in (c) because the 
bindings were kept alive because region '&a' was kept alive because it 
was incorrectly bound to the CXXConstructExpr in the Environment!

Here comes the moment of truth. Construction contexts for operator new 
and temporaries all had their own program state traits to prevent the 
memory region from dying until we leave the construction site. 
Apparently, these construction context kinds *were not special*. 
Essentially, every kind of construction context needs a path-sensitive 
program state trait in order to work.

--

Now, why path-sensitive? In case of temporaries and operator new it's 
kinda obvious, but in case of a variable we could simply declare "please 
don't remove bindings to this variable on that whole sub-tree", right? 
Right. But our construction contexts are imperfect, and, unfortunately, 
step 1 doesn't always work. If a variable is constructed with an 
unsupported construction context on a certain path, we need to use 
method (b).2 as our last resort: if the construction goes into a dummy 
region, we need to take the constructed LazyCompoundValue and bind it to 
our variable.

The current code for that turned out to be outdated as well. Instead of 
looking at whether we've had a construction context, we're looking at 
whether we've had a constructor, via CFG look-behind that's a reverse of 
how previously construction contexts worked through CFG look-ahead, see 
findDirectConstructorForCurrentCFGElement(). The aforementioned bugzilla 
bug is essentially a bug in this approach. We could have fixed it by 
adding extra logic into findDirectConstructorForCurrentCFGElement() to 
skip more statements, but it'd eventually duplicate a lot of logic for 
finding construction contexts. And i'm not sure we'd be able to make it 
work correctly for path-sensitive copy elision, such as C++17 mandatory 
copy elision within the ?: operator, where the constructor may be in a 
different CFG block and we might need to scan the whole CFG to find it.

So that's one more reason to have a path-sensitive program state trait 
even for simple variables: we'll be able to figure out if construction 
target for the variable was modeled precisely. Hopefully we'll be able 
to get rid of that when our construction contexts become precise and 
cover all cases, but for now we can't.

My plan is to get rid of multiple state traits for 
new/destructors/materializations/etc. and make a single program state 
trait ("objects under construction") to rule them all - now that it 
sounds as if we'll always need it anyway. Then use the new trait to fix 
those some constructor modeling. I think the hopefully-minimal run-time 
overhead (which may actually be paid off with the removal of the 
look-behind) is worth it when it comes to maintaining our Loc vs. NonLoc 
invariants, presenting expression values correctly to the checkers, 
keeping our code small and at all possible to understand, and allowing 
us to move further.