[cfe-dev] [analyzer] Temporaries.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Tue May 22 17:19:02 PDT 2018
A bit of an update. Previous messages for easier navigation through this
thread:
http://lists.llvm.org/pipermail/cfe-dev/2018-January/056691.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056813.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056898.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056909.html
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056929.html
http://lists.llvm.org/pipermail/cfe-dev/2018-March/057255.html
http://lists.llvm.org/pipermail/cfe-dev/2018-March/057357.html
http://lists.llvm.org/pipermail/cfe-dev/2018-April/057641.html
== Story so far ==
Our AST for C++ constructors is rather unusual: the this-argument of the
construct-expression is not its sub-expression but is instead scattered
around its parent expressions. For example, constructor within 'A a(1,
2, 3)' has AST
DeclStmt
`- VarDecl 'a'
`- CXXConstructExpr
|- 1
|- 2
`- 3
from which it can be seen that it constructs variable 'a', but there's
no information about variable 'a' within CXXConstructExpr itself or its
children. This lead to problems within the analyzer that was unable to
figure out what object is constructed by looking at the CFG statements
that go in order of execution (first CXXConstructExpr, then DeclStmt).
Additionally, this leads to problems with liveness analysis, because the
constructor ends up putting stuff into a memory region that isn't yet live.
In order to deal with that, i've augmented the CFGElement for the
construct-expression to provide a ConstructionContext object that
captures all necessary parent statements. Whenever CFG provides a
ConstructionContext, it is easy for the analyzer to work correctly. This
helped us support many new cases of object construction that weren't
previously supported, including temporary object construction and
construction into operator new(). This helped us suppress huge amounts
of false positives, noticeably we can now produce reasonable results on
WebKit which is a relatively exotic C++ codebase. Still, many more cases
need to be handled.
== Plans ==
I'll probably focus on adding support for copy elision first. A few
mails ago i explained how important this feature is, and it is probably
also the biggest self-contained chunk of work that remains in this realm
so far. I'll start with a bit of refactoring because some technical debt
remains - see below :)
== Simple variables and liveness ==
So far i paid relatively little attention to how the simple case
A a(1, 2, 3);
works. It kidna worked, so i expected it to not need fixing apart from
the simple refactoring to switch from CFG lookahead to the
ConstructionContext. However when i looked at
https://bugs.llvm.org/show_bug.cgi?id=37270 it turned out to be much
worse than i expected.
The CFG for that would look like this:
CXXConstructExpr (prvalue expression of type A, construction context
points to the DeclStmt)
... (sometimes some irrelevant implicit stuff in between)
DeclStmt 'a'
Now let's see. Assume there's a construction context.
(a) The way i expected it to have worked:
1. CXXConstructExpr is modeled and produces bindings to 'a'. It
evaluates to LazyCompoundVal of 'a'.
2. DeclStmt doesn't need to be modeled because bindings are already
there.
(b) The way it might as well have worked:
1. CXXConstructExpr is modeled and produces bindings to 'a'. It
evaluates to LazyCompoundVal of 'a'.
2. DeclStmt is modeled by binding the value of its initializer, which
is a LazyCompoundVal of 'a', to 'a'.
(c) The way it actually turned out to work:
1. CXXConstructExpr is modeled and produces bindings to 'a'. It
evaluates to MemRegionVal '&a' (wat?).
2. DeclStmt is modeled by binding the LazyCompoundVal of 'a' that is
loaded from its initializer '&a' to 'a' (like (b).2 just with a load).
On (c).1 things get really weird because CXXConstructExpr is a prvalue
expression of object type, and therefore must be modeled as a NonLoc.
The load on (c).2 doesn't really correspond to any meaningful AST or
semantics here.
In order to regain some sanity, i tried to see what was wrong with (a)
and (b). Approach (b) is actually relatively easy to implement, however
it has a downside of producing a default LazyCompoundVal binding in the
Store instead of many direct bindings normally produced by an inlined
constructor. Apart from the performance overhead of looking through an
extra LazyCompoundVal every time we load from the variable later, we'll
have analysis precision suffer because RegionStore is bad at modeling
binding extents. There is a variety of tests in our test suite that
would degrade because of that.
So it's indeed useful to try to preserve the bindings as they were bound
by the inlined constructor.
Now, would (a) actually work? NO!!! Because, well, all of a sudden we
find out that now bindings to 'a' are dead in sense of liveness analysis
and are removed from the Store. And the fact that they're still present
within the LazyCompoundVal won't help us on (a).2 because we promised to
not bind the LazyCompoundVal. Yet it worked just fine in (c) because the
bindings were kept alive because region '&a' was kept alive because it
was incorrectly bound to the CXXConstructExpr in the Environment!
Here comes the moment of truth. Construction contexts for operator new
and temporaries all had their own program state traits to prevent the
memory region from dying until we leave the construction site.
Apparently, these construction context kinds *were not special*.
Essentially, every kind of construction context needs a path-sensitive
program state trait in order to work.
--
Now, why path-sensitive? In case of temporaries and operator new it's
kinda obvious, but in case of a variable we could simply declare "please
don't remove bindings to this variable on that whole sub-tree", right?
Right. But our construction contexts are imperfect, and, unfortunately,
step 1 doesn't always work. If a variable is constructed with an
unsupported construction context on a certain path, we need to use
method (b).2 as our last resort: if the construction goes into a dummy
region, we need to take the constructed LazyCompoundValue and bind it to
our variable.
The current code for that turned out to be outdated as well. Instead of
looking at whether we've had a construction context, we're looking at
whether we've had a constructor, via CFG look-behind that's a reverse of
how previously construction contexts worked through CFG look-ahead, see
findDirectConstructorForCurrentCFGElement(). The aforementioned bugzilla
bug is essentially a bug in this approach. We could have fixed it by
adding extra logic into findDirectConstructorForCurrentCFGElement() to
skip more statements, but it'd eventually duplicate a lot of logic for
finding construction contexts. And i'm not sure we'd be able to make it
work correctly for path-sensitive copy elision, such as C++17 mandatory
copy elision within the ?: operator, where the constructor may be in a
different CFG block and we might need to scan the whole CFG to find it.
So that's one more reason to have a path-sensitive program state trait
even for simple variables: we'll be able to figure out if construction
target for the variable was modeled precisely. Hopefully we'll be able
to get rid of that when our construction contexts become precise and
cover all cases, but for now we can't.
My plan is to get rid of multiple state traits for
new/destructors/materializations/etc. and make a single program state
trait ("objects under construction") to rule them all - now that it
sounds as if we'll always need it anyway. Then use the new trait to fix
those some constructor modeling. I think the hopefully-minimal run-time
overhead (which may actually be paid off with the removal of the
look-behind) is worth it when it comes to maintaining our Loc vs. NonLoc
invariants, presenting expression values correctly to the checkers,
keeping our code small and at all possible to understand, and allowing
us to move further.
More information about the cfe-dev
mailing list