[cfe-dev] [analyzer] Temporaries.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Thu Jan 25 09:08:11 PST 2018
Handling C++ temporary object construction and destruction seems to be
the biggest cause of false positives on C++ code at the moment. I'd be
looking into this, even though for now i don't see the whole scale of
problem.
== CFG, destructors, and ProgramPoints ==
We should probably enable `-analyzer-config cfg-temporary-dtors=true` by
default soon. It is a fairly low-impact change because it only alters
the CFG but the analyzer rarely actually uses the new nodes. Destructors
for the most part are still evaluated conservatively, with improper
object regions. So it causes almost no changes in the analyzer's
positives for now, but it definitely opens up room for further improvements.
I'm observing a couple of problems with this mode at the moment, namely
the rich CFG destructor element hierarchy is not currently complemented
by an equally rich ProgramPoint hierarchy. This causes the analysis to
merge nodes which are not equivalent, for example two implicit
destructors of the same type (with the same function declaration) may
sometimes cause the ExplodedGraph to coil up and stop analysis (lost
coverage) because of having the same program state at the
erroneously-same program point. Because situations when we can guarantee
a change in the program state are pretty rare, we'd have to produce more
program point kinds to handle this correctly.
CallEvent hierarchy is also very much affected in a similar manner -
because apparently we're constructing program points by looking at
CallEvents, so they'd need to carry all the information that's needed to
construct the pre-call/post-call program point.
== Construction contexts ==
We are correctly modeling "this" object region during
construction/destruction of variables with automatic storage duration,
fields and base classes, and on operator new() since recently, as long
as these aren't arrays of objects. It was not yet implemented for other
cases such as temporaries, initializer lists, fields or C++17 base
classes within aggregates, and pass-by-value from/to functions (the
latter being a slightly different problem than temporaries).
First of all, by "not yet implemented" i mean that instead of
constructing into (destroying) the correct object (in the correct memory
region), we come up with a "temporary region", which looks exactly like
a region of a valid C++ temporary but is only used for communicating
that it is not the right region. Then we look at the region, see that it
is a temporary, and avoid inlining constructors, because it would make
little sense when the region is not correct. However, if we are to model
temporaries, we need another way of communicating our failure to find
the correct region, which is being addressed by
https://reviews.llvm.org/D42457
Also in the cases when the correct region is used, it is being computed
in a funky way: in order to figure out where do we construct the object,
we walk forward in the CFG (or from child to parent in the AST) to find
the next/parent statement that would accomodate the newly constructed
object. This means that the CFG, while perfectly telling us what to do
in what order (which we, unlike CodeGen, cannot easily take from AST
because we cannot afford recursive evaluation of statements, mainly
because of inlining), discards too much context to be helpful in
understanding how to do it.
I tried to add the information about such "trigger statements" for
constructors into CFG and it is extremely helpful and easy to both use
and extend. This assumes adding a new sub-class of CFGElement for
constructors, which maintain a "construction context" - for now it's
just a trigger statement. For instance, in
class C { ... };
void foo() {
C c;
}
...the trigger for constructor C() would be DeclStmt `C c`, and once we
know this we can immediately figure out that the construction target is
the region of variable `c`. Construction trigger is not necessarily a
statement - it may be a CXXCtorInitializer, which is an AST node kind of
its own. Additionally, when constructing aggregates via initializer
lists, we may have the same statement trigger multiple constructors, eg. in
class C { public: C(int); ~C(); };
struct S { C c1, c2, c3; };
void foo() {
S s { 1, 2, 3 };
}
... we have three different constructors (actually six different
constructors if we include elidable copy constructors) for c1, c2, c3
(and lack of constructor for `s` because of the aggregate thing). It
would be more natural to declare that the specific field or index would
be a part of the CFG's construction context, as well as the intermediate
InitListExpr, so even in these simple cases the construction context may
get quite bulky. And this example is quite popular on actual code -
probably the second worst cause of false positives after temporaries.
For now i have no specific plan on what would construction context for
temporaries contain in the general case. I might not be able to get the
data structures right from the start. In any case, it might be necessary
to perform additional memory allocations for these CFG elements (for
analyzer's CFG only, so it wouldn't affect compilation time or warnings).
I believe that getting the correct target region in as many cases as
possible would be the main part of my work for the nearest future. And i
want to move most this work to CFG, while letting the analyzer pick up
the trigger statement from it and trust it as blindly as possible.
== Workflow ==
When working on operator new, i tried hard to maintain a reasonable
history, making around 15 local rebases. It was not awesome because it
was hard for the reviewers to understand the context of the new changes,
and changes could have easily kicked in during rebases. A few lessons
learned here would be to commit more aggressively, i.e. avoiding
stockpiling a large history of patches (essentially a large branch),
which in turn would be possible through trying harder to avoid unrelated
hard-to-test core changes (as long as it doesn't require weird
workarounds) that aren't covered by a flag (such as those evalCast
fixes), in order to make sure reviewing take less responsibility. It's
fine if some parts would later be fixed (assuming they would indeed be
fixed), if it means making the turnaround faster and the tail of patches
shorter - that's also the attitude i'd try to maintain when reviewing
stuff myself.
More information about the cfe-dev
mailing list