[cfe-dev] [analyzer] Temporaries.
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Feb 19 11:18:37 PST 2018
For now i'm not seeing much coverage increase, and i'm not aiming for
that, though i've already seen a couple of neat temporary-specific new
true positives.
We avoid the gtest noreturn destructor sink mentioned in
https://reviews.llvm.org/D42779 but there's not much of these in regular
code.
Apart from that, i'm expecting a typical skew in coverage due to more
aggressive inlining, but that's it.
For now it's hard to understand coverage change because there are quite
a few new false positives hanging around, similar to
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056909.html
On 19/02/2018 1:51 AM, Aleksei Sidorin wrote:
> Hello Artem,
>
> Thank you for working on this and sharing the status.
> Do we expect coverage changes after allowing destructors to be inlined?
>
> 25.01.2018 20:08, Artem Dergachev via cfe-dev пишет:
>> Handling C++ temporary object construction and destruction seems to
>> be the biggest cause of false positives on C++ code at the moment.
>> I'd be looking into this, even though for now i don't see the whole
>> scale of problem.
>>
>> == CFG, destructors, and ProgramPoints ==
>>
>> We should probably enable `-analyzer-config cfg-temporary-dtors=true`
>> by default soon. It is a fairly low-impact change because it only
>> alters the CFG but the analyzer rarely actually uses the new nodes.
>> Destructors for the most part are still evaluated conservatively,
>> with improper object regions. So it causes almost no changes in the
>> analyzer's positives for now, but it definitely opens up room for
>> further improvements.
>>
>> I'm observing a couple of problems with this mode at the moment,
>> namely the rich CFG destructor element hierarchy is not currently
>> complemented by an equally rich ProgramPoint hierarchy. This causes
>> the analysis to merge nodes which are not equivalent, for example two
>> implicit destructors of the same type (with the same function
>> declaration) may sometimes cause the ExplodedGraph to coil up and
>> stop analysis (lost coverage) because of having the same program
>> state at the erroneously-same program point. Because situations when
>> we can guarantee a change in the program state are pretty rare, we'd
>> have to produce more program point kinds to handle this correctly.
>>
>> CallEvent hierarchy is also very much affected in a similar manner -
>> because apparently we're constructing program points by looking at
>> CallEvents, so they'd need to carry all the information that's needed
>> to construct the pre-call/post-call program point.
>>
>> == Construction contexts ==
>>
>> We are correctly modeling "this" object region during
>> construction/destruction of variables with automatic storage
>> duration, fields and base classes, and on operator new() since
>> recently, as long as these aren't arrays of objects. It was not yet
>> implemented for other cases such as temporaries, initializer lists,
>> fields or C++17 base classes within aggregates, and pass-by-value
>> from/to functions (the latter being a slightly different problem than
>> temporaries).
>>
>> First of all, by "not yet implemented" i mean that instead of
>> constructing into (destroying) the correct object (in the correct
>> memory region), we come up with a "temporary region", which looks
>> exactly like a region of a valid C++ temporary but is only used for
>> communicating that it is not the right region. Then we look at the
>> region, see that it is a temporary, and avoid inlining constructors,
>> because it would make little sense when the region is not correct.
>> However, if we are to model temporaries, we need another way of
>> communicating our failure to find the correct region, which is being
>> addressed by https://reviews.llvm.org/D42457
>>
>> Also in the cases when the correct region is used, it is being
>> computed in a funky way: in order to figure out where do we construct
>> the object, we walk forward in the CFG (or from child to parent in
>> the AST) to find the next/parent statement that would accomodate the
>> newly constructed object. This means that the CFG, while perfectly
>> telling us what to do in what order (which we, unlike CodeGen, cannot
>> easily take from AST because we cannot afford recursive evaluation of
>> statements, mainly because of inlining), discards too much context to
>> be helpful in understanding how to do it.
>>
>> I tried to add the information about such "trigger statements" for
>> constructors into CFG and it is extremely helpful and easy to both
>> use and extend. This assumes adding a new sub-class of CFGElement for
>> constructors, which maintain a "construction context" - for now it's
>> just a trigger statement. For instance, in
>>
>> class C { ... };
>> void foo() {
>> C c;
>> }
>>
>> ...the trigger for constructor C() would be DeclStmt `C c`, and once
>> we know this we can immediately figure out that the construction
>> target is the region of variable `c`. Construction trigger is not
>> necessarily a statement - it may be a CXXCtorInitializer, which is an
>> AST node kind of its own. Additionally, when constructing aggregates
>> via initializer lists, we may have the same statement trigger
>> multiple constructors, eg. in
>>
>> class C { public: C(int); ~C(); };
>> struct S { C c1, c2, c3; };
>> void foo() {
>> S s { 1, 2, 3 };
>> }
>>
>> ... we have three different constructors (actually six different
>> constructors if we include elidable copy constructors) for c1, c2, c3
>> (and lack of constructor for `s` because of the aggregate thing). It
>> would be more natural to declare that the specific field or index
>> would be a part of the CFG's construction context, as well as the
>> intermediate InitListExpr, so even in these simple cases the
>> construction context may get quite bulky. And this example is quite
>> popular on actual code - probably the second worst cause of false
>> positives after temporaries.
>>
>> For now i have no specific plan on what would construction context
>> for temporaries contain in the general case. I might not be able to
>> get the data structures right from the start. In any case, it might
>> be necessary to perform additional memory allocations for these CFG
>> elements (for analyzer's CFG only, so it wouldn't affect compilation
>> time or warnings).
>>
>> I believe that getting the correct target region in as many cases as
>> possible would be the main part of my work for the nearest future.
>> And i want to move most this work to CFG, while letting the analyzer
>> pick up the trigger statement from it and trust it as blindly as
>> possible.
>>
>> == Workflow ==
>>
>> When working on operator new, i tried hard to maintain a reasonable
>> history, making around 15 local rebases. It was not awesome because
>> it was hard for the reviewers to understand the context of the new
>> changes, and changes could have easily kicked in during rebases. A
>> few lessons learned here would be to commit more aggressively, i.e.
>> avoiding stockpiling a large history of patches (essentially a large
>> branch), which in turn would be possible through trying harder to
>> avoid unrelated hard-to-test core changes (as long as it doesn't
>> require weird workarounds) that aren't covered by a flag (such as
>> those evalCast fixes), in order to make sure reviewing take less
>> responsibility. It's fine if some parts would later be fixed
>> (assuming they would indeed be fixed), if it means making the
>> turnaround faster and the tail of patches shorter - that's also the
>> attitude i'd try to maintain when reviewing stuff myself.
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
More information about the cfe-dev
mailing list