[cfe-dev] [analyzer] Temporaries.

Mon Feb 19 11:18:37 PST 2018

For now i'm not seeing much coverage increase, and i'm not aiming for 
that, though i've already seen a couple of neat temporary-specific new 
true positives.

We avoid the gtest noreturn destructor sink mentioned in 
https://reviews.llvm.org/D42779 but there's not much of these in regular 
code.

Apart from that, i'm expecting a typical skew in coverage due to more 
aggressive inlining, but that's it.

For now it's hard to understand coverage change because there are quite 
a few new false positives hanging around, similar to 
http://lists.llvm.org/pipermail/cfe-dev/2018-February/056909.html

On 19/02/2018 1:51 AM, Aleksei Sidorin wrote:
> Hello Artem,
>
> Thank you for working on this and sharing the status.
> Do we expect coverage changes after allowing destructors to be inlined?
>
> 25.01.2018 20:08, Artem Dergachev via cfe-dev пишет:
>> Handling C++ temporary object construction and destruction seems to 
>> be the biggest cause of false positives on C++ code at the moment. 
>> I'd be looking into this, even though for now i don't see the whole 
>> scale of problem.
>>
>> == CFG, destructors, and ProgramPoints ==
>>
>> We should probably enable `-analyzer-config cfg-temporary-dtors=true` 
>> by default soon. It is a fairly low-impact change because it only 
>> alters the CFG but the analyzer rarely actually uses the new nodes. 
>> Destructors for the most part are still evaluated conservatively, 
>> with improper object regions. So it causes almost no changes in the 
>> analyzer's positives for now, but it definitely opens up room for 
>> further improvements.
>>
>> I'm observing a couple of problems with this mode at the moment, 
>> namely the rich CFG destructor element hierarchy is not currently 
>> complemented by an equally rich ProgramPoint hierarchy. This causes 
>> the analysis to merge nodes which are not equivalent, for example two 
>> implicit destructors of the same type (with the same function 
>> declaration) may sometimes cause the ExplodedGraph to coil up and 
>> stop analysis (lost coverage) because of having the same program 
>> state at the erroneously-same program point. Because situations when 
>> we can guarantee a change in the program state are pretty rare, we'd 
>> have to produce more program point kinds to handle this correctly.
>>
>> CallEvent hierarchy is also very much affected in a similar manner - 
>> because apparently we're constructing program points by looking at 
>> CallEvents, so they'd need to carry all the information that's needed 
>> to construct the pre-call/post-call program point.
>>
>> == Construction contexts ==
>>
>> We are correctly modeling "this" object region during 
>> construction/destruction of variables with automatic storage 
>> duration, fields and base classes, and on operator new() since 
>> recently, as long as these aren't arrays of objects. It was not yet 
>> implemented for other cases such as temporaries, initializer lists, 
>> fields or C++17 base classes within aggregates, and pass-by-value 
>> from/to functions (the latter being a slightly different problem than 
>> temporaries).
>>
>> First of all, by "not yet implemented" i mean that instead of 
>> constructing into (destroying) the correct object (in the correct 
>> memory region), we come up with a "temporary region", which looks 
>> exactly like a region of a valid C++ temporary but is only used for 
>> communicating that it is not the right region. Then we look at the 
>> region, see that it is a temporary, and avoid inlining constructors, 
>> because it would make little sense when the region is not correct. 
>> However, if we are to model temporaries, we need another way of 
>> communicating our failure to find the correct region, which is being 
>> addressed by https://reviews.llvm.org/D42457
>>
>> Also in the cases when the correct region is used, it is being 
>> computed in a funky way: in order to figure out where do we construct 
>> the object, we walk forward in the CFG (or from child to parent in 
>> the AST) to find the next/parent statement that would accomodate the 
>> newly constructed object. This means that the CFG, while perfectly 
>> telling us what to do in what order (which we, unlike CodeGen, cannot 
>> easily take from AST because we cannot afford recursive evaluation of 
>> statements, mainly because of inlining), discards too much context to 
>> be helpful in understanding how to do it.
>>
>> I tried to add the information about such "trigger statements" for 
>> constructors into CFG and it is extremely helpful and easy to both 
>> use and extend. This assumes adding a new sub-class of CFGElement for 
>> constructors, which maintain a "construction context" - for now it's 
>> just a trigger statement. For instance, in
>>
>>   class C { ... };
>>   void foo() {
>>     C c;
>>   }
>>
>> ...the trigger for constructor C() would be DeclStmt `C c`, and once 
>> we know this we can immediately figure out that the construction 
>> target is the region of variable `c`. Construction trigger is not 
>> necessarily a statement - it may be a CXXCtorInitializer, which is an 
>> AST node kind of its own. Additionally, when constructing aggregates 
>> via initializer lists, we may have the same statement trigger 
>> multiple constructors, eg. in
>>
>>   class C { public: C(int); ~C(); };
>>   struct S { C c1, c2, c3; };
>>   void foo() {
>>     S s { 1, 2, 3 };
>>   }
>>
>> ... we have three different constructors (actually six different 
>> constructors if we include elidable copy constructors) for c1, c2, c3 
>> (and lack of constructor for `s` because of the aggregate thing). It 
>> would be more natural to declare that the specific field or index 
>> would be a part of the CFG's construction context, as well as the 
>> intermediate InitListExpr, so even in these simple cases the 
>> construction context may get quite bulky. And this example is quite 
>> popular on actual code - probably the second worst cause of false 
>> positives after temporaries.
>>
>> For now i have no specific plan on what would construction context 
>> for temporaries contain in the general case. I might not be able to 
>> get the data structures right from the start. In any case, it might 
>> be necessary to perform additional memory allocations for these CFG 
>> elements (for analyzer's CFG only, so it wouldn't affect compilation 
>> time or warnings).
>>
>> I believe that getting the correct target region in as many cases as 
>> possible would be the main part of my work for the nearest future. 
>> And i want to move most this work to CFG, while letting the analyzer 
>> pick up the trigger statement from it and trust it as blindly as 
>> possible.
>>
>> == Workflow ==
>>
>> When working on operator new, i tried hard to maintain a reasonable 
>> history, making around 15 local rebases. It was not awesome because 
>> it was hard for the reviewers to understand the context of the new 
>> changes, and changes could have easily kicked in during rebases. A 
>> few lessons learned here would be to commit more aggressively, i.e. 
>> avoiding stockpiling a large history of patches (essentially a large 
>> branch), which in turn would be possible through trying harder to 
>> avoid unrelated hard-to-test core changes (as long as it doesn't 
>> require weird workarounds) that aren't covered by a flag (such as 
>> those evalCast fixes), in order to make sure reviewing take less 
>> responsibility. It's fine if some parts would later be fixed 
>> (assuming they would indeed be fixed), if it means making the 
>> turnaround faster and the tail of patches shorter - that's also the 
>> attitude i'd try to maintain when reviewing stuff myself.
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>