[cfe-dev] [analyzer] Temporaries.

Fri Mar 23 15:54:17 PDT 2018

Hi Artem,

I can say that approach 2 is much more clear for me. At least, from the 
user's point of view.

22.03.2018 06:45, Artem Dergachev via cfe-dev пишет:
> == Copy elision ==
>
> = Why? =
>
> I've underestimated the importance of implementing copy elision at first.
>
> Eliding copies changes observable behavior of the program if the 
> copy/move constructor has side effects. Whether to elide creation of a 
> temporary object in certain operations is up to implementation, but 
> most implementations currently support and prefer copy elision, so if 
> we don't do the same in the analyzer we'd be analyzing a different 
> program, and users who don't care about portability may become grumpy 
> because they are sure that copy elision is happening.
>
> Additionally, copy elision makes paths longer - including the number 
> of diagnostic pieces - which makes reports harder to understand and 
> exhausts the analyzer's budget faster.
>
> = How? =
>
> Consider an example:
>
>      1    class C {
>      2    public:
>      3      C(int, int);
>      4      C(const C &);
>      5      ~C();
>      6    };
>      7
>      8    void foo() {
>      9      C c = C(123, 456);
>     10    }
>
> The respective AST:
>
>       DeclStmt 0x7fe1d3001ed8 <line:9:3, col:20>
>       `-VarDecl 0x7fe1d3001d08 <col:3, col:19> col:5 c 'C' cinit
>         `-ExprWithCleanups 0x7fe1d3001ec0 <col:9, col:19> 'C'
>           `-CXXConstructExpr 0x7fe1d3001e88 <col:9, col:19> 'C' 'void 
> (const C &)' elidable
>             `-MaterializeTemporaryExpr 0x7fe1d3001e70 <col:9, col:19> 
> 'const C' lvalue
>               `-ImplicitCastExpr 0x7fe1d3001e58 <col:9, col:19> 'const 
> C' <NoOp>
>                 `-CXXBindTemporaryExpr 0x7fe1d3001e38 <col:9, col:19> 
> 'C' (CXXTemporary 0x7fe1d3001e30)
>                   `-CXXTemporaryObjectExpr 0x7fe1d3001db8 <col:9, 
> col:19> 'C' 'void (int, int)'
>                     |-IntegerLiteral 0x7fe1d3001d78 <col:11> 'int' 123
>                     `-IntegerLiteral 0x7fe1d3001d98 <col:16> 'int' 456
>
> And the CFG:
>
> [B1]
>    1: 123
>    2: 456
>    3: C([B1.1], [B1.2]) (CXXConstructExpr, [B1.4], [B1.6], class C)
>    4: [B1.3] (BindTemporary)
>    5: [B1.4] (ImplicitCastExpr, NoOp, const class C)
>    6: [B1.5] (MaterializeTemporaryExpr)
>    7: [B1.6] (CXXConstructExpr, [B1.8], class C)
>    8: C c = C(123, 456);
>    9: ~C() (Temporary object destructor)
>   10: [B1.8].~C() (Implicit destructor)
>
> This explains pretty well what we should do if we don't do copy 
> elision: we construct a temporary on B1.3 that's destroyed on B1.9 
> after being materialized into an lvalue that's referenced by the 
> copy-constructor parameter; the copy-constructor will copy the object 
> into the actual variable.
>
> Now, if we are to elide the copy constructor, the original constructor 
> C(123, 456) should construct the variable directly, and all statements 
> in between (B1.4, B1.5, B1.6, B1.7) and the destructor (B1.9) turn 
> into no-op.
>
> I can imagine two ways of implementing such behavior:
>
> (i). Completely remove elided elements B1.4, B1.5, B1.6, B1.7, B1.9 
> from the CFG.
> (ii). Keep CFG roughly the same but model elided elements as no-op in 
> the analyzer.
>
> I believe that approach (ii) is ultimately better, even though (i) 
> sounds very tempting.
>
> Approach (i) assumes that the modified CFG would kinda look like this:
>
> [B1]
>    1: 123
>    2: 456
>    3: C([B1.1], [B1.2]) (CXXConstructExpr, [B1.4], class C)
>    4: C c = C(123, 456);
>    5: [B1.4].~C() (Implicit destructor)
>
> Our CFGConstructor element [B1.3] will essentially be frankensteined 
> from *inner* CXXTemporaryObjectExpr 0x7fe1d3001db8 (it's printed as 
> CXXConstructExpr in those CFG dumps) and the construction context of 
> the *outer* copy-constructor CXXConstructExpr 0x7fe1d3001e88.
>
> This CFG looks pretty good on its own. It's almost indistinguishable 
> from the CFG for 'C c(123, 456)' and contains all that's needed to 
> properly initialize 'c'.
>
> However the more i think about it, the more it seems to violate the 
> principle of least astonishment. For instance, 
> getSVal(DeclStmt->getSingleDecl()->getInit(), LC) would yield 
> UnknownVal because the constructor that has initialized the variable 
> is different from the one that's present in the initializer of the 
> variable. The code to obtain the "actual" initializer expression may 
> be very complicated because initializers can be quite complex (imagine 
> a ?: operator; the code for extracting the actual initializer will 
> essentially duplicate the code for finding construction contexts).
>
> Essentially, all of the expressions between the DeclStmt and the 
> CXXTemporaryObjectExpr will never be set in the Environment and would 
> be unknown. Any user (imagine a checker) who will try to understand 
> what is going on in the current program point by exploring values of 
> expressions will fail here.
>
> We are going to evaluate our DeclStmt without evaluating its immediate 
> sub-expressions. Why would the sub-sub-sub-sub-sub-expression that we 
> need to evaluate the DeclStmt be still kept alive in the Environment?
>
> If we try to make checkers' life easier by setting values in the 
> Environment anyway for expressions that aren't present in the CFG, how 
> would liveness analysis know when to garbage-collect them given this 
> analysis is also CFG-based?
>
> In fact I'm quite scared to imagine how our liveness analysis would 
> react when he realizes that there are a lot of expressions right in 
> the middle of a feasible path that are completely omitted from the 
> CFG. I guess we are omitting some dead paths from the CFG, but 
> omitting expressions in the middle of a statement will almost 
> certainly make liveness analysis more complicated.
>
> Long story short, approach (i) violates a large amount of very basic 
> and intuitive invariants that we've so far maintained. Which is scary 
> and will make further development harder, even if we manage to make it 
> work.
>
> Approach (ii) is much more in the spirit of C++ standard 
> [class.copy.elision]:
>
> > "the implementation treats the source and target of the omitted 
> copy/move operation as simply two different ways of referring to the 
> same object".
>
> Essentially, copy elision is an aliasing relationship between the 
> stack variable 'c' and the temporary object within its initializer. 
> This makes perfect sense in the CodeGen where both of these are just 
> some stack addresses. In analyzer terms, we declare that the 
> CXXTempObjectRegion for our CXXTemporaryObjectExpr and the VarRegion 
> for our variable 'c' are the same region. Because we don't have 
> aliasing / renaming support in the analyzer, we take the route of 
> picking our regions correctly from the start (that's, in some sense, 
> the whole point of this construction context hassle! - avoiding the 
> need to rename regions later - which would have probably been the 
> right thing to do even if we could rename regions). It leaves us with 
> the following values for the expressions in the Environment:
>
> CXXTemporaryObjectExpr -> lazyCompoundVal{VarRegion 'c', Store after 
> construction}
> CXXBindTemporaryExpr -> lazyCompoundVal{VarRegion 'c', Store after 
> construction}
> MaterializeTemporaryExpr -> c
> CXXConstructExpr -> lazyCompoundVal{VarRegion 'c', Store after 
> construction}
>
> Only CXXTemporaryObjectExpr will be actually modeled (hopefully by 
> inlining the constructor), with VarRegion 'c' acting as this-argument.
>
> This leaves us with CXXConstructExpr that constructs into its own 
> source argument, which is exactly what the standard expects from us, 
> even if it's a bit amusing. It would not construct anything, of 
> course, just bind the value. It is questionable whether we need a 
> checker callback for the elided constructor (and if we do, which ones 
> specifically).
>
> CFG needs to be extended to add more context to the 
> CXXTemporaryObjectExpr in order to execute approach (ii) correctly. We 
> still need access to the CXXConstructExpr's construction context in 
> order to perform the very first construction, so i guess that the 
> construction context for CXXTemporaryObjectExpr would be a combination 
> of both. Note that there may be more than one elided copy constructor 
> (eg. there are two if the initializer is a ?: operator - and, well, 
> there may be multiple nested ?: operators).
>
> Additionally, when we model elided expressions, we need to know that 
> they are elided by looking at them. We may add this information to 
> their CFG elements (this, again, looks tempting, but requires more CFG 
> element kinds) or mark expressions as elided in the program state 
> (which we can easily do because they are already mentioned in one of 
> the construction contexts).
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev