[cfe-dev] [analyzer] Temporaries.
Alexey Sidorin via cfe-dev
cfe-dev at lists.llvm.org
Fri Mar 23 15:54:17 PDT 2018
Hi Artem,
I can say that approach 2 is much more clear for me. At least, from the
user's point of view.
22.03.2018 06:45, Artem Dergachev via cfe-dev пишет:
> == Copy elision ==
>
> = Why? =
>
> I've underestimated the importance of implementing copy elision at first.
>
> Eliding copies changes observable behavior of the program if the
> copy/move constructor has side effects. Whether to elide creation of a
> temporary object in certain operations is up to implementation, but
> most implementations currently support and prefer copy elision, so if
> we don't do the same in the analyzer we'd be analyzing a different
> program, and users who don't care about portability may become grumpy
> because they are sure that copy elision is happening.
>
> Additionally, copy elision makes paths longer - including the number
> of diagnostic pieces - which makes reports harder to understand and
> exhausts the analyzer's budget faster.
>
> = How? =
>
> Consider an example:
>
> 1 class C {
> 2 public:
> 3 C(int, int);
> 4 C(const C &);
> 5 ~C();
> 6 };
> 7
> 8 void foo() {
> 9 C c = C(123, 456);
> 10 }
>
> The respective AST:
>
> DeclStmt 0x7fe1d3001ed8 <line:9:3, col:20>
> `-VarDecl 0x7fe1d3001d08 <col:3, col:19> col:5 c 'C' cinit
> `-ExprWithCleanups 0x7fe1d3001ec0 <col:9, col:19> 'C'
> `-CXXConstructExpr 0x7fe1d3001e88 <col:9, col:19> 'C' 'void
> (const C &)' elidable
> `-MaterializeTemporaryExpr 0x7fe1d3001e70 <col:9, col:19>
> 'const C' lvalue
> `-ImplicitCastExpr 0x7fe1d3001e58 <col:9, col:19> 'const
> C' <NoOp>
> `-CXXBindTemporaryExpr 0x7fe1d3001e38 <col:9, col:19>
> 'C' (CXXTemporary 0x7fe1d3001e30)
> `-CXXTemporaryObjectExpr 0x7fe1d3001db8 <col:9,
> col:19> 'C' 'void (int, int)'
> |-IntegerLiteral 0x7fe1d3001d78 <col:11> 'int' 123
> `-IntegerLiteral 0x7fe1d3001d98 <col:16> 'int' 456
>
> And the CFG:
>
> [B1]
> 1: 123
> 2: 456
> 3: C([B1.1], [B1.2]) (CXXConstructExpr, [B1.4], [B1.6], class C)
> 4: [B1.3] (BindTemporary)
> 5: [B1.4] (ImplicitCastExpr, NoOp, const class C)
> 6: [B1.5] (MaterializeTemporaryExpr)
> 7: [B1.6] (CXXConstructExpr, [B1.8], class C)
> 8: C c = C(123, 456);
> 9: ~C() (Temporary object destructor)
> 10: [B1.8].~C() (Implicit destructor)
>
> This explains pretty well what we should do if we don't do copy
> elision: we construct a temporary on B1.3 that's destroyed on B1.9
> after being materialized into an lvalue that's referenced by the
> copy-constructor parameter; the copy-constructor will copy the object
> into the actual variable.
>
> Now, if we are to elide the copy constructor, the original constructor
> C(123, 456) should construct the variable directly, and all statements
> in between (B1.4, B1.5, B1.6, B1.7) and the destructor (B1.9) turn
> into no-op.
>
> I can imagine two ways of implementing such behavior:
>
> (i). Completely remove elided elements B1.4, B1.5, B1.6, B1.7, B1.9
> from the CFG.
> (ii). Keep CFG roughly the same but model elided elements as no-op in
> the analyzer.
>
> I believe that approach (ii) is ultimately better, even though (i)
> sounds very tempting.
>
> Approach (i) assumes that the modified CFG would kinda look like this:
>
> [B1]
> 1: 123
> 2: 456
> 3: C([B1.1], [B1.2]) (CXXConstructExpr, [B1.4], class C)
> 4: C c = C(123, 456);
> 5: [B1.4].~C() (Implicit destructor)
>
> Our CFGConstructor element [B1.3] will essentially be frankensteined
> from *inner* CXXTemporaryObjectExpr 0x7fe1d3001db8 (it's printed as
> CXXConstructExpr in those CFG dumps) and the construction context of
> the *outer* copy-constructor CXXConstructExpr 0x7fe1d3001e88.
>
> This CFG looks pretty good on its own. It's almost indistinguishable
> from the CFG for 'C c(123, 456)' and contains all that's needed to
> properly initialize 'c'.
>
> However the more i think about it, the more it seems to violate the
> principle of least astonishment. For instance,
> getSVal(DeclStmt->getSingleDecl()->getInit(), LC) would yield
> UnknownVal because the constructor that has initialized the variable
> is different from the one that's present in the initializer of the
> variable. The code to obtain the "actual" initializer expression may
> be very complicated because initializers can be quite complex (imagine
> a ?: operator; the code for extracting the actual initializer will
> essentially duplicate the code for finding construction contexts).
>
> Essentially, all of the expressions between the DeclStmt and the
> CXXTemporaryObjectExpr will never be set in the Environment and would
> be unknown. Any user (imagine a checker) who will try to understand
> what is going on in the current program point by exploring values of
> expressions will fail here.
>
> We are going to evaluate our DeclStmt without evaluating its immediate
> sub-expressions. Why would the sub-sub-sub-sub-sub-expression that we
> need to evaluate the DeclStmt be still kept alive in the Environment?
>
> If we try to make checkers' life easier by setting values in the
> Environment anyway for expressions that aren't present in the CFG, how
> would liveness analysis know when to garbage-collect them given this
> analysis is also CFG-based?
>
> In fact I'm quite scared to imagine how our liveness analysis would
> react when he realizes that there are a lot of expressions right in
> the middle of a feasible path that are completely omitted from the
> CFG. I guess we are omitting some dead paths from the CFG, but
> omitting expressions in the middle of a statement will almost
> certainly make liveness analysis more complicated.
>
> Long story short, approach (i) violates a large amount of very basic
> and intuitive invariants that we've so far maintained. Which is scary
> and will make further development harder, even if we manage to make it
> work.
>
> Approach (ii) is much more in the spirit of C++ standard
> [class.copy.elision]:
>
> > "the implementation treats the source and target of the omitted
> copy/move operation as simply two different ways of referring to the
> same object".
>
> Essentially, copy elision is an aliasing relationship between the
> stack variable 'c' and the temporary object within its initializer.
> This makes perfect sense in the CodeGen where both of these are just
> some stack addresses. In analyzer terms, we declare that the
> CXXTempObjectRegion for our CXXTemporaryObjectExpr and the VarRegion
> for our variable 'c' are the same region. Because we don't have
> aliasing / renaming support in the analyzer, we take the route of
> picking our regions correctly from the start (that's, in some sense,
> the whole point of this construction context hassle! - avoiding the
> need to rename regions later - which would have probably been the
> right thing to do even if we could rename regions). It leaves us with
> the following values for the expressions in the Environment:
>
> CXXTemporaryObjectExpr -> lazyCompoundVal{VarRegion 'c', Store after
> construction}
> CXXBindTemporaryExpr -> lazyCompoundVal{VarRegion 'c', Store after
> construction}
> MaterializeTemporaryExpr -> c
> CXXConstructExpr -> lazyCompoundVal{VarRegion 'c', Store after
> construction}
>
> Only CXXTemporaryObjectExpr will be actually modeled (hopefully by
> inlining the constructor), with VarRegion 'c' acting as this-argument.
>
> This leaves us with CXXConstructExpr that constructs into its own
> source argument, which is exactly what the standard expects from us,
> even if it's a bit amusing. It would not construct anything, of
> course, just bind the value. It is questionable whether we need a
> checker callback for the elided constructor (and if we do, which ones
> specifically).
>
> CFG needs to be extended to add more context to the
> CXXTemporaryObjectExpr in order to execute approach (ii) correctly. We
> still need access to the CXXConstructExpr's construction context in
> order to perform the very first construction, so i guess that the
> construction context for CXXTemporaryObjectExpr would be a combination
> of both. Note that there may be more than one elided copy constructor
> (eg. there are two if the initializer is a ?: operator - and, well,
> there may be multiple nested ?: operators).
>
> Additionally, when we model elided expressions, we need to know that
> they are elided by looking at them. We may add this information to
> their CFG elements (this, again, looks tempting, but requires more CFG
> element kinds) or mark expressions as elided in the program state
> (which we can easily do because they are already mentioned in one of
> the construction contexts).
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
More information about the cfe-dev
mailing list