[cfe-dev] [analyzer] Temporaries.

Gábor Horváth via cfe-dev cfe-dev at lists.llvm.org
Thu Apr 12 14:04:46 PDT 2018


On 12 April 2018 at 22:50, Artem Dergachev <noqnoqneo at gmail.com> wrote:

>
>
> On 4/12/18 1:10 PM, Gábor Horváth wrote:
>
>
>
> On 14 February 2018 at 22:20, Artem Dergachev via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> More explanations on how the analyzer keeps making its way around the C++
>> AST.
>>
>> == Lifetime extension ==
>>
>> This is a brain dump of how (and how much) lifetime extension of
>> temporary objects is currently broken in the static analyzert. Spoilers:
>> not too much, it seems, but annoying nevertheless.
>>
>> Consider an example:
>>
>>      1    class C {
>>      2    public:
>>      3      C() {}
>>      4      ~C() {}
>>      5    };
>>      6
>>      7    void foo() {
>>      8      const C &c = C();
>>      9    }
>>
>> With the AST for the variable declaration:
>>
>>       DeclStmt 0x7fa5ac85cba0 <line:8:3, col:19>
>>       `-VarDecl 0x7fa5ac85c878 <col:3, col:18> col:12 c 'const C &' cinit
>>         `-ExprWithCleanups 0x7fa5ac85cb30 <col:16, col:18> 'const C'
>> lvalue
>>           `-MaterializeTemporaryExpr 0x7fa5ac85cb00 <col:16, col:18>
>> 'const C' lvalue extended by Var 0x7fa5ac85c878 'c' 'const C &'
>>             `-ImplicitCastExpr 0x7fa5ac85cae8 <col:16, col:18> 'const C'
>> <NoOp>
>>               `-CXXBindTemporaryExpr 0x7fa5ac85cac8 <col:16, col:18> 'C'
>> (CXXTemporary 0x7fa5ac85cac0)
>>                 `-CXXTemporaryObjectExpr 0x7fa5ac85ca88 <col:16, col:18>
>> 'C' 'void ()'
>>
>> *here goes a periodic reminder that CXXTemporaryObjectExpr is a sub-class
>> of CXXConstructExpr*
>>
>> Notice how MaterializeTemporaryExpr is the innermost expression (the
>> first one in the order of execution) that is an lvalue. Essentially, you
>> can think of it as the mythical "rvalue-to-lvalue" cast that takes in a C++
>> object rvalue and returns the this-value for that object. Because all C++
>> objects have a this-value that never changes throughout their lifetime, it
>> is essentially their identity. Otherwise you can't call methods on them.
>>
>> MaterializeTemporaryExpr also contains information about the lifetime
>> extension process: we needed the this-value in order to bind it to variable
>> 'c'. You see that in the AST.
>>
>> In the analyzer, however, MaterializeTemporaryExpr does a different
>> thing, as a temporary solution (no pun intended). It constructs a new
>> temporary region out of thin air and binds the rvalue object to that
>> temporary in the Store. The respective function in our code is called
>> "createTemporaryRegionIfNeeded". It also has a separate purpose of
>> converting rvalue sub-object adjustments into lvalue sub-object
>> adjustments, which we wouldn't discuss this time.
>>
>> Now that we learned how to inline temporary constructors and destructors,
>> it essentially means that the this-value in the constructor and in the
>> destructor would be different. Because first we construct the object into
>> temporary region R1, then we take lazyCompoundVal{R1} to represent the
>> value of CXXTemporaryObjectExpr, then we materialize lazyCompoundVal{R1} to
>> R2, then we bind R2 to variable 'c', then we call the automatic(!)
>> destructor for 'c' which contains R2. To be clear, the region store at the
>> time of destruction would be:
>>
>>   c: R2,
>>   R2: lazyCompoundVal{R1}.
>>
>> It means that fields of the object would contain the correct values,
>> there would be the correct number of destructors called (no temporary
>> destructors, just one automatic destructor), but the identity of the object
>> (this-value) would change in the process. Unless the object actually makes
>> any decisions or does any manipulations that involve its this-value, the
>> modeling should be correct. When the object starts to actively use its
>> this-value in its inlined methods, the analysis would do wrong stuff.
>> Additionally, it causes a mess in the checkers when they try to track
>> objects by their this-values - i.e. IteratorChecker has a lot of additional
>> code to work around the fact that the region for the object constantly
>> changes.
>>
>> From the above it is clear that MaterializeTemporaryExpr should not
>> construct any new regions, at least not in this case. We already have the
>> correct region, namely R1, which should be re-used.
>>
>
> Hi Artem!
>
> We found a strange false positive that might be related to what you
> describe above but not sure though. Could you take a look?
> It looks like we are seeing a null pointer dereference error, and the null
> value comes from the destructor which was invoked on a temporary.
> If this is not the case, the path diagnostic might be misleading.
>
> Here is the finding: http://cc.elte.hu:15010/Default/#run=Xerces_Xerces-C_
> 3_2_1_unexplored_first&checker-name=unix.MismatchedDeallocator&checker-
> name=cplusplus.NewDeleteLeaks&checker-name=core.uninitialized.UndefReturn&
> checker-name=core.UndefinedBinaryOperatorResult&checker-name=core.
> NullDereference&checker-name=core.NonNullParamChecker&
> checker-name=core.DivideZero&checker-name=core.
> CallAndMessage&tab=allReports&reportHash=2f8548f84329734e674f15d844caeb
> 38&report=6764&subtab=2f8548f84329734e674f15d844caeb38
>
> Regards,
> Gábor
>
>
> Hmm, i don't fully understand it yet. It seems to me that
> RefHash3KeysIdPoolEnumerator drains the whole pool upon destruction (fIdPtrs
> is not a member of the temporary, it's an external entity), so it indeed
> can't really be safely passed around or copied (i.e. elidable copies are
> not moves).
>

Oh, I think you are right! Somehow I assumed the there is a deep copy going
on but after looking at the copy constructor, it looks like that is not the
case. Sorry for bothering :)


>
> So i'd easily believe that this code only works because all copies are
> elided, i.e. the code is not portable (at least not until C++17) but
> usually works because most compilers are clever enough. We'll probably get
> rid of such positives when we implement copy elision.
>

> In case i got it all wrong, do you accidentally have a reproducer for me
> to have a closer look? If CTU stuff is hard to provide a reproducer for,
> you should be able to write a small code snippet directly into
> RefHash3KeysIdPool.c to reproduce the problem. I'll also accept a trimmed
> (as in -trim-egraph) exploded graph dot file even if it's huge.
>
> P.S. The "Calling..." icon is hilarious :D
>
>
>
>>
>> It is tempting to take R1 directly from lazyCompoundVal{R1} - it already
>> has memories about once being a value of R1. I'm not sure it's not going to
>> work - it may work, at least i'm not ready to come up with a
>> counterexample. But the meaning of LazyCompoundVal's parent region is
>> different and coincides with what we want just accidentally. Namely,
>> lazyCompoundVal{R1} is a value of an object that was bound to R1 in some
>> particular moment of time in the past, without any explanation of when this
>> moment of time was - but there's no indication if R1 is the region of the
>> temporary we've just constructed, or a region of an unrelated object that
>> used to have the same value back then. As we'd see later,
>> MaterializeTemporaryExpr doesn't always contain a constructor within it -
>> there are a lot of cases to cover, and if the trick doesn't work even once,
>> it's going to be hard, so i'd probably not going to commit into maintaining
>> this invariant. Though it might be plausible to modify add an SVal kind
>> that does exactly what we mean here - i.e. a value that does indeed
>> correspond to a specific C++ object identified by region. It might be a
>> beautiful solution, but it may also fail miserably if tricky cornercases
>> show up - so i'm not ready to commit to that. Also the correct way of
>> dealing with such values (i.e. for how long are they relevant?) would be
>> extremely difficult to explain to checker developers.
>>
>> The more straightforward approach here is to include
>> MaterializeTemporaryExpr (hereinafter MTE) into the construction context.
>> It means, if a temporary that we're about to construct would be
>> lifetime-extended later, we'd rather know about that during construction,
>> and maintain a map in the program state from MTE to their respective
>> temporary regions that were used for representing the respective
>> construction targets. Upon encountering the respective MTE we'd simply
>> retrieve the implicit temporary storage for the value from the program
>> state and declare that temporary region to be the value of the MTE. This
>> would mimic the approach we have with CXXBindTemporaryExprs (hereinafter
>> BTE) and their respective regions that allows temporary destructors to work
>> - but this time it's going to be about MaterializeTemporaryExprs and
>> automatic destructors. I imagine that on the checker side this can
>> potentially be exposed via some sort of State->getTemporaryStorage(Expr)
>> call, but i believe that generally this process should be as transparent to
>> the checkers as possible.
>>
>> It sounds as if both of these maps could be eliminated by always making
>> sure that the target temporary is constructed "with" the MTE (for
>> lifetime-extended temproraries) or BTE (for temporaries that require
>> destruction at the end of full-expression). In this case, with the help of
>> construction context-assisted lookahead, we declare that the target of the
>> construction is CXXTempObjectRegion(MTE, LC) or CXXTempObjectRegion(BTE,
>> LC) respectively, rather than CXXTempObjectRegion(CXXConstructExpr).
>> Then during evaluation of MTE or BTE we'd simply construct the same region
>> with the expression we're currently evaluating, and it's automagically
>> going to be the correct region. This approach, however, becomes confusing
>> when we start dealing with elidable constructors (explained below). So for
>> now i believe that it is quite irrelevant which expression is identifying
>> the temporary region.
>>
>> == Elidable constructors ==
>>
>> While it doesn't sound like an immediately important task to implement
>> copy elision in the analyzer, it may help with making some things easier.
>> And it'd also make some reports fancier, as mentioned in
>> https://reviews.llvm.org/D43144.
>>
>> Elidable copy-constructors can be explained as a form of lifetime
>> extension. Instead of copying the temporary, they keep using the original
>> value of the temporary, which in some pretty twisted sense means that they
>> are lifetime-extending it to be able to use it. For example, if we modify
>> our example by replacing the lifetime-extending reference variable with a
>> value-type variable:
>>
>>      1    class C {
>>      2    public:
>>      3      C() {}
>>      4      ~C() {}
>>      5    };
>>      6
>>      7    void foo() {
>>      8      C c = C();
>>      9    }
>>
>> ...then we'd still have an MTE, even though lifetime extension would seem
>> to be gone:
>>
>>       DeclStmt 0x7fb8f005afb8 <line:8:3, col:12>
>>       `-VarDecl 0x7fb8f005ac50 <col:3, col:11> col:5 c 'C' cinit
>>         `-ExprWithCleanups 0x7fb8f005afa0 <col:9, col:11> 'C'
>>           `-CXXConstructExpr 0x7fb8f005af68 <col:9, col:11> 'C' 'void
>> (const C &) noexcept' elidable
>>             `-MaterializeTemporaryExpr 0x7fb8f005af00 <col:9, col:11>
>> 'const C' lvalue
>>               `-ImplicitCastExpr 0x7fb8f005aee8 <col:9, col:11> 'const C'
>> <NoOp>
>>                 `-CXXBindTemporaryExpr 0x7fb8f005aec8 <col:9, col:11> 'C'
>> (CXXTemporary 0x7fb8f005aec0)
>>                   `-CXXTemporaryObjectExpr 0x7fb8f005ae88 <col:9, col:11>
>> 'C' 'void ()'
>>
>> In this case the MTE is expressing the fact that the temporary
>> constructed via CXXTemporaryObjectExpr can be "lifetime-extended" (by
>> actually merging it with the stack variable) rather than copied, if the
>> CXXConstructExpr surrounding it would be chosen to be elided. The AST
>> doesn't make the elision choice for us - but is compatible with both
>> choices. The MTE essentially overrides the necessity of immediate
>> destruction provided by the BTE, and lets the surrounding AST decide upon
>> the lifetime of the object.
>>
>> Because the analyzer currently does not do copy elision, it will use the
>> MTE only to compute the argument for the elidable copy-constructor, and
>> then perform the copy-construction, and then destroy the original temporary
>> at the end of the full-expression. Note, however, that in this case we need
>> to properly support both the BTE (for the temporary destructor to work) and
>> the MTE (for computing its value). We need to implement the MTE's ability
>> to perform "rvalue-to-lvalue-cast" even if the temporary destruction is
>> still necessary. For this reason, if we rely on constructing temporary
>> regions with the correct BTEs or MTEs, at least one of these tasks becomes
>> impossible to perform.
>>
>> If we were to support copy elision, then the CXXTemporaryObjectExpr
>> constructor would go directly into the variable region. For the purposes of
>> modeling, it'd mean that only CXXTemporaryObjectExpr would actually need to
>> be modeled. But this would require additional coding in the construction
>> context to be able to realize that the target is the variable while
>> modeling the CXXTemporaryObjectExpr.
>>
>> For the sake of completeness, let's consider the ternary operator example:
>>
>>      1    class C {
>>      2    public:
>>      3      C(int) {}
>>      4      ~C() {}
>>      5    };
>>      6
>>      7    void foo(int coin) {
>>      8      const C &c = coin ? C(1) : C(2);
>>      9    }
>>
>> The respective AST would be:
>>
>>       DeclStmt 0x7fc1e20023e0 <line:8:3, col:34>
>>       `-VarDecl 0x7fc1e2001dc8 <col:3, col:33> col:12 c 'const C &' cinit
>>         `-ExprWithCleanups 0x7fc1e2002370 <col:16, col:33> 'const C'
>> lvalue
>>           `-MaterializeTemporaryExpr 0x7fc1e2002340 <col:16, col:33>
>> 'const C' lvalue extended by Var 0x7fc1e2001dc8 'c' 'const C &'
>>             `-ImplicitCastExpr 0x7fc1e2002328 <col:16, col:33> 'const C'
>> <NoOp>
>>               `-ConditionalOperator 0x7fc1e20022f8 <col:16, col:33> 'C'
>>                 |-ImplicitCastExpr 0x7fc1e2002170 <col:16> 'bool'
>> <IntegralToBoolean>
>>                 | `-ImplicitCastExpr 0x7fc1e2002158 <col:16> 'int'
>> <LValueToRValue>
>>                 |   `-DeclRefExpr 0x7fc1e2001e28 <col:16> 'int' lvalue
>> ParmVar 0x7fc1e2001c18 'coin' 'int'
>>                 |-CXXBindTemporaryExpr 0x7fc1e2002248 <col:23, col:26>
>> 'C' (CXXTemporary 0x7fc1e2002240)
>>                 | `-CXXConstructExpr 0x7fc1e2002208 <col:23, col:26> 'C'
>> 'void (const C &) noexcept' elidable
>>                 |   `-MaterializeTemporaryExpr 0x7fc1e20021a0 <col:23,
>> col:26> 'const C' lvalue
>>                 |     `-ImplicitCastExpr 0x7fc1e2002188 <col:23, col:26>
>> 'const C' <NoOp>
>>                 |       `-CXXFunctionalCastExpr 0x7fc1e2002078 <col:23,
>> col:26> 'C' functional cast to class C <ConstructorConversion>
>>                 |         `-CXXBindTemporaryExpr 0x7fc1e2002058 <col:23,
>> col:26> 'C' (CXXTemporary 0x7fc1e2002050)
>>                 |           `-CXXConstructExpr 0x7fc1e2002018 <col:23,
>> col:26> 'C' 'void (int)'
>>                 |             `-IntegerLiteral 0x7fc1e2001e60 <col:25>
>> 'int' 1
>>                 `-CXXBindTemporaryExpr 0x7fc1e20022d8 <col:30, col:33>
>> 'C' (CXXTemporary 0x7fc1e20022d0)
>>                   `-CXXConstructExpr 0x7fc1e2002298 <col:30, col:33> 'C'
>> 'void (const C &) noexcept' elidable
>>                     `-MaterializeTemporaryExpr 0x7fc1e2002280 <col:30,
>> col:33> 'const C' lvalue
>>                       `-ImplicitCastExpr 0x7fc1e2002268 <col:30, col:33>
>> 'const C' <NoOp>
>>                         `-CXXFunctionalCastExpr 0x7fc1e2002130 <col:30,
>> col:33> 'C' functional cast to class C <ConstructorConversion>
>>                           `-CXXBindTemporaryExpr 0x7fc1e2002110 <col:30,
>> col:33> 'C' (CXXTemporary 0x7fc1e2002108)
>>                             `-CXXConstructExpr 0x7fc1e20020d0 <col:30,
>> col:33> 'C' 'void (int)'
>>                               `-IntegerLiteral 0x7fc1e20020b0 <col:32>
>> 'int' 2
>>
>> Each branch contains two constructors: the temporary and the elidable
>> copy. The temporaries are surrounded with their respective BTEs and
>> copy-elision-kind MTEs, which indicates that they need to be either
>> destroyed as temporaries, or, if copy elision is chosen, have their
>> lifetime decided upon by the surrounding AST. The elidable copy
>> constructors also, being temporaries, have their respective BTEs. Note,
>> however, that there is only one MTE for both BTEs for the elidable
>> constructors.
>>
>> So after the conditional operator is resolved (which is the first thing
>> we need to do, according to the CFG), we'd go ahead and perform the
>> constructors, and their trigger would be their respective BTE in the
>> non-elide case, and the single top-level MTE in the elide case. In the
>> non-elide case, copy constructors would be triggered by the top-level MTE.
>>
>> It means that, once again, copy elision would prevent us from handling
>> both the BTE and the copy-elision-kind MTE in the single construction,
>> allowing the "predictable target region" trick to work: when we need the
>> temporary destructor, we construct directly into CXXTempObjectRegion of the
>> BTE and it gets automatically picked up during destruction, and when we
>> need the automatic destructor, we construct directly into
>> CXXTempObjectRegion of the MTE and we can easily compute the value of the
>> MTE. But when we don't do copy elision, we'd have to keep at least one of
>> those in the program state map.
>>
>> == Return by value ==
>>
>> Returning C++ objects by value is actually very similar to constructing
>> it. Consider:
>>
>>      1    class C {
>>      2    public:
>>      3      C() {}
>>      4      ~C() {}
>>      5    };
>>      6
>>      7    C bar() {
>>      8      C c;
>>      9      return c;
>>     10    }
>>     11
>>     12    void foo() {
>>     13      const C &c = bar();
>>     14    }
>>
>> With the respective AST for DeclStmt in foo():
>>
>>       DeclStmt 0x7fe62c84f8e8 <line:13:3, col:21>
>>       `-VarDecl 0x7fe62c84f6b8 <col:3, col:20> col:12 c 'const C &' cinit
>>         `-ExprWithCleanups 0x7fe62c84f878 <col:16, col:20> 'const C'
>> lvalue
>>           `-MaterializeTemporaryExpr 0x7fe62c84f848 <col:16, col:20>
>> 'const C' lvalue extended by Var 0x7fe62c84f6b8 'c' 'const C &'
>>             `-ImplicitCastExpr 0x7fe62c84f830 <col:16, col:20> 'const C'
>> <NoOp>
>>               `-CXXBindTemporaryExpr 0x7fe62c84f810 <col:16, col:20> 'C'
>> (CXXTemporary 0x7fe62c84f808)
>>                 `-CallExpr 0x7fe62c84f7e0 <col:16, col:20> 'C'
>>                   `-ImplicitCastExpr 0x7fe62c84f7c8 <col:16> 'C (*)()'
>> <FunctionToPointerDecay>
>>                     `-DeclRefExpr 0x7fe62c84f770 <col:16> 'C ()' lvalue
>> Function 0x7fe62c84f190 'bar' 'C ()'
>>
>> And for the ReturnStmt in bar():
>>
>>       ReturnStmt 0x7fe62c84f5b0 <line:9:3, col:10>
>>       `-CXXConstructExpr 0x7fe62c84f578 <col:10> 'C' 'void (const C &)
>> noexcept' elidable
>>         `-ImplicitCastExpr 0x7fe62c84f518 <col:10> 'const C' lvalue <NoOp>
>>           `-DeclRefExpr 0x7fe62c84f4f0 <col:10> 'C' lvalue Var
>> 0x7fe62c84f280 'c' 'C'
>>
>> Since https://reviews.llvm.org/D42875 we can already realize that the
>> constructor in bar(), assuming that we're inlining bar() during analysis,
>> would be constructed into something that is a return value of bar(). This
>> allows us, by looking that the StackFrameContext's call site, to figure out
>> that it is being constructed into the CallExpr in foo(). Now if only we
>> knew that that the call site is a lifetime-extended temporary, i.e. if only
>> we had a pointer to the foo()'s MTE at the CallExpr's CFG element, we'd be
>> able to find the correct target region for construction: the
>> CXXTempObjectRegion for the MTE in the StackFrameContext of foo(). So i'm
>> proposing to add some sort of construction context to not only
>> constructors, but also to functions that return objects, and then during
>> construction perform the lookup in three easy steps:
>>
>>   1. in the callee's CFG from constructor to return statement,
>>   2. through the location from the return statement to the call site,
>>   3. then through the caller's CFG from the call site to the MTE.
>>
>> If the function is not inlined, we can still make use of the construction
>> context to represent the return value as a LazyCompoundValue of the MTE's
>> temporary. It would eliminate the need to replace the return value with
>> another value while evaluating the MTE, and of course the need to re-bind
>> the object to a different this-region.
>>
>> So i believe that this is a good way to eliminate the need for the
>> "createTemporaryRegionIfNeeded" thing in the function calls as well.
>>
>>
>>
>> On 06/02/2018 1:41 PM, Artem Dergachev wrote:
>>
>>> A bit of an update.
>>>
>>> == Temporary destructors ==
>>>
>>> Adding some initial support for temporary destructors seems pretty easy
>>> and straightforward at this point, given the awesome work done by Manuel
>>> Klimek on our CFG a few years ago.
>>>
>>> 1. We already have the awesome CFGTemporaryDtor elements, which have the
>>> backreference to the CXXBindTemporaryExpr for their temporaries.
>>>
>>> 2. In simple cases CXXBindTemporaryExprs have an immediate constructor
>>> within them, and therefore we can provide the CXXBindTemporaryExprs as the
>>> construction context's trigger statements, and therefore have a reliable
>>> CXXTempObjectRegion for constructors.
>>>
>>> 3. Then we already track the CXXBindTemporaryExprs for the active
>>> temporaries in the program state. We can attach any constructor information
>>> to them, such as the target region, if we need to (probably we can
>>> reconstruct it by looking at the expression and the location context, not
>>> sure if we want to).
>>>
>>> 4. So when we reach the CFGTemporaryDtor element, we can just lookup all
>>> the info we need, and perform the destruction properly.
>>>
>>> 5. There's a bit of a liveness problem, because it seems that our
>>> liveness analysis tends to throw away the temporary too early. I can easily
>>> hack this problem away by marking all regions that correspond to active
>>> temporaries as live. I'll see if there's a better solution.
>>>
>>> == CXXDefaultArgExpr problems ==
>>>
>>> There's a known problem with those. Consider:
>>>
>>>   void foo(const C &c = C()) {
>>>   }
>>>
>>>   void bar() {
>>>     foo();
>>>     foo();
>>>   }
>>>
>>> Each call of foo() contains a CXXDefaultArgExpr for c. The default
>>> argument value C() is constructed before we enter foo() and destroyed after
>>> we leave foo(). However, c's initializer, "= C()", is *not part of the AST
>>> of bar()*. In particular, when foo() is called twice, the initializer for
>>> the two calls is the same, only CXXDefaultArgExprs are different. This
>>> screws a lot of invariants in the analyzer: program points start coinciding
>>> (causing the analysis to loop and cache out), Environment doesn't expect
>>> the same expression in the same location context have two different values
>>> (suppose calls are nested into each other), analysis taking wrong branches,
>>> and so on.
>>>
>>> Luckily, default-arg expressions that aren't zero integers or null
>>> pointers are pretty rare. Still, we'd need to eventually think how to avoid
>>> any really bad practical problems with them.
>>>
>>> On 25/01/2018 9:08 AM, Artem Dergachev wrote:
>>>
>>>> Handling C++ temporary object construction and destruction seems to be
>>>> the biggest cause of false positives on C++ code at the moment. I'd be
>>>> looking into this, even though for now i don't see the whole scale of
>>>> problem.
>>>>
>>>> == CFG, destructors, and ProgramPoints ==
>>>>
>>>> We should probably enable `-analyzer-config cfg-temporary-dtors=true`
>>>> by default soon. It is a fairly low-impact change because it only alters
>>>> the CFG but the analyzer rarely actually uses the new nodes. Destructors
>>>> for the most part are still evaluated conservatively, with improper object
>>>> regions. So it causes almost no changes in the analyzer's positives for
>>>> now, but it definitely opens up room for further improvements.
>>>>
>>>> I'm observing a couple of problems with this mode at the moment, namely
>>>> the rich CFG destructor element hierarchy is not currently complemented by
>>>> an equally rich ProgramPoint hierarchy. This causes the analysis to merge
>>>> nodes which are not equivalent, for example two implicit destructors of the
>>>> same type (with the same function declaration) may sometimes cause the
>>>> ExplodedGraph to coil up and stop analysis (lost coverage) because of
>>>> having the same program state at the erroneously-same program point.
>>>> Because situations when we can guarantee a change in the program state are
>>>> pretty rare, we'd have to produce more program point kinds to handle this
>>>> correctly.
>>>>
>>>> CallEvent hierarchy is also very much affected in a similar manner -
>>>> because apparently we're constructing program points by looking at
>>>> CallEvents, so they'd need to carry all the information that's needed to
>>>> construct the pre-call/post-call program point.
>>>>
>>>> == Construction contexts ==
>>>>
>>>> We are correctly modeling "this" object region during
>>>> construction/destruction of variables with automatic storage duration,
>>>> fields and base classes, and on operator new() since recently, as long as
>>>> these aren't arrays of objects. It was not yet implemented for other cases
>>>> such as temporaries, initializer lists, fields or C++17 base classes within
>>>> aggregates, and pass-by-value from/to functions (the latter being a
>>>> slightly different problem than temporaries).
>>>>
>>>> First of all, by "not yet implemented" i mean that instead of
>>>> constructing into (destroying) the correct object (in the correct memory
>>>> region), we come up with a "temporary region", which looks exactly like a
>>>> region of a valid C++ temporary but is only used for communicating that it
>>>> is not the right region. Then we look at the region, see that it is a
>>>> temporary, and avoid inlining constructors, because it would make little
>>>> sense when the region is not correct. However, if we are to model
>>>> temporaries, we need another way of communicating our failure to find the
>>>> correct region, which is being addressed by
>>>> https://reviews.llvm.org/D42457
>>>>
>>>> Also in the cases when the correct region is used, it is being computed
>>>> in a funky way: in order to figure out where do we construct the object, we
>>>> walk forward in the CFG (or from child to parent in the AST) to find the
>>>> next/parent statement that would accomodate the newly constructed object.
>>>> This means that the CFG, while perfectly telling us what to do in what
>>>> order (which we, unlike CodeGen, cannot easily take from AST because we
>>>> cannot afford recursive evaluation of statements, mainly because of
>>>> inlining), discards too much context to be helpful in understanding how to
>>>> do it.
>>>>
>>>> I tried to add the information about such "trigger statements" for
>>>> constructors into CFG and it is extremely helpful and easy to both use and
>>>> extend. This assumes adding a new sub-class of CFGElement for constructors,
>>>> which maintain a "construction context" - for now it's just a trigger
>>>> statement. For instance, in
>>>>
>>>>   class C { ... };
>>>>   void foo() {
>>>>     C c;
>>>>   }
>>>>
>>>> ...the trigger for constructor C() would be DeclStmt `C c`, and once we
>>>> know this we can immediately figure out that the construction target is the
>>>> region of variable `c`. Construction trigger is not necessarily a statement
>>>> - it may be a CXXCtorInitializer, which is an AST node kind of its own.
>>>> Additionally, when constructing aggregates via initializer lists, we may
>>>> have the same statement trigger multiple constructors, eg. in
>>>>
>>>>   class C { public: C(int); ~C(); };
>>>>   struct S { C c1, c2, c3; };
>>>>   void foo() {
>>>>     S s { 1, 2, 3 };
>>>>   }
>>>>
>>>> ... we have three different constructors (actually six different
>>>> constructors if we include elidable copy constructors) for c1, c2, c3 (and
>>>> lack of constructor for `s` because of the aggregate thing). It would be
>>>> more natural to declare that the specific field or index would be a part of
>>>> the CFG's construction context, as well as the intermediate InitListExpr,
>>>> so even in these simple cases the construction context may get quite bulky.
>>>> And this example is quite popular on actual code - probably the second
>>>> worst cause of false positives after temporaries.
>>>>
>>>> For now i have no specific plan on what would construction context for
>>>> temporaries contain in the general case. I might not be able to get the
>>>> data structures right from the start. In any case, it might be necessary to
>>>> perform additional memory allocations for these CFG elements (for
>>>> analyzer's CFG only, so it wouldn't affect compilation time or warnings).
>>>>
>>>> I believe that getting the correct target region in as many cases as
>>>> possible would be the main part of my work for the nearest future. And i
>>>> want to move most this work to CFG, while letting the analyzer pick up the
>>>> trigger statement from it and trust it as blindly as possible.
>>>>
>>>> == Workflow ==
>>>>
>>>> When working on operator new, i tried hard to maintain a reasonable
>>>> history, making around 15 local rebases. It was not awesome because it was
>>>> hard for the reviewers to understand the context of the new changes, and
>>>> changes could have easily kicked in during rebases. A few lessons learned
>>>> here would be to commit more aggressively, i.e. avoiding stockpiling a
>>>> large history of patches (essentially a large branch), which in turn would
>>>> be possible through trying harder to avoid unrelated hard-to-test core
>>>> changes (as long as it doesn't require weird workarounds) that aren't
>>>> covered by a flag (such as those evalCast fixes), in order to make sure
>>>> reviewing take less responsibility. It's fine if some parts would later be
>>>> fixed (assuming they would indeed be fixed), if it means making the
>>>> turnaround faster and the tail of patches shorter - that's also the
>>>> attitude i'd try to maintain when reviewing stuff myself.
>>>>
>>>
>>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20180412/d74573ca/attachment-0001.html>


More information about the cfe-dev mailing list