[cfe-dev] [analyzer] Inlining, operator new(), checker callbacks.
Aleksei Sidorin via cfe-dev
cfe-dev at lists.llvm.org
Sat Dec 9 08:15:37 PST 2017
Hello Artem,
Thank you for sharing this.
09.12.2017 02:46, Artem Dergachev via cfe-dev пишет:
> All right, so how do we want to equip C++ operator new() with checker
> callbacks?
>
> ~~~
>
> First of all, a quick note about how do we evaluate call-expressions.
> Typical code to do that in the analyzer (eg.
> ExprEngine::VisitCallExpr) kinda looks like this:
>
> ```
> // Take node `Pred` from the CoreEngine's worklist.
> NodeSet1 = { Pred };
>
> NodeSet2;
> for (N in NodeSet1) {
> // Checkers evaluate check::PreStmt<CallExpr> event, put their
> transitions to NodeSet2.
> runCheckersForPreStmt(N, &NodeSet2);
> }
>
> NodeSet3;
> for (N in NodeSet2) {
> // Checkers evaluate check::PreCall event, put their transitions
> to NodeSet3.
> runCheckersForPreCall(N, &NodeSet3);
> }
>
> NodeSet4;
> for (N in NodeSet3) {
> // Either eval::Call event in Checkers, or inlineCall, or
> conservativeEvalCall.
> evalCall(N, &NodeSet4);
> }
>
> NodeSet5;
> for (N in NodeSet4) {
> // Checkers evaluate check::PostCall event, put their transitions
> to NodeSet5.
> runCheckersForPostCall(N, &NodeSet5);
> }
>
> NodeSet6;
> for (N in NodeSet5) {
> // Checkers evaluate check::PostStmt<CallExpr> event, put their
> transitions to NodeSet6.
> runCheckersForPreStmt(N, &NodeSet6);
> }
>
> // Put nodes from NodeSet6 back to the worklist.
> ```
>
> During evalCall(), if any checker's eval::Call succeeds, that checker
> simply puts their transitions to NodeSet4. During
> conservativeEvalCall, the core puts exactly one new node to NodeSet4.
>
> The interesting part here is that in case of inlineCall(), *the call
> is not actually evaluated* (!). The only thing that happens here is
> that inlineCall enters the stack frame and puts the node with the new
> stack frame *directly to the worklist* (!!), completely bypassing
> NodeSet4. Because of that, in inlineCall case, NodeSet{4,5,6} are all
> empty, and the remaining code does nothing, so no PostCall callbacks
> get called here until the call is actually evaluated. However, when
> the call is fully inlined, at the CallExit program point
> (ExprEngine::processCallExit()), PostCall and PostStmt callbacks are
> called manually. So we get the correct sequence of callbacks regardless.
>
> ~~~
>
> Now, suppose we have operator new:
>
> new (args1...) C(args2...)
>
> The semantics of this code can be expressed in the following
> "statement-expression" pseudo-code:
>
> ```
> 01 C *_this = (C *) operator new(sizeof(C), args1...);
> 02 if (_this) _this->C(args2...);
> 03 return _this;
> ```
>
> Note that the constructor is simply not called when the operator
> returns nullptr.
> Note the cast on the first line - needs to be modeled, because
> operator new returns `void *`.
> Note that i did ask George if he thinks that expressing this piece of
> code as a body-farm-thing is a good idea, and he didn't think so :)
>
> By looking at this construct, it should be obvious that the
> relationship between CXXNewExpr and its respective CXXConstructExpr is
> more complicated than that of a "normal" expression and its
> sub-expression. They are kinda computed in the counter-intuitive
> order, and the easiest way to explain that would be to announce that
> there are actually three "expressions" involved: operator new call
> fake expression, constructor call expression, and the "big
> new-expression" of which both of these are children, in that order.
>
> This is how our CFG currently sees it, in case of `-analyzer-config
> c++-allocator-inlining=true`:
>
> 1. CFGNewAllocator // evaluate `operator new(sizeof(C), args1...)`
> 2. CXXConstructExpr // evaluate `_this->C(args2...)`
> 3. CXXNewExpr // bind `_this` to the expression
>
> So i guess it's so far so good.
>
> At 1., we do defaultEvalCall for operator new. FIXME: do a regular
> evalCall, i.e. allow checkers to evaluate operator new. Also in
> https://reviews.llvm.org/D40560 we add a new program state trait to
> hold the artificial variable "_this", since there's no room for it in
> the Environment, since it's not an expression but something we made
> up. Also we perform the cast from `void *` to `C *`.
>
> At 2., we evalCall for the constructor with the help of our fake
> variable `_this`. FIXME: we should also do the if() part of it, i.e.
> don't evaluate the constructor when the operator returns null. It
> shouldn't be a state split though, i guess we should just suppress the
> branch on which the return value is null, unless we inlined the
> operator new and sure it's null.
>
> At 3., we take `_this` and declare that the value stored in it would
> from now on be the value of the "big new-expression" that unites them
> all, regardless of whether it's null or not.
>
> ~~~
>
> Now when it comes to checker callbacks, it's a bit messy right now. On
> CFGNewAllocator call evaluation, the call site is CXXNewExpr. It means
> that if the operator is inlined, and we hit processCallExit(), as
> explained above, we'd be triggering PostStmt<CXXNewExpr>, even though
> we didn't ever trigger PreStmt<CXXNewExpr>. Then at 3., we'd have
> PreStmt<CXXNewExpr> and then another(!!!) PostStmt<CXXNewExpr>, which
> drives MallocChecker crazy. This, of course, can be trivially fixed.
> But i wanted to write this long explanation to specifically highlight
> this bug, with the hope that in the future it would be less likely to
> get reintroduced.
>
> And then, now that we realize that we have three kinda-statements
> here, the question is, do we want all three equipped with checker
> callbacks? We could plan to make a new callback that'd be surrounding
> CFGNewAllocator similarly to how PreStmt/PostStmt surrounds
> CXXConstructExpr and CXXNewExpr.
>
> It may look like this:
>
> (A)
>
> -> check::PreCXXAllocator // new callback
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostCXXAllocator // new callback
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> -> check::PreStmt<CXXNewExpr>
> 3. CXXNewExpr
> <- check::PostStmt<CXXNewExpr>
>
> Or like this:
>
> (B)
>
> -> check::PreStmt<CXXNewExpr>
>
> -> check::PreCXXAllocator // new callback
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostCXXAllocator // new callback
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> 3. CXXNewExpr
> <- check::PostStmt<CXXNewExpr>
>
> Or like this:
>
> (C)
>
> -> check::PreStmt<CXXNewExpr>
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostStmt<CXXNewExpr>
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> -> check::PreBigNewExpr // new callback (needs better name)
> 3. CXXNewExpr
> <- check::PostBigNewExpr // new callback
>
> Or even like this:
>
> (D)
>
> -> check::PreWholeNewThing // new callback
>
> -> check::PreStmt<CXXNewExpr>
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostStmt<CXXNewExpr>
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> 3. CXXNewExpr
> <- check::PostWholeNewThing // new callback
>
> Or maybe even like this if we want:
>
> (E)
>
> -> check::PreWholeNewThing // new callback (needs better name)
>
> -> check::PreCXXAllocator // another new callback
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostCXXAllocator // new callback
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> -> check::PreStmt<CXXNewExpr>
> 3. CXXNewExpr
> <- check::PostStmt<CXXNewExpr>
>
> -> check::PostWholeNewThing // new callback
>
> Variant (A) is what we commonly do for all other expressions. For
> instance, if we have an expression `a + b`, we only do
> PreStmt<BinaryOperator> *after* evaluation of both `a` and `b`. It
> kind of represents the semantics exactly. However, it would be
> completely counter-intuitive for checker authors to subscribe to
> PreStmt<CXXNewExpr> and find out that by the time their callback
> fires, both operator new() and the constructor(!) have already been
> called. So i expect confusion between CXXNewExpr in our sense (the big
> new-expression) and CXXNewExpr in a common person's sense (the
> allocator call). In particular, it would be nice to move MallocChecker
> to the new callback - eg. we're already done with region extent when
> we're constructing.
(A) is my favourite. It just follows the common logic of the analyzer.
If we don't like the fact that PreStmt<Stmt> is called only after all
sub-statements are evaluated, we should change it all across the
analyzer. (E) os OK too since we are going to have some similar
"envelop" cases like ScopeEnter/ScopeExit; we can call it
OperatorNewEnter/Exit (similar to Scope/Call Enter/Exit we already
have). However, I'd rather wait for developers to share their needs: I
don't think we can predict what developers expect exactly.
>
> Variant (B) tries to go around this confusion while keeping the
> "user-facing meaning" of CXXNewExpr as the big new-expression, but it
> goes against what we normally do, which would anyway be confusing.
>
> In variants (C) and (D) we have the "user-facing meaning" of
> CXXNewExpr changed into "here's where the allocation occurs". This is
> how MallocChecker currently understands things. Like (A), variant (C)
> tries to do what we usually do by only doing Pre-thingy after
> sub-thingies were evaluated (except evaluating CXXNewExpr before its
> child CXXConstructExpr, but we kinda agreed that they're not actually
> parent-child-related). However, because this time Pre-thingy is not a
> PreStmt-thingy, i find variant (D) fancier: the new callback is saying
> "we begin unleashing the operator new hell" and "we're done with the
> operator new hell nice and clean". Still, both (C) and (D) have an
> unpleasant downside of being unable to reliably read the value of
> CXXNewExpr at PostStmt<CXXNewExpr>, as the expression is not yet
> evaluated. This can probably be fixed, but it brings us back to the
> problem of whether we want to keep the value of the fake `_this`
> variable in the Environment as the value of CXXNewExpr this whole
> time. We probably do.
>
> Variant (E) is kinda a combination of all approaches, as it provides
> the "big" callback like (B) and (D), while still saying that
> CXXNewExpr is the big new-expression, while not messing up the usual
> PreStmt/PostStmt order like (B) did. It still has the problem of
> check::PreStmt<CXXNewExpr> being confusingly late, but otherwise seems
> usable.
>
> We can delay the choices between (A) and (E) and also between (C) and
> (D) until we actually want to introduce the new callbacks, but i guess
> it's reasonable to try to agree on what do we want to mean by CXXNewExpr.
>
> I personally feel that (D) is the friendliest approach. Checkers
> subscribe for new - they get operator new. Checkers subscribe for
> constructor - they get their constructor. Checkers want the whole
> thing - sure, here's our fancy custom callback. They may probably
> never even think about this whole hassle.
>
> My second favorite is (E), which has the strength-slash-weakness of
> consistency with the AST's CXXNewExpr being a parent of
> CXXConstructExpr. I believe that the reorder in the analysis order
> compared to the AST, even if it's not an actual reorder, would be
> quite intuitively acceptable. While the positioning of the new
> operator after the constructor wouldn't be.
>
> Then (A) is simply a trimmed-down variant of (E), (B) is just weird,
> and (C) doesn't seem to be anyhow better than (D).
>
> We can also combine (C) and (D), which is even better:
>
> (F)
>
> -> check::PreWholeNewThing // new callback (needs better name)
>
> -> check::PreStmt<CXXNewExpr>
> -> check::PreCall
> 1. CFGNewAllocator
> <- check::PostCall
> <- check::PostStmt<CXXNewExpr>
>
> -> check::PreStmt<CXXConstructExpr>
> -> check::PreCall
> 2. CXXConstructExpr
> <- check::PostCall
> <- check::PostStmt<CXXConstructExpr>
>
> -> check::PreBigNewExpr // another new callback (needs better name)
> 3. CXXNewExpr
> <- check::PostBigNewExpr // new callback
>
> <- check::PostWholeNewThing // new callback
>
> Which by far seems to be the best (not necessarily good) idea i could
> have come up with on this subject.
>
> Thoughts are welcome!~
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
--
Best regards,
Aleksei Sidorin,
SRR, Samsung Electronics
More information about the cfe-dev
mailing list