[clang] [clang][dataflow] Handle CXXInheritedCtorInitExpr in ResultObjectVisitor. (PR #99616)

Tue Jul 30 20:02:04 PDT 2024

haoNoQ wrote:

> Thanks for the info -- I wasn't aware of `ConstructionContext`. Will take a look. Can you point me to some examples of how this is used?

The initial motivation was here: https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L130 - this is the place where the clang static analyzer's path-sensitive engine looks at the `CXXConstructExpr` in the CFG and utilizes the additional information to foresee the storage in which the prvalue is about to be constructed. So that when the constructor call is modeled, we knew what the `this` value is. Which, yeah, naturally allows us to map every field of the object to the field in the storage, if necessary. Or just to model the value of `this` if the code ever tries to use it directly.

Additionally, it allows us to keep track of that storage location for the purposes of future events (see the next function at https://github.com/llvm/llvm-project/blob/llvmorg-19-init/clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp#L408 where we write the evaluated `MemRegion *` back into our abstract state). This is particularly useful for temporary destructors in cases like this:
```
foo(x ? MyClass(1, 2, 3) : MyClass(4, 5, 6));
```
where the CFG looks like a double diamond: we may construct one of the two objects, we may destroy one of the two objects, but we merge in the middle in order to call `foo()` unconditionally. So you need to figure out at runtime which object gets destroyed. And by the time we reach the destructor at the end of the full-expression (basically `ExprWithCleanups`) we already forgot which of these two objects we needed to destroy. There's no longer any expression that evaluates to the location of that object at this point, so if we don't keep track of it separately, we don't remember it at all. Or, when `foo()` accepts the object as `const MyClass &`, some branches don't even require a temporary destructor at all, but the CFG is still the same double diamond:
```
MyClass existing_object(4, 5, 6);
foo(x ? MyClass(1, 2, 3) : existing_object);
```
Thanks to Manuel Klimek's hard work, the magic CFG terminator at the beginning of the bottom diamond (which corresponds to the runtime branch responsible for conditionally invoking the destructor) remembers a `CXXBindTemporaryExpr *` that points to the exact object that we need to destroy. Because this `CXXBindTemporaryExpr` is foretold by the `ConstructionContext`, we can write it down in the abstract state at construction time and then make the decision on the magic terminator based on the contents of the abstract state. (Another way to organize this would be to write down this info in the abstract state slightly later, when we reach the actual `CXXBindTemporaryExpr`. But in our case we'd probably still need to write down the exact `MemRegion`, so it'd still require the extra annotations in the CFG to compute that region, so we might as well do that at construction where these annotations have already been made available available. Especially given that we needed to foretell the respective `MaterializeTemporaryExpr` too, and then possibly copy elision on top of it, which could be another reason why a temporary destructor may suddenly become unnecessary.)

So, yeah, the construction context helps us discover facts about the program such as "if `x` is false then this invocation of `foo()` doesn't cause `~MyClass()` to be invoked". Which is something you can probably discover in simple cases by simply looking at the AST in a top-down manner ("if it never constructs the object then it never destroys it, makes sense right?") but if you're doing a principled one-step-at-a-time analysis over the CFG then the ability to "foretell" the destructor makes your life much easier, simplifies the code by a lot.

I made a tiny conference talk about this (the second half of https://www.youtube.com/watch?v=4n3l-ZcDJNY&t=692s). The interesting part is probably where I enumerate all the ~30 cases I've found, that produce substantially different AST, which probably require a different `ConstructionContext` subclass for each of them, to properly foretell everything we needed it to foretell in each case:
<img width="1052" alt="Screenshot 2024-07-30 at 7 21 27 PM" src="https://github.com/user-attachments/assets/4dbf46b9-a6da-45d7-85c9-8235e8c4ded6">
(A lot of these also produce completely different ASTs before and after C++17.)

I'm very open to changes in `ConstructionContext` to accommodate your needs, like provide extra information that we didn't need but you did, or provide it in a different way. It's probably not dramatically different from what you're doing (i.e. your `RecordStorageLocation` appears to be very similar, albeit less abstract), but I suspect that it's slightly more mature at this point, and I'm fairly confident in the design:
* `ConstructionContext` looks flexible enough to handle even the most obscure cases, assuming that the tool knows how to interpret it;
* I still think that it's a very good idea to separate construction contexts into different subclasses because in many cases the tool needs to handle them differently even when the information they carry may be the same... in particular, a lot of the things wouldn't out work for us if we tried to reduce `ConstructionContext` to a single `Stmt *`;
* In particular, I still think that there's a finite amount of `ConstructionContext`s in the language: even though you're allowed to put an arbitrary amount of `?:` operators between the `CXXConstructExpr` and its respective `MaterializeTemporaryExpr`, we're still able cover all of these patterns with the same `ConstructionContext` subclass that carries a fixed amount of AST pointers.

https://github.com/llvm/llvm-project/pull/99616