[clang] [llvm] [clang-tools-extra] [analyzer] Trust base to derived casts for dynamic types (PR #69057)

Fri Dec 29 06:51:46 PST 2023

steakhal wrote:

> Thanks for the replies. I'll come back to this PR once I have some time; maybe during the holidays. Both assertions directly relate to this PR for sure.

I think I've just fixed the reported crash for this PR. For some reason a `CastExpr` does not have a reference type (`getType()`), even though it casts lvalue expressions - in case casting references instead of pointers). I have no clue why that's the case, but the workaround seems to work now. Added a testcase demonstrating the fix.

---

> Hm, testing these patches on the original testcase in #62663 (the one where we use statements 1B and 2B) - I don't think this patchset solves that scenario...

I've looked into the case and I think now I see your problem.

In case we have `2B` instead of `2A`, we no longer have a cast to hint us the dynamic type of the object; consequently we can't follow/devirtualize the call. So, we are basically forced to conservatively evaluate the call, as if the body is not present.
This is strictly speaking correct, but also not particularly useful.

If I'm not mistaken, inheritance has "open-set" semantics, which means that in our current translation-unit we might not see all the derived classes, and could be that some derived classes are loaded only at runtime using shared libraries fox example.

What it means for us is that, by seeing a `Base*`, which has a pure virtual function, and that only a single class inherits transitively from `Base` (and defines the virtual function); we can't be really sure that by calling that virtual function it will invoke that particular function.

For example, in an other translation unit they might define a `NewDerived` implementing `Base` slightly differently and smuggle a  pointer into our context; breaking local reasoning. Which means that the call to `foo->OnRecvCancel(port)` could potentially invoke a completely different definition that we saw in our translation unit.

If we would have seen the allocation (thus the dynamic type) of the object of which we call functions, it's fairly easy to inline the right definitions. However, when we need to guess what could be the dynamic type, its no longer that easy.

In presence of static-casts, the developer at least hinted a type, so in that case we have some justification for trusting the type there; otherwise all bets are off.

---

What could we do about inlining function definitions without knowing precisely the dynamic type of the object?

I think it would still make sense to assume well architected code and if only one thing derives (implements) a Base class, then it's probably safe to assume that that's the definition we want to inline. However, what should we do if multiple (N) classes implement Base? Trying each N, and basically splitting the state to (N+1) ways is not going to scale. Unless N is of course really small, like 2 or 3 at most.

To conclude, I don't think we could get the `2B` case working (by inlining the `Handler::OnRecvCancel(int)`); but I'll think about it.
Maybe you have some thoughts @Xazax-hun.

https://github.com/llvm/llvm-project/pull/69057