[clang] [analyzer] Widen loops: Augment assignment note when values invalidated at the start of a loop (PR #122398)

Wed Jan 22 06:58:40 PST 2025

steakhal wrote:

> Hi @steakhal .
> 
> Thank you for the further clarification and example. I now understand how your proposal fits in with what I'm trying to accomplish.

Thank you for your time for considering it!

> I've started to implement the proposal but I have a few questions:
> 
> * When regions are invalidated via `getWidenedLoopState` (see the call to ` PrevState->invalidateRegions`), further down the call stack, in `InvalidateRegionsWorker::VisitCluster`, regions are bound to a conjured symbol constructed via `svalBuilder.conjureSymbolVal`. Is `InvalidationArtifact` supposed to wrap the result of `svalBuilder.conjureSymbolVal`, or is it supposed to be a replacement for conjured symbols in the context of invalidation?

Excellent question!
If you look at the [class hierarchy](https://clang.llvm.org/doxygen/classclang_1_1ento_1_1SymExpr.html) of SymExprs, then you will see the `SymbolData` class. Any subclasses of it are equal in the sense that the actual subtype usually doesn't matter.
For instance, as far a checker is concerned, (usually) it shouldn't differentiate and handle differently Conjured symbols and MetadataSymbols, or any other symbols. The important part is that it's a SymbolData.

Now, for us this should mean that an `InvalidationArtifact` is just another subtype of SymbolData, and where in the past (for the lack of it) we used Conjured symbols for representing the value of a region "after" an invalidation, we would instead create an "InvalidationArtifact". While in the past, one could not conclude why we had a Conjured symbol for something, now we could tell if it's representing the value after some invalidation event.

This "refinement" may break existing code that currently relied on the fact that invalidation usually placed a Conjured symbol as the new value, but those could be fairly easily identified and should be even easier to fix: just check if the symbol is an InvalidationArtifact, and there you go. So at this point expect a couple of broken tests.

> * If `InvalidationArtifact` is a replacement for conjured symbols, is it always the case that invalidation results in conjured symbols? From my limited exploration, that seems like the case but I just want to confirm. If so, would it make more sense to have `InvalidationArtifact` be an extension of `SymbolConjured` with the addition of `InvalidationCause`?

Yes,  `InvalidationArtifact` is a replacement (or a better word would be "refinement") of conjured symbols **iff** those were conjured due to some invalidation event. Now, do we always have a conjured symbol for invalidation? Yes and no.
Let's first have a look at what a conjured symbol means. They have a type, let's say `int`. A conjured symbol of such means an arbitrary value within that domain. Even if we don't know anything concrete about such a symbol, we know that they always compare equal against itself. It's like in mathematics. This is not only true for conjured symbols, but to anything that is a SymbolData. (There is one little caviat to the type of a conjured symbol :D They frequently just left as `int` because IDK, they didn't pass the right type all the way, so don't be confused if you see `int` for the type for a conjured symbol that represents the value of IDK, a `char` for instance; funny right?)

If something escapes (and needs invalidation), then the best mental model is that a single write operation is simulated, but then what value should be bound there? You guessed it, we conjure a fresh symbol, and bind it. So, for regular primitive objects, such as `int`, we will end up with a conjured symbol of int.

However, for aggregates (like arrays) and most complicated stuff, we still just conjure a single symbol to the whole object, despite that we usually only operate on a smaller subobject, such as the concrete array element, or a specific field of a struct object.
This is where Derived symbols come into play. (an other sign of that some invalidation happened, besides conjured symbols).
Derived symbols represent a "slice" of another symbol (usually a conjured symbol) to represent the value of a small portion of the original symbol.
There are completely legit, unrelated reasons for having a conjured symbol representing a whole aggregate. For instance the by-value return value of some opaque function call (calling a function like `struct ZZZ getObj()` would return a conjured symbol for the whole `struct ZZZ` object, thus every time you would access a field of that `zzz.field`, a Derived symbol would "slice" the field portion of the whole object "zzz" represented by that conjured symbol). So the presence of derived symbols is not an indicator of some invalidation, just like it wasn't for conjured symbols.

In our new mental model, where we have InvalidationArtifacts, this concept still makes sense. The DerivedSymbol will just wrap an InvalidationArtifact symbol instead of a conjured one. I just wanted to bring awareness to the concept of DerivedSymbols, because back in the day I was really confused by it and it took me a long time to grasp the concept.

Now, the final twist to the story is that in the Store we may not materialize conjured symbols, because if you think about it you don't really need to in a lot of cases.
For example, if you load from a memregion, and you know it's a valid load (so there is a living object behind it), what other value could you load than "some" value of that type? This is exactly what a conjured symbol represents.

I can't exactly recall right now, but as if we had some logic in the past for handling invalidation in the Store that only dropped the bindings of the escaped regions (so that we later don't load an old value), and only rebinds the memory spaces with fresh conjured symbols.

I bet you have a lot to unpack now, so let me know if you have any questions.
I'd rather keep this interactive as I believe that is the most effective way to unblock newcomers.
And after all, you do what I always wanted to see getting implemented, and I'm grateful to see that.

> For the time being, I've implemented a rough first-draft approach here: [main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-update](https://github.com/llvm/llvm-project/compare/main...dgg5503:llvm-project:dgg5503/main/invalidation-diagnostic-update)
> 
> In my draft, I used your code from [main...steakhal:llvm-project:csa-invalidation-artifact](https://github.com/llvm/llvm-project/compare/main...steakhal:llvm-project:csa-invalidation-artifact) as a foundation. I made `SymbolInvalidationArtifact` basically a copy/paste of `SymbolConjured` with the addition of `InvalidationCause` since it seems like conjured symbols are almost always the result of invalidation. Please note there is still much work to be done (i.e. allocating `InvalidationCause` properly, implementing unimplemented functions, getting all LIT tests passing, etc). I just want to make sure I'm on the right path before I invest more time. Please let me know what you think.
> 
> If you'd prefer, I'd be happy to close this PR and open a new one using the new branch I've linked above.

Thanks for the updates. I'll try to spare some time tomorrow on your code.
But please note that last week we had a week long company event, after which I of course got sick, so I still have a hard time to concentrate, but I really wanted to come back to the pending PRs to unblock you and everyone else waiting on me.

Thank you for your patience.

https://github.com/llvm/llvm-project/pull/122398