[PATCH] D80366: [Analyzer] Add `getReturnValueUnderConstruction()` to `CallEvent`
Balogh, Ádám via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Wed Jun 3 07:06:12 PDT 2020
baloghadamsoftware marked 4 inline comments as done.
baloghadamsoftware added inline comments.
================
Comment at: clang/lib/StaticAnalyzer/Core/CallEvent.cpp:543
+
+ const StackFrameContext *StackFrame = Call.getCalleeStackFrame(BlockCount);
+ if (!StackFrame)
----------------
NoQ wrote:
> It is clear that the final return value of `getConstructionContext()` does not depend on the current block count; in fact the information you're trying to obtain is purely syntactic. You probably might as well pass 0 here. As i mentioned before, the ideal solution would be to make `CallEvent` carry a `CFGElementRef` to begin with (as a replacement for the origin expr), so that it didn't need to be recovered like this.
>
> We should absolutely eliminate the `BlockCount` parameter of `getReturnValueUnderConstruction()` as well. Checker developers shouldn't understand what it is, especially given that it doesn't matter at all which value do you pass.
OK, I eliminated it. Do we need `BlockCount` for `getParameterLocation()` (the other patch)?
================
Comment at: clang/lib/StaticAnalyzer/Core/ExprEngineCXX.cpp:112
+Optional<SVal> ExprEngine::retrieveFromConstructionContext(
+ ProgramStateRef State, const LocationContext *LCtx,
----------------
NoQ wrote:
> baloghadamsoftware wrote:
> > martong wrote:
> > > baloghadamsoftware wrote:
> > > > NoQ wrote:
> > > > > balazske wrote:
> > > > > > baloghadamsoftware wrote:
> > > > > > > baloghadamsoftware wrote:
> > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > > NoQ wrote:
> > > > > > > > > > > baloghadamsoftware wrote:
> > > > > > > > > > > > NoQ wrote:
> > > > > > > > > > > > > Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof (eg., standalone temporaries without destructor technically have a construction context with 0 items so when we implement them correctly your procedure will stop working).
> > > > > > > > > > > > That was so my first thought. However, `handleConstructionContext()` is private and non-static. Now I tried to merge the two methods: if the value is already in the construction context, we return it, if not then we add it. Is this what you suggest? Or did I misunderstand you? At the very beginning I tried to simply use `handleConstructionContext()`, but it asserted because the value was already in the map.
> > > > > > > > > > > I'd like to preserve the assertion that objects-under-construction are never filled twice; it's a very useful sanity check. What you need in your checker is a function that computes object-under-construction but doesn't put it into the objects-under-construction map. So you have to separate the computation from filling in the state.
> > > > > > > > > > OK, so I (fortunately) misundertood you. Thus I should refactor this function to a calculation and a storing part?
> > > > > > > > > OK, I see what you are speaking about, but I have no idea how to do it properly. The problem is that the control logic of filling in the state also depends on the kind of the construction context. For some kinds we do not fill at all. Every way I try it becomes more complex and less correct:
> > > > > > > > >
> > > > > > > > > 1) `NewAllocatedObjectKind`: we do not add this to the state, we only retrieve the original.
> > > > > > > > > 2) `SimpleReturnedValueKind` and `CXX17ElidedCopyReturnedValueKind`: depending on whether we are in top frame we handle this case recursively or we do not fill at all, just return the value. What is the construction context item here? Maybe the `ReturnStmt`?
> > > > > > > > > 3) `ElidedTemporaryObjectKind`: this is the most problematic: we first handle it recursively for the construction context after elision, then we also fill it for the elided temporary object construction context as well.
> > > > > > > > >
> > > > > > > > > The only thing I can (maybe) do is to retrieve the construction context item. But then the switch is still duplicated for filling, because of the different control logic for different construction context kinds.
> > > > > > > > > The only thing I can (maybe) do is to retrieve the construction context item.
> > > > > > > >
> > > > > > > > This does not help even for retrieving the value, because its control logic also depends on the construction context kind. If I do it, it will be code triplication instead of duplication and results in a code that is worse to understand than the current one.
> > > > > > > >
> > > > > > > >
> > > > > > > > Please instead re-use the code that computes the object under construction. That'll save you ~50 lines of code and will be more future-proof (eg., standalone temporaries without destructor technically have a construction context with 0 items so when we implement them correctly your procedure will stop working).
> > > > > > >
> > > > > > > Any solution I can come up with rather adds 100 to 150 lines to the code instead of saving 50. For retrieving we only have to determine the construction context item (the key). For storing we also have to calculate the value. It sounds good to calculate these things in separate functions and then have a simple retriever and store function but there are lots of recursions, double strores, non-stores, retrieving in the store function that make it too complicated.
> > > > > > >
> > > > > > > Also retrieving is more complex than just determining the item and getting the value from the context: if you look at `SimpleReturnedValueKind` and `CXX17ElidedCopyReturnedValueKind` you can see that we do not use the construction context we got in the parameter (the construction context of the return value) but the construction context of the call instead. For `ElidedTemporaryObjectKind` we take the construction context before the elusion.
> > > > > > >
> > > > > > > Future proofness: I agree, this is a problem to solve. In cases where the construction context is empty maybe we can move the calculation of the values to a separate function that is called by both the "handler" and the "retriever".
> > > > > > Do I think correctly that `retrieveFromConstructionContext` (in the previous version) should return the same value as second part of `handleConstructionContext`? It does not look obvious at first. Or do we need a function with that purpose? If yes it looks a difficult task because the similar "logic" to get the value and update the state. Probably it is enough to add a flag to `handleConstructionContext` to not make new state. The current code looks bad because calling a "handle" type of function (that was private before) only to compute a value.
> > > > > >
> > > > > > Any solution I can come up with rather adds 100 to 150 lines to the code instead of saving 50.
> > > > >
> > > > > I think the following is a fairly literal implementation of my suggestion:
> > > > >
> > > > > {F12020285}
> > > > >
> > > > > It's 26 lines shorter than your `retrieveFromConstructionContext()` (ok, not 50, duplicating the switch was more annoying than i expected), it adds zero new logic (the only new piece of logic is to track constructors that were modeled correctly but their elision wasn't; previously we could fit it into control flow), it no longer tries to reverse-engineer the original behavior by duplicating its logic, and it's more future-proof for the reasons explained above.
> > > > Thank you for working on this. The point what I did not see (and I still no not see it) is that in your solution (and your proposal) instead of retrieving the value from the construction context we actually recalculate it, thus we do not use the construction context at all (except for `NewAllocatedObjectKind`) to retrieve the location of the return value. Can we guarantee that we always recalculate exactly the same symbolic value as the symbolic values we store in the map? For example, for `SimpleReturnedValueKind` et. al. we conjure a new symbolic value. Is it really the same value than that the one in the map, thus the one which is the real location where the object is constructed? If not, then the checkers might not work correctly because they store into the GDM using a different region as key than they use for retrieving the value.
> > > > I think the following is a fairly literal implementation of my suggestion: ... It's 26 lines shorter than your retrieveFromConstructionContext() (ok, not 50, duplicating the switch was more annoying than i expected), it adds zero new logic (the only new piece of logic is to track constructors that were modeled correctly but their elision wasn't; previously we could fit it into control flow), it no longer tries to reverse-engineer the original behavior by duplicating its logic, and it's more future-proof for the reasons explained above.
> > >
> > > +1 for this version. Even though Adam has valid concerns, this version contains functions with clear responsibilities.
> > >
> > > > Can we guarantee that we always recalculate exactly the same symbolic value as the symbolic values we store in the map? For example, for SimpleReturnedValueKind et. al. we conjure a new symbolic value. Is it really the same value than that the one in the map, thus the one which is the real location where the object is constructed?
> > >
> > > I think the guarantee is that in Artem's implementation the switch is duplicated. That would be okay for now, but I still consider this error-prone, exactly because of the duplication. So, maybe this yields for a common Visitor.
> > >
> > > Guys, would it be possible to have a common Visitor for these? Could be somewhat similar to `DeclVisitor` with return type as type parameter. And we could pass a lambda that is called in every `case`? Or would that be too overkill
> > >
> > > I think the guarantee is that in Artem's implementation the switch is duplicated.
> >
> > That is not my point. My point is that `handleConstructionContext()` is called, a value is calculated and stored in the map. This calculation may be the creation of a new conjured symbol. Thereafter, whenever a checker tries to retrieve the location of the return value from the construction context in Artem's solution we do not retrieve the value previously stored in the map, but calculate a new one, which is not necessarily the same as the one we stored in the map, because a in some cases a new symbol is conjured. Then the checker uses this new symbol as the index in GDM, but the object is constructed to the location stored in the map. Therefore, later when the checker tries to retrieve data from the GDM using the location of the existing object as a key it cannot find the data because it was stored using the wrong key.
> >
> > `getReturnValueUnderConstruction()` must always return the location where the object is to be constructed. Therefore I think it must not recalculate the location which may involve creation of new conjured symbols but first it must look into the map of objects under construction. This guarantees that the returned location is really the location where the object is constructed.
> Yes it should always return the same value, because there is exactly one correct solution to the problem that it's solving. If it may return different values for the same `(E, State, LCtx, CC)` tuple, it's a bug. This method should ultimately be turned into a static method.
>
> Loading a stored value if it exists will only sweep the problem under the rug. It'd be much better for us to //assert// that the freshly recomputed value is equal to the stored value if the latter is present.
>
> The code that constructs symbolic region for the return value of a top frame is indeed suspicious. I'm not sure it can actually return different regions twice but this code is clearly incorrect because `SymbolConjured` is never the correct solution anyway. The only reason `SymbolConjured` is used anywhere is because the author (in this case, i) was too lazy to define a new symbol class that has the right identity. `SymbolConjured` is like `UnknownVal` that's a bit less unknown but still screams "not implemented yet". I think this definitely deserves a fixme.
>
> > I think the guarantee is that in Artem's implementation the switch is duplicated. That would be okay for now, but I still consider this error-prone, exactly because of the duplication. So, maybe this yields for a common Visitor.
> > Guys, would it be possible to have a common Visitor for these?
>
> Mmm, i don't think i understand this argument. A visitor //is// a switch with a fancy interface and wildcards and lacking checks for unhandled cases.
>
> Generally though i wouldn't mind having a `ConstructionContextVisitor`. For instance, that'll allow us to replace code like
> ```lang=c++
> case ConstructionContext::CXX17ElidedCopyVariableKind:
> case ConstructionContext::SimpleVariableKind: {
> }
> ```
> with a single
> ```lang=c++
> VisitVariable(VariableConstructionContext *CC) {
> }
> ```
>
> I wouldn't like a visitor for `ConstructionContextItem`s because they are fairly meaningless when taken out of, well, //context//.
A `Visitor` is a good idea. However, is it used anywhere else? If it is, then it is worth to do it, but I think it can go into a new patch.
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D80366/new/
https://reviews.llvm.org/D80366
More information about the cfe-commits
mailing list