[PATCH] D124932: [clang][dataflow] Track `optional` contents in `optional` model.

Thu May 5 06:18:47 PDT 2022

ymandel added a comment.

In D124932#3492495 <https://reviews.llvm.org/D124932#3492495>, @xazax.hun wrote:

> Overall looks good to me. I am curious what will the strategy be to properly support construction. Do you plan to introduce a customization point to `Env.createValue` to give checks/models a way to set properties up? Or do you have something else in mind?

If we want to go the eager route, than I think that's the "right" solution.  Anything else is brittle, since it is easy to miss a construction point.  However, I'm more inclined to move the whole framework to a lazy initialization process, if we can devise one.

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:219
+/// its location. Returns nullptr if the value can't be represented.
+StorageLocation *initializeUnwrappedValue(const Expr &Expr,
+                                          StructValue &OptionalVal,
----------------
xazax.hun wrote:
> I recall some problems with convergence due to creating fresh locations lazily in multiple branches. I wonder if doubling down before finding a proper solution could make it harder to fix this in the near future. If this is just a temporary solution until construction is better supported, it might be fine.
Excellent question. Yes, you're right about the convergence issue this falls into the same category.  That said, we intentionally reuse the same location in all branches, which mitigates that issue in this case -- we are at least assured that the two branches will have the contents value at the same location. In fact, the FIXME I added is wrong (and I've adjusted it accordingly). The property will be shared by all blocks that share the same value representing the `Optional` (which for the most part is done eagerly).

To your larger point, about the advisability of this approach, even for the short term:
1. I'm pushing on this largely because I have the data to support that it works in practice. We've done multiple surveys of significant size (~1000 files, 10k optional accesses) and found it does very well. So, this is not much of an issue in practice, in large part because the contents of optionals don't tend to further influence the outcome of the check (e.g. in the case of nested optionals).
2. I thought of redoing this in an eager fashion, but realized that ultimately we want to move everything to a lazy model, so it seemed less than ideal to start trying to be eager here.

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:286
   // Record that this unwrap is *not* provably safe.
   State.Lattice.getSourceLocations().insert(ObjectExpr->getBeginLoc());
 }
----------------
xazax.hun wrote:
> I wonder how well this strategy works in practice. If we have many `unwrap` calls to the same `optional`, we might end up emitting lots of diagnostics. An alternative approach is to assume that the `optional` has a value after the first unwrap. This would ensure that we only report a particular problem once and can reduce the noise.
I see your point, and agree that there does seem room for significantly reducing warnings, even if all are correct. But, in practice, we haven't gotten complaints (which is a little surprising, actually). I wonder if it would be good to pair the report with the surrounding compound statement, and limit to one per compound statement. I fear that shutting it off entirely after the first issue might be _too_ quiet. Either way, this is definitely something that we'll be actively tweaking.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124932/new/

https://reviews.llvm.org/D124932