[PATCH] D122231: [clang][dataflow] Add support for `value_or` in a comparison.

Mon Mar 28 05:40:27 PDT 2022

ymandel marked 2 inline comments as done.
ymandel added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:270
+  // needed.
+  BoolValue &ComparisonValue = MakeValue(Env, *HasValueVal);
+  auto *ComparisonExprLoc =
----------------
sgatev wrote:
> ymandel wrote:
> > ymandel wrote:
> > > xazax.hun wrote:
> > > > xazax.hun wrote:
> > > > > ymandel wrote:
> > > > > > xazax.hun wrote:
> > > > > > > Is this the right way to initialize `ComparisonValue`?
> > > > > > > 
> > > > > > > Considering the expression: `opt.value_or(nullptr) != nullptr`
> > > > > > > * When `has_value == false`, `opt.value_or(nullptr)` will return `nullptr`, so `!=` evaluates to false. This case seems to check out.
> > > > > > > * However, when `has_value == true`, `opt` might still hold an `nullptr` and `!=` could still evaluate to false. 
> > > > > > Thanks for digging into this. I think it's correct, but helpful to step through:
> > > > > > 
> > > > > > Its correctness depends on `MakeValue`, so I'll focus on that in particular. For the `nullptr` case, we'll get:
> > > > > > ```
> > > > > > HasValueVal && ContentsNotEqX
> > > > > > ```
> > > > > > So, when `has_value == true`, this basically reduces to `ContentsNotEqX`. Since that's an atom, the result is indeterminate, which I believe is the desired outcome.
> > > > > > 
> > > > > > WDYT?  Also, even if I've convinced you, please let me know how i can improve the comments. For that matter, would `MakeValue` be better with a more specific name, like "MakePredicate" or somesuch?
> > > > > I think what confuses me is that we do something different for the 3 cases. You convinced me that `HasValueVal && ContentsNotEqX` is correct. But we only do this for one branch out of the 3.  What is the reason for that?
> > > > Oh, never mind. Yeah, I think changing `MakeValue` to `MakePredicate` would make this a bit clearer. After a second read now I understand better what is going on.
> > > Just to be clear: the three cases you mean are lines 273-283, or something else?
> > and never mind my question, then (I rpelied before I saw your updated). I'll change the name and add comments.
> Can you elaborate on the three cases on lines 273-283? Why not simply do
> 
> ```
> auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
> Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
> Env.setValue(ComparisonExprLoc, ComparisonValue);
> ```
> Can you elaborate on the three cases on lines 273-283? Why not simply do
> 
> ```
> auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
> Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
> Env.setValue(ComparisonExprLoc, ComparisonValue);
> ```

for the second case: I think we should drop it -- I don't see a reason to maintain the previous value (if there is any). It might be a good idea for compositionality, but we're not doing that anywhere else, so it doesn't make sense here.

for the first and third case: I assumed that if the expression already has a location, we'd want to reuse it. But, based on your question, I take it that's incorrect?

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:279
+                 cast_or_null<BoolValue>(Env.getValue(*ComparisonExprLoc))) {
+    Env.setValue(*ComparisonExprLoc,
+                 Env.makeAnd(*CurrentValue, ComparisonValue));
----------------
ymandel wrote:
> xazax.hun wrote:
> > I am still wondering a bit about this case. 
> > 
> > We generate: `HasValueVal and ContentsNotEqX and CurrentValue`.'
> > I wonder if we want: `HasValueVal  and (ContentsNotEqX  <=> CurrentValue)` instead?  Or even `HasValueVal  and CurrentValue`?
> I don't think that the iff version is right, but `HasValueVal  and CurrentValue` could be. My concern is that we're not guaranteed that `CurrentValue` is populated. And, even if we were, it doesn't feel quite right. Assuming its a high fidelity model, we get (logically): `HasValue(opt) and Ne(ValueOr(opt,X),X)`. Then, when negated (say, on an else branch) we get `not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X))` which is equivalent to `not(HasValue(opt)) or Eq(ValueOr(opt,X),X)`. While  true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretatable semantics to the `ValueOr` predicate).
> 
> Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.
> Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.

Here's what I have: in general, we're aiming for all models to be a sound (over) approximation of reality. That is what we're doing here as well. Yet, that poses a problem for the interpretation of the boolean not operator. If its operand is an overapproximation, then I believe the naive approach gives you an under approximation. That's the problem we're hitting when reasoning about the negation.

I'm not sure how to handle this. Stanislav -- have we dealt with this issue before?

That said, if we go back to the previous approach, of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated.  To Gabor's earlier point:
> There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?
I think is covered by the condition we're adding. Namely:
```
ExprValue => has_value
```
where `ExprValue` is the truth value of the boolean expression.

So, the implication in the reverse direction is:
```
!has_value => !ExprValue
```
that is, if we know the optional doesn't hold a value, then we know that `opt.value_or(X) = X`

But, that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122231/new/

https://reviews.llvm.org/D122231