[PATCH] D122231: [clang][dataflow] Add support for `value_or` in a comparison.

Tue Mar 29 01:16:30 PDT 2022

sgatev added inline comments.

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:270
+  // needed.
+  BoolValue &ComparisonValue = MakeValue(Env, *HasValueVal);
+  auto *ComparisonExprLoc =
----------------
ymandel wrote:
> sgatev wrote:
> > ymandel wrote:
> > > ymandel wrote:
> > > > xazax.hun wrote:
> > > > > xazax.hun wrote:
> > > > > > ymandel wrote:
> > > > > > > xazax.hun wrote:
> > > > > > > > Is this the right way to initialize `ComparisonValue`?
> > > > > > > > 
> > > > > > > > Considering the expression: `opt.value_or(nullptr) != nullptr`
> > > > > > > > * When `has_value == false`, `opt.value_or(nullptr)` will return `nullptr`, so `!=` evaluates to false. This case seems to check out.
> > > > > > > > * However, when `has_value == true`, `opt` might still hold an `nullptr` and `!=` could still evaluate to false. 
> > > > > > > Thanks for digging into this. I think it's correct, but helpful to step through:
> > > > > > > 
> > > > > > > Its correctness depends on `MakeValue`, so I'll focus on that in particular. For the `nullptr` case, we'll get:
> > > > > > > ```
> > > > > > > HasValueVal && ContentsNotEqX
> > > > > > > ```
> > > > > > > So, when `has_value == true`, this basically reduces to `ContentsNotEqX`. Since that's an atom, the result is indeterminate, which I believe is the desired outcome.
> > > > > > > 
> > > > > > > WDYT?  Also, even if I've convinced you, please let me know how i can improve the comments. For that matter, would `MakeValue` be better with a more specific name, like "MakePredicate" or somesuch?
> > > > > > I think what confuses me is that we do something different for the 3 cases. You convinced me that `HasValueVal && ContentsNotEqX` is correct. But we only do this for one branch out of the 3.  What is the reason for that?
> > > > > Oh, never mind. Yeah, I think changing `MakeValue` to `MakePredicate` would make this a bit clearer. After a second read now I understand better what is going on.
> > > > Just to be clear: the three cases you mean are lines 273-283, or something else?
> > > and never mind my question, then (I rpelied before I saw your updated). I'll change the name and add comments.
> > Can you elaborate on the three cases on lines 273-283? Why not simply do
> > 
> > ```
> > auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
> > Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
> > Env.setValue(ComparisonExprLoc, ComparisonValue);
> > ```
> > Can you elaborate on the three cases on lines 273-283? Why not simply do
> > 
> > ```
> > auto &ComparisonExprLoc = Env.createStorageLocation(*ComparisonExpr);
> > Env.setStorageLocation(ComparisonExpr, ComparisonExprLoc);
> > Env.setValue(ComparisonExprLoc, ComparisonValue);
> > ```
> 
> for the second case: I think we should drop it -- I don't see a reason to maintain the previous value (if there is any). It might be a good idea for compositionality, but we're not doing that anywhere else, so it doesn't make sense here.
> 
> for the first and third case: I assumed that if the expression already has a location, we'd want to reuse it. But, based on your question, I take it that's incorrect?
> 
Dropping the second case makes sense to me.

For the rest, `createStorageLocation` returns a stable storage location so the snippet above should be sufficient. However, `setStorageLocation` will fail if we try calling it again with the same expression, even if it's called with the same storage location. What do you think about making `setStorageLocation` not fail if it's called with the same arguments?

================
Comment at: clang/lib/Analysis/FlowSensitive/Models/UncheckedOptionalAccessModel.cpp:279
+                 cast_or_null<BoolValue>(Env.getValue(*ComparisonExprLoc))) {
+    Env.setValue(*ComparisonExprLoc,
+                 Env.makeAnd(*CurrentValue, ComparisonValue));
----------------
ymandel wrote:
> ymandel wrote:
> > xazax.hun wrote:
> > > I am still wondering a bit about this case. 
> > > 
> > > We generate: `HasValueVal and ContentsNotEqX and CurrentValue`.'
> > > I wonder if we want: `HasValueVal  and (ContentsNotEqX  <=> CurrentValue)` instead?  Or even `HasValueVal  and CurrentValue`?
> > I don't think that the iff version is right, but `HasValueVal  and CurrentValue` could be. My concern is that we're not guaranteed that `CurrentValue` is populated. And, even if we were, it doesn't feel quite right. Assuming its a high fidelity model, we get (logically): `HasValue(opt) and Ne(ValueOr(opt,X),X)`. Then, when negated (say, on an else branch) we get `not(HasValue(opt)) or not(Ne(ValueOr(opt,X),X))` which is equivalent to `not(HasValue(opt)) or Eq(ValueOr(opt,X),X)`. While  true, it seems redundant, since the first clause should be derivable from the second (assuming an interpretatable semantics to the `ValueOr` predicate).
> > 
> > Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.
> > Regardless, it might be better to step back and figure out how this should be done systematically. I'll try to come back with a proposal on that.
> 
> Here's what I have: in general, we're aiming for all models to be a sound (over) approximation of reality. That is what we're doing here as well. Yet, that poses a problem for the interpretation of the boolean not operator. If its operand is an overapproximation, then I believe the naive approach gives you an under approximation. That's the problem we're hitting when reasoning about the negation.
> 
> I'm not sure how to handle this. Stanislav -- have we dealt with this issue before?
> 
> That said, if we go back to the previous approach, of adding the information to the path condition, I think we avoid this problem, since the path conditions don't get negated.  To Gabor's earlier point:
> > There is an implication in the reverse direction as well. In case we know the optional is empty, we can prune one of the branches from the analysis. Is it possible to implement that with the current status of the framework?
> I think is covered by the condition we're adding. Namely:
> ```
> ExprValue => has_value
> ```
> where `ExprValue` is the truth value of the boolean expression.
> 
> So, the implication in the reverse direction is:
> ```
> !has_value => !ExprValue
> ```
> that is, if we know the optional doesn't hold a value, then we know that `opt.value_or(X) = X`
> 
> But, that implication is the contrapositive of our own, so I think it's already implicitly covered by adding the single implication. Does that sound right?
I'm not following where `Env.makeAnd(*CurrentValue, ComparisonValue)` comes from so I'd question whether it's sound or not. I would have expected to see something like `ExprValue => has_value` (which I believe was the case in the first iteration) and I see no issues with the contrapositive. If you have `x => y` and `not y` in the flow condition, you'll be able to infer that `not x` is true (assuming no other statements for `x`). How we use this to prune branches from the analysis is a question of its own.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D122231/new/

https://reviews.llvm.org/D122231