[PATCH] D116596: [clang][dataflow] Add transfer functions for assignment

Stanislav Gatev via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jan 10 07:27:18 PST 2022


sgatev added inline comments.


================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:36
+/// FIXME: Consider replacing this with a model that is more aligned with C++
+/// value categories.
+enum class SkipPast {
----------------
xazax.hun wrote:
> ymandel wrote:
> > I'm not sure that value categories are right here.  The key is that skipping is optional, unlike value categories which are fixed.  I think the problem is just that C++ references are weird (for lack of a technical term). A reference variable exists simultaneously in two different categories -- a pointer and the value it points to. That's what this is trying to capture.
> > 
> > So, I'd recommend just changing the name to reflect that. WDYT of something like this?
> > 
> > ```
> > /// Indicates how to treat references, if encountered -- as a reference or the value referred to -- when retrieving
> > /// storage locations or values.
> > enum class RefTreatment {
> >    /// Consider the reference itself.
> >   AsSelf,
> >   /// Consider the referent.
> >   AsReferent,
> > };
> > ```
> Yeah, value categories are indeed fixed, I was thinking along the lines of treating expressions as if there was an `LValueToRValue` or `RValueToLValue` conversion. I think some of this problem might come from the ambiguity. E.g., when I want to query the analysis state about the value of `*p`, what do I want? Do I want the value for `*p` as an lvalue or `*p` as an rvalue? I would get a different answer in both cases. If I see `*p` in the source code it is fixed but when I want to query the analysis state it is not.
> 
> On the other hand I see that `SkipPast` is used in the transfer functions, and I am wondering if that is justified. According to the language rules, I can never observe the actual value of the reference. I can modify the pointee, (`ref = val`) or create a pointer to the same value (`&ref`), but I cannot actually read or modify what is inside the reference. So I wonder if it might be less error prone if there would be no way in the transfer functions to observe the values of the references either. 
> 
> But I do not have a strong feeling about any of this at this point so feel free to leave everything as it is. 
> 
> > So, I'd recommend just changing the name to reflect that.
> 
> I am not sure how clear `Referent` is. For a non-native speaker `Pointee` might be clearer, although I do see how it can be confusing. 
Thanks both! I agree with the comments. I prefer to keep it as is for now. I updated the FIXME with a pointer to this discussion so that we remember to get back to it at some point.


================
Comment at: clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:111
+  /// value in the environment.
+  Value *getValue(const StorageLocation &Loc, SkipPast SP) const;
+
----------------
ymandel wrote:
> Why have a SkipPast argument here? I understand the concept for considering var-decls - it corresponds to the weirdness of lvalue-refs themselves. But StorageLocation is a lower-level (and simpler) concept, to which this doesn't seem to map. I'm fine with resolving indirections, but what is the value of only doing so conditionally?
You're right. It's not necessary here. Removed it.


================
Comment at: clang/lib/Analysis/FlowSensitive/DataflowEnvironment.cpp:95
+StorageLocation &Environment::createStorageLocation(const Expr &E) {
+  // Evaluated expressions are always assigned the same storage locations to
+  // ensure that the environment stabilizes across loop iterations. Storage
----------------
xazax.hun wrote:
> sgatev wrote:
> > xazax.hun wrote:
> > > What is the model for expressions that evaluate to different locations in every loop iterations, e.g.:
> > > ```
> > > for(int *p = array; p != array + size; ++p)
> > >   *p; // The dereference expression
> > > ```
> > > 
> > > Here, having `*p` always evaluate to the same location is not correct, we'd probably need a way to widen this. 
> > In the current model, the storage location for `p` is stable and the value that is assigned to it changes. So, in one iteration the value assigned to the storage location for `p` is `val1` which is a `PointerValue` that points to `loc1` and in another iteration the value assigned to the storage location for `p` is `val2` which is a `PointerValue` that points to `loc2`. The storage location for `*p` is also stable and the value that is assigned to it is a `ReferenceValue` that points to either `loc1` or `loc2`. I implemented this and added a test.
> In this case, does it really make sense to materialize this mapping? If every subexpression corresponds to a unique location, and always the same location, we could use the subexpression directly as a location. 
True, but a `StorageLocation` can also be assigned to a `ValueDecl`, not only to `Expr`. We also keep some additional information in the `StorageLocation` sub-classes so I don't think we can replace it directly. 


================
Comment at: clang/lib/Analysis/FlowSensitive/Transfer.cpp:37
+      assert(S->getLHS() != nullptr);
+      const Expr *LHS = S->getLHS()->IgnoreParens();
+      assert(LHS != nullptr);
----------------
ymandel wrote:
> maybe comment as to why this is necessary? I'd have thought that CFG's flattening of expressions would handle this (why, in general, we don't need to look deep into expressions).
Added a comment. The CFG does not contain `ParenExpr` as top-level statements in basic blocks, however sub-expressions can still be of that type.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D116596/new/

https://reviews.llvm.org/D116596



More information about the cfe-commits mailing list