[PATCH] D37023: [analyzer] Fix bugreporter::getDerefExpr() again.

Devin Coughlin via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 29 17:49:49 PDT 2017

dcoughlin added a comment.

This seems like it is on the right track, but I think the emphasis here on lvalue-to-rvalue casts is a bit misplaced. The logic for "what is the pointer being dereferenced" and "what is the lvalue that holds the pointer being dereferenced" is mixed together in a way that I find confusing.

I think an easier to understand approach is to first find the rvalue of the pointer being dereferenced. Then, second, pattern match on that to find the underlying lvalue the pointer was loaded from (when possible).

But first a couple of points just to make sure we're on the same page (with apologies for the wall of text!). Suppose we have:

  struct Y {
     int z;
  struct X {
    Y y;

and a local 'x' of type `X *`.

For an access of the form `int l = x->y.z` the pointer being dereferenced is the rvalue `x`. The dereference ultimately results in a load from memory, but the address loaded from may be different from the the pointer being dereferenced. Here, for example, the address loaded from is (the rvalue of) `x` + "offset of field y into X" + "offset of field z into Y".

Fundamentally, given an expression representing the lvalue that will be loaded from, the way to find the rvalue of the pointer being dereferenced is to strip off the parts of the expression representing addition of offsets and any identity-preserving casts until you get to the expression that represents the rvalue of the base address. (This is why stripping off unary `*` makes sense -- in an lvalue context it represents adding an offset of 0. This is also why stripping off lvalue-to-rvalue casts doesn't make sense -- they do not preserve identity nor add an offset)

For `int l = x->y.z,` the expression for the pointer being dereferenced is the lvalue-to-rvalue cast that loads the value stored in local variable `x` from the lvalue that represents the address of the local variable `x`. But note that the pointer being dereferenced won't always be an lvalue-to-rvalue cast. For example in `int l = returnsPointerToAnX()->y.z` the expression for the pointer being dereferenced is the call expression `returnsPointerToAnX()`. There is no lvalue in sight.

Now, the existing behavior of `getDerefExpr()` requires that it return the lvalue representing the location containing the dereferenced pointer when possible. This is required by `trackNullOrUndefValue()` (sigh). So I suggest, for now, first finding the rvalue for the dereferenced value and then adding a fixup at the end of `getDerefExpr()` that looks to see whether the rvalue for the dereferenced pointer is an lvalue-to-rvalue cast. If so, the fixup will return the sub expression representing the lvalue.

Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:46
+/// Given that expression S represents a pointer that would be dereferenced,
+/// try to find the immediate sub-expression that represents the pointer
+/// which is being dereferenced.
An interesting aspect of this function is that sometimes it returns a sub-expression that represents the pointer rvalue and sometimes it returns a sub-expression that represents the lvalue whose location contains the pointer which is being dereferenced.

I believe the reason we need the lvalue is that trackNullOrUndef() tracks lvalues better than rvalues. (That function loads from the lvalue to get the symbol representing the dereferenced pointer value.)

This behavior is pretty confusing, so we should document it in the comment.

Comment at: lib/StaticAnalyzer/Core/BugReporterVisitors.cpp:48
+/// which is being dereferenced.
+/// For example, for 'x->y.z = 2' the answer would be 'x->y' (without the
+/// implicit lvalue-to-rvalue cast surrounding it); then, for 'x->y' (again,
This comment is not right. For `x->y.z = 2` the answer should be `x` and not `x->y`.


More information about the cfe-commits mailing list