[cfe-dev] How to track the 'this' pointer when using the clang static analyzer?
Artem Dergachev via cfe-dev
cfe-dev at lists.llvm.org
Mon Jun 5 02:20:56 PDT 2017
In the analyzer, CXXThisRegion represents the cell on the stack
that contains the value of the implicit "this" pointer argument during a
method call, similarly to how the VarRegion for a ParmVarDecl represents the
place where an explicit argument is pushed onto the stack during a
function call.
The actual value stored in this region, however (the value of the
implicit argument, i.e. a pointer to the "this" object, which may reside
on the heap or anywhere else), may be quite arbitrary. In the general
case you cannot tell, by looking at that value, that it was taken from
CXXThisRegion.
In your case, the method you're looking at is being analyzed "at the top
frame", which means that the analysis "started from that method", as
opposed to "started elsewhere, in some 'foo()', but ended up within this
method because the method is called from that 'foo()'". Because
your method is analyzed at the top frame, the value of CXXThisRegion has
not changed since the beginning of the analysis. In fact, the language
doesn't provide any safe way to overwrite this stack region: you are not
allowed to compute &this or assign to this in C++. This means that the
value of CXXThisRegion is denoted by the special kind of symbol that we
use to represent values that have been stored in their regions since
the beginning of the analysis: the SymbolRegionValue "reg_$0<this>",
which holds a pointer to the CXXThisRegion it was created for. The actual
"this" object is therefore known to reside at the (symbolic) address
"reg_$0<this>": it is the object this pointer points to, and it begins
at this address. That's pretty much the only thing we know about this
region; we're not even sure whether the type of the object is
"struct X" or some derived structure. Hence the region is represented as
a SymbolicRegion around reg_$0<this>. I've recently explained more about
symbolic regions:
http://lists.llvm.org/pipermail/cfe-dev/2017-June/054084.html
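The claim that C++ offers no way to take the address of this or assign to it can be checked with a few lines of plain C++. The sketch below reuses struct X from the question; the static_assert checks and the thisIsPrvalue() helper are illustrative additions, not analyzer code. They rely on the standard rule that decltype((e)) is a non-reference type when e is a prvalue and an lvalue reference when e is an lvalue.

```cpp
#include <cassert>
#include <type_traits>

// 'struct X' from the quoted question, with compile-time checks added.
struct X {
  X() {
    X *x = this;  // copying the *value* of 'this' into a variable is fine
    (void)x;      // silence the unused-variable warning

    // decltype((e)) yields a non-reference type when e is a prvalue and an
    // lvalue reference when e is an lvalue.
    static_assert(std::is_same<decltype((this)), X *>::value,
                  "'this' is a prvalue of type X*");
    static_assert(std::is_same<decltype((x)), X *&>::value,
                  "the local variable 'x' is an lvalue");

    // Hence both of these are ill-formed and are rejected by the compiler:
    //   X **pp = &this;  // cannot take the address of a prvalue
    //   this = nullptr;  // 'this' is not assignable
  }

  // Run-time view of the same fact: decltype((this)) is never a reference.
  bool thisIsPrvalue() const {
    return !std::is_reference<decltype((this))>::value;
  }
};
```

Uncommenting either of the two ill-formed lines makes the translation unit fail to compile, which is why the analyzer may assume that a top-frame CXXThisRegion keeps its initial value reg_$0<this>.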
However, if the analysis begins at that outside-ish function...
void foo() {
  X y;             // case 1: calls your constructor
  X *z = new X();  // case 2: calls your constructor again
}
... then things are different. We have two different CXXThisRegions
here. The only difference between them is that they have different
parent regions, namely StackArgumentsSpaceRegions that correspond to
different stack frames. One stack frame is for the call of the
constructor in case 1, another stack frame is for the call of the
constructor in case 2. They don't exist simultaneously. The top stack
frame doesn't have its own CXXThisRegion because foo() is not a method.
Now, in case 1 the first CXXThisRegion contains a pointer to the variable y.
The relevant SVal is dumped as "&y". It's a stack variable within the
StackLocalsSpaceRegion of the top frame. We know a lot about this
region; we even know the exact type of the object. That is the region
you're looking for.
In case 2 the second CXXThisRegion contains a pointer to the region
constructed by operator new(). It may look like
&element{SymRegion{conj_$0<X *>}, X, 0 S32b}, which means that the
unknown return value of operator new() was denoted by a SymbolConjured
"conj_$0<X *>", and that the symbolic segment of heap memory within the
HeapSpaceRegion that begins at pointer conj_$0<X *> contains an
object of type X; this is the region you're looking for.
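For intuition, the same two cases can be observed at run time. In the sketch below (plain C++, with a static lastThis member added purely for illustration), the constructor records the value of its implicit this argument. In case 1 it equals &y, and in case 2 it equals the pointer produced by the new-expression; these are the concrete counterparts of the &y and &element{SymRegion{conj_$0<X *>}, X, 0 S32b} values described above.

```cpp
#include <cassert>

// 'X' from the example; 'lastThis' is an illustrative addition that records
// the implicit 'this' argument received by the most recent constructor call.
struct X {
  static const X *lastThis;
  X() { lastThis = this; }
};
const X *X::lastThis = nullptr;

struct Observed {
  bool case1IsStackVar;   // constructor's 'this' == &y
  bool case2IsNewResult;  // constructor's 'this' == pointer returned by new
};

// Mirrors foo() from the example above.
Observed foo() {
  Observed result{};
  X y;             // case 1: 'this' is the address of the local variable y
  result.case1IsStackVar = (X::lastThis == &y);
  X *z = new X();  // case 2: 'this' is the address of the heap object
  result.case2IsNewResult = (X::lastThis == z);
  delete z;
  return result;
}
```

Both flags come out true, and neither value has anything to do with the stack cell that held the argument, which is exactly why searching the value for a CXXThisRegion cannot work in general.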
--
I said all this in order to demonstrate that your approach of looking
for CXXThisRegion in the structure of the value is indeed not going to
work; you can obtain an arbitrary Loc value that doesn't necessarily
contain any mentions of CXXThisRegion.
Now, on what to do: I'd point to https://reviews.llvm.org/D26762, which
contains a simple method of obtaining "this" in both the top-frame and the
non-top-frame case. I cc'd Krzysztof, who is the author of this patch.
This patch is still under review because Krzysztof intends to
refactor the code; however, now that you might need it as well, I guess
I'd rather land it. The patch adds a method to ProgramState that gives
you the "this" object for a given stack frame (represented by a
StackFrameContext). It works as follows.
In the case of the top frame, it takes the value stored in the
CXXThisRegion. As I explained above, because CXXThisRegion's value cannot
be mutated, the result would always be &SymRegion{reg_$0<this>} (the
number after reg_ may change, and so may the type of the pointer).
In the case of a non-top frame, we cannot do that, because the value we're
looking for, which was explicitly stored in CXXThisRegion at the call site,
might have been garbage-collected by now (there may be no further
references to it in the remaining part of the method's body).
Instead, it finds the expression that corresponds to the method call,
finds its implicit object argument sub-expression (it's always there,
even if not written explicitly, because all hail the clang AST), and takes
the value of that expression in the parent stack frame. It's worth noting
that the same expression may have different values in different stack
frames; e.g., your method may be called recursively with different "this"
arguments.
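The lookup described above can be modelled in a few lines. This is a hypothetical sketch of the idea only, not the code in D26762: Frame, ExprId, Environment and getThisObject() are made-up names, expressions are reduced to integer ids, and values to the strings the analyzer would print.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins: an expression is identified by an integer id, and a
// value (SVal) by the string the analyzer would dump for it.
using ExprId = int;
using Value = std::string;

struct Frame {
  const Frame *parent;  // nullptr for the top frame
  ExprId objectArg;     // implicit object argument of the call that created
                        // this frame (unused at the top frame)
  Value thisCellValue;  // contents of this frame's CXXThisRegion
};

// The same expression may have different values in different frames, so the
// environment is keyed by (expression, frame).
using Environment = std::map<std::pair<ExprId, const Frame *>, Value>;

Value getThisObject(const Frame &f, const Environment &env) {
  // Top frame: the CXXThisRegion can never be overwritten, so its value is
  // still the initial symbolic one.
  if (!f.parent)
    return f.thisCellValue;
  // Non-top frame: the CXXThisRegion binding may already be garbage-collected,
  // so evaluate the call's implicit object argument in the caller's frame.
  auto it = env.find({f.objectArg, f.parent});
  return it != env.end() ? it->second : Value("<unknown>");
}
```

In a recursive call chain, two frames created by the same call expression get different answers, because that expression is evaluated in a different parent frame each time.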
--
As far as I understand, you're working on the VirtualCallChecker. I'm
curious why, for this project, it was necessary to explicitly monitor all
cases in which 'this' is bound to various variables, because I don't see
why that is obviously necessary. It might be enough to check, on specific
events, whether a certain sub-expression is immediately equal to "this",
and that may be easier than tracking all the places it may come from by
re-implementing alias analysis, if that's what you're trying to do
(though you'd still have to know the value of "this" in any case).
I suspect that at the beginning of your project you might only be
interested in the top-frame case, because it's easier for various other
reasons. So the non-top-frame part of this explanation might not be
immediately necessary to understand.
Also, I'd take another chance to point to my workbook, which might have
cleared up this answer as well:
https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf
Hope this helps.
04/06/2017 4:00 PM, Xin Wang via cfe-dev wrote:
> Hello everyone!
>
> I want to check that 'this' is bound to x; the code is below:
>
> struct X {
>   X() {
>     X* x = this;
>   }
> };
>
> I dumped the exploded graph; the live expression of x is below:
> (0x66ab5b0,0x667a9c0) x : &SymRegion{reg_$0<struct X * this>}
>
> But I'm not sure how to recognize programmatically that this is a
> symbolic region for a 'this' pointer. I thought that maybe I could
> use the isa<CXXThisRegion>() method, but it turns out that
> SymbolicRegion and CXXThisRegion don't share the same
> inheritance chain.
>
> Look forward to your help!
> Xin