[cfe-dev] How to track the 'this' pointer when using the clang static analyzer?

Artem Dergachev via cfe-dev cfe-dev at lists.llvm.org
Mon Jun 5 02:20:56 PDT 2017


In the analyzer, CXXThisRegion represents the cell on the stack that 
contains the value of the implicit "this" pointer argument during a 
method call, similar to how the VarRegion for a ParmVarDecl is the 
place where an explicit argument is pushed onto the stack during a 
function call.

The actual value stored in this region, however - the value of the 
implicit argument, which is a pointer that points to the "this" object 
on the heap or wherever it resides - may be quite arbitrary. In the 
general case you cannot say, by looking at that value, that it was 
taken from CXXThisRegion.
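
To make the distinction concrete, here is a small sketch in plain C++ (not analyzer code; the names X, viaThis and sameValue are made up for illustration). It only shows the language-level fact behind the paragraph above: a pointer that came out of "this" and a pointer computed by the caller are the same value, and nothing in that value records where it came from.

```cpp
#include <cassert>

struct X {
  // Returns the value that was passed in as the implicit 'this' argument.
  const X *viaThis() const { return this; }
};

// Hypothetical helper: compares a pointer read from 'this' inside a method
// with a pointer computed by the caller. The two values are identical.
inline bool sameValue(const X &obj) {
  const X *fromThis = obj.viaThis(); // read from the 'this' argument
  const X *fromAddr = &obj;          // computed outside the method
  return fromThis == fromAddr;
}
```

The same holds whether the object lives on the stack or on the heap, which is why looking for CXXThisRegion inside the value itself does not tell you anything.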

In your case, the method you're looking at is being analyzed "at top 
frame", which means that the analysis has "started from that method", as 
opposed to "started elsewhere in some 'foo()' but ended up within this 
method because this method is being called from that 'foo()'". Because 
your method is being analyzed at top frame, the value of CXXThisRegion 
has not changed since the beginning of the analysis - in fact, the 
language doesn't provide any safe way to overwrite this stack region: 
you are not allowed to compute &this or assign to this in C++. It means 
that the value of CXXThisRegion is denoted by the special kind of symbol 
that we use to represent values that have been stored in their regions 
since the beginning of the analysis - a SymbolRegionValue, 
"reg_$0<this>", which holds a pointer to the current CXXThisRegion. The 
actual "this" object is therefore known to reside at the (symbolic) 
address "reg_$0<this>": it is pointed to by the this pointer and begins 
at that address. That's pretty much the only thing we know about this 
region - we're not even sure if the type of the object is "struct X" or 
some derived structure. Hence the region is represented as a 
SymbolicRegion around reg_$0<this>. I've recently explained more about 
symbolic regions: 
http://lists.llvm.org/pipermail/cfe-dev/2017-June/054084.html
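
As an illustration of why the type of the object behind reg_$0<this> is not known at top frame, here is a small plain C++ sketch (again not analyzer code; Derived and isExactlyX are hypothetical names, and an ordinary method is used rather than a constructor). The same method body of X runs both for a plain X and for an object of a derived class, and only the caller knows which.

```cpp
#include <cassert>
#include <typeinfo>

struct X {
  virtual ~X() = default;
  // Inside this method the static type of *this is X, but the dynamic
  // type may be any class derived from X.
  bool isExactlyX() const { return typeid(*this) == typeid(X); }
};

// Hypothetical derived class, used only to show a different dynamic type.
struct Derived : X {};
```

When the analysis starts from X::isExactlyX() itself, there is no caller to tell the two situations apart, which is what the symbolic region expresses.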

However, if the analysis begins at that outside-ish function...

   void foo() {
     X y; // case 1: calls your constructor
     X *z = new X(); // case 2: calls your constructor again
   }

... then things get different. We have two different CXXThisRegions 
here. The only difference between them is that they have different 
parent regions, namely StackArgumentsSpaceRegions that correspond to 
different stack frames. One stack frame is for the call of the 
constructor in case 1, another stack frame is for the call of the 
constructor in case 2. They don't exist simultaneously. The top stack 
frame doesn't have its own CXXThisRegion because foo() is not a method.

Now, in case 1 the first CXXThisRegion contains a pointer to variable y. 
The relevant SVal is dumped as "&y". It's a stack variable within 
StackLocalsSpaceRegion of the top frame. We know a lot about this 
region, we even know the exact type of the object. That is the region 
you're looking for.

In case 2 the second CXXThisRegion contains a pointer to the region 
constructed by operator new(). It may look like 
&element{SymRegion{conj_$0<X *>}, X, 0 S32b}, which means that the 
unknown return value of operator new() was denoted by a SymbolConjured 
"conj_$0<X *>", and that the symbolic segment of heap memory within 
HeapSpaceRegion that begins at pointer conj_$0<X *> indeed contains an 
object of type X. This is the region you're looking for.
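
The two cases can be checked at run time with a short plain C++ sketch (not analyzer code; the member "self" and the function fooSeesExpectedThis are invented for the example). The constructor stores the "this" value it received, and the caller compares it with the address it knows about, which is the same relationship the analyzer models with &y and with the element region on the heap.

```cpp
#include <cassert>

struct X {
  const X *self; // the 'this' value seen by the constructor
  X() : self(this) {}
};

// Mirrors the two constructor calls discussed above; returns true when
// both calls saw the expected 'this' value.
inline bool fooSeesExpectedThis() {
  X y;            // case 1: 'this' in the constructor is &y (stack)
  X *z = new X(); // case 2: 'this' is the pointer returned by operator new
  bool ok = (y.self == &y) && (z->self == z);
  delete z;
  return ok;
}
```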

--

I said all this in order to demonstrate that your approach of looking 
for CXXThisRegion in the structure of the value is indeed not going to 
work; you can obtain an arbitrary Loc value that doesn't necessarily 
contain any mentions of CXXThisRegion.

Now, on what to do: I'd point to https://reviews.llvm.org/D26762 that 
contains a simple method of obtaining "this" in both the top-frame and 
non-top-frame cases. I cc'd Krzysztof, who is the author of this patch. 
The patch is still under review because Krzysztof intends to refactor 
the code; however, now that you might need it as well, I guess I'd 
rather land it. The patch adds a method to ProgramState that gives you 
the "this" object for a given stack frame (represented by a 
StackFrameContext). It works as follows.

In the top-frame case, it takes the value stored in the CXXThisRegion. 
As I explained above, because CXXThisRegion's value cannot be mutated, 
the result would always be &SymRegion{reg_$0<this>} (the number after 
reg_ may change, and the type of the pointer may change as well).

In the non-top-frame case, we cannot do that because the value we're 
looking for, which was explicitly stored in CXXThisRegion at the call 
site, might have been garbage-collected by now (because there may be no 
further references to it in the remaining part of the method's body). 
Instead, it finds the expression that corresponds to the method call, 
finds the implicit object argument sub-expression (it's always there 
even if not written explicitly, because all hail clang AST), and takes 
the value of that expression in the parent stack frame (worth noting 
that the same expression may have different values in different stack 
frames, e.g. your method may be recursively called with different this 
arguments).
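
The remark about recursion can be seen in a plain C++ sketch (not analyzer code; Node, visit and visitChain are hypothetical names). The implicit object argument of the recursive call is always the same expression, "next", yet it evaluates to a different pointer in every stack frame, so each callee frame receives a different "this".

```cpp
#include <cassert>
#include <vector>

struct Node {
  Node *next = nullptr;
  // Appends the 'this' value of every frame of the recursion to 'seen'.
  void visit(std::vector<const Node *> &seen) const {
    seen.push_back(this);
    if (next)
      next->visit(seen); // same expression, different value in each frame
  }
};

// Hypothetical helper: collects the 'this' values seen along the chain.
inline std::vector<const Node *> visitChain(const Node &head) {
  std::vector<const Node *> seen;
  head.visit(seen);
  return seen;
}
```

This is why the value of the implicit object argument has to be looked up in the specific parent stack frame rather than by expression alone.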

--

As far as I understand, you're working on the VirtualCallChecker. I'm 
curious why, for this project, it was necessary to explicitly monitor 
all cases where 'this' is bound to various variables, because I don't 
see why it is necessary. It might be enough to check, on specific 
events, whether a certain sub-expression is immediately equal to 
"this", and that may be easier than tracking all the places from which 
it may come by re-implementing alias analysis, if that's what you're 
trying to do (though you'd still have to know the value of "this" in 
any case).

I suspect that at the beginning of your project, you might only be 
interested in the top-frame case, because it's easier for various other 
reasons. So the non-top-frame explanation above might not be 
immediately necessary to understand.

Also, I'd take another chance to point to my workbook at 
https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf 
which might help clear up this answer as well.

Hope this helps.


04/06/2017 4:00 PM, Xin Wang via cfe-dev wrote:
> Hello everyone!
>
> I want to check that 'this' is bound to x; the code is below:
>
> struct X {
>   X() {
>     X* x = this;
>   }
> };
>
> I dumped the exploded graph; the live expression of x is below:
>     (0x66ab5b0,0x667a9c0) x : &SymRegion{reg_$0<struct X * this>}
>
> But I'm not sure how to recognize programmatically that this is a 
> symbolic region for a 'this' pointer. I thought that maybe I could 
> use the isa<CXXThisRegion>() method, but it turns out that 
> SymbolicRegion and CXXThisRegion don't share the same
> inheritance chain.
>
> Look forward to your help!
> Xin