[cfe-dev] [Static Analyzer] temp_object chaos?

Gábor Kozár kozargabor at gmail.com
Tue Sep 17 13:39:47 PDT 2013


Hi Anna,



 Work on extending the analyzer core infrastructure to track value
objects. We currently do not have time to work on this, but we could
schedule an internal meeting to figure out a design for this if you are
interested in pursuing this direction.



How difficult a job would this be? I'm absolutely open to this idea,
but it's doubtful that my boss would authorize this (we're low on
manpower), and I simply don't have the energy to work on this outside
my working hours (I'm also an undergraduate university student with a
full timetable).



Some hacky solution based on pattern matching the expression in the
checker. For example, you'll need to identify the region created by
v.begin() and deal with the fact that it is a temporary by pattern
matching on the MaterializeTemporaryExpr. As with any pattern matching
solutions, this will be brittle and might not work for all code.
Something like this could work (I did not think it through very well):



This is similar to what I was considering attempting, but it just
doesn't feel robust enough - i.e. there probably would be many
corner-cases when this simply would not work - to be worth the effort.



Thanks!



--
Gábor Kozár -- ShdNx
kozargabor at gmail.com





On Tue, Sep 17, 2013, at 18:25, Anna Zaks wrote:



On Sep 14, 2013, at 7:44 AM, Gábor Kozár <[1]kozargabor at gmail.com>
wrote:

Hi Anna,

The last line is probably wrong.. You have "checkPostStmt on v.begin()"
twice.

This is strange. Would be good to figure out what's going on.


I get it every time I re-run the checker, so I'm positive this is what
happens. It did look weird for me too, that's why I mentioned it. Note
though that I'm using Clang 3.3, so it might be something that's been
fixed since then.

This is what I get for the bind:

Bind: val lazyCompoundVal{0x4cca6b0,temp_object{iterator,0x4bd1df0}} =>
loc &it

 - in stmt: v.begin()

 - stmt ast: CXXConstructExpr 0x4bd5d18
'std::vector<int>::iterator':'class __gnu_cxx::__normal_iterator<int *,
class std::vector<int, class std::allocator<int> > >' 'void (class
__gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > > &&) noexcept' elidable

`-MaterializeTemporaryExpr 0x4bd5bb8 'class
__gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > >' xvalue

  `-CXXMemberCallExpr 0x4bd1df0 'iterator':'class
__gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > >'

    `-MemberExpr 0x4bd1dc0 '<bound member function type>' .begin
0x4bc7600

      `-DeclRefExpr 0x4bd1d28 'std::vector<int>':'class
std::vector<int, class std::allocator<int> >' lvalue Var 0x4b96ee0 'v'
'std::vector<int>':'class std::vector<int, class std::allocator<int> >'

We expect the MaterializeTemporaryExpr to create a new region - it is
"materializing" a temporary value into memory by creating a region to
host it.



The issue here is that the iterators are value objects, represented by
LazyCompoundVals and SVals, and we do not have any way of
persisting/tracking values right now.


The only case I'm concerning myself with right now is when iterators
are saved into local variables, like so:

std::vector<int> v = { 1, 2, 3 };
auto it = v.begin();
v.push_back(10);
it++; // possibly invalid!


The variable 'it' will be represented by a memory region, and that it
what I'm tracking, instead of the actual iterator value. I realize that
this is limiting, but my understanding of the Clang Static Analyzer is
rather limited and this is the best I could come up with. :)


Again, none of the checkers are trying to track value objects. They
track regions (pointers). This checker is different.

There are two approaches one could take:

  1)  Work on extending the analyzer core infrastructure to track value
objects. We currently do not have time to work on this, but we could
schedule an internal meeting to figure out a design for this if you are
interested in pursuing this direction.

  2) Some hacky solution based on pattern matching the expression in
the checker. For example, you'll need to identify the region created by
v.begin() and deal with the fact that it is a temporary by pattern
matching on the MaterializeTemporaryExpr. As with any pattern matching
solutions, this will be brittle and might not work for all code.
Something like this could work (I did not think it through very well):
    - You register a callback to intercept PostCall on v.begin(). Then
check if the parent expr is a MaterializeTemporaryExpr. If yes, you
store (callExpr, region) pair into a map. (You probably have another
map that maps this region to vector state.)
    - You register another callback for MaterializeTemporaryExpr. Here,
you check if the child expression is one of the callExprs you have in
the map, and if yes, you remove the pair from the map and  associate
the new region with the vector info.
The trickiest part here is ensuring that you clear out the state after
you don't use the info anymore. For example, if you store info for
every v.begin regardless weather it is wrapped in
MaterializeTemporaryExpr or not, you might never clean out that info
from the map.



Jordan, I think this is similar to what we've discussed, but I might be
missing/added some details..

Anna.


Basically the only problem seems to be that I don't get consistent
values representing the iterator: the result SVal from checkPostCall is
different than the value that is actually bound to the variable. I
guess I can try to work-around by implementing a kind of a state
machine, and instead of starting to track iterator values in
checkPostCall, I just record that a given Expr yields an iterator
value, and look for that Expr in checkBind, but that sounds hackish and
I don't see any reason why what I'm trying now shouldn't work.

Thanks!

--
Gábor Kozár -- ShdNx
[2]kozargabor at gmail.com



On Thu, Sep 12, 2013, at 19:48, Anna Zaks wrote:



On Sep 11, 2013, at 2:03 PM, Gábor Kozár <[3]kozargabor at gmail.com>
wrote:

I'm analyzing the following source line:

auto it = v.begin(); // v is an std::vector<int>

As far as I'm able to reconstruct what happens using the checker
callbacks, it goes somehow like this:
 - checkPostCall on v.begin(): the call has been interpreted, the
result is a lazyCompoundVal with a temp_object within it
 - checkPostStmt on v.begin(): ProgramState::getSVal() yields the same
result as above (the same lazyCompoundVal with the same temp_object
within it)
 - bind: some lazyCompoundVal with a temp_object within it to loc '&it'
-- but this lazyCompoundVal and temp_object are different than above!!!
 - checkPostStmt on v.begin() again: ProgramState::getSVal() yields
&it, getting the SVal inside the region gives a lazyCompoundVal with
'it' in it


The last line is probably wrong.. You have "checkPostStmt on v.begin()"
twice. Also, can you figure out which AST node is responsible for the
bind?

Basically, to understand what is going on better, it would be valuable
to match the AST of the statements to the callbacks (you can get the
AST with clang -cc1 -ast-dump):
    `-DeclStmt 0x1023e7f00 <line:6:3, col:22>
      `-VarDecl 0x1023cd780 <col:3, col:21> it 'class
__gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > >':'class __gnu_cxx::__normal_iterator<int *,
class std::vector<int, class std::allocator<int> > >'
        `-CXXConstructExpr 0x1023e7ec8 <col:13, col:21> 'class
__gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > >':'class __gnu_cxx::__normal_iterator<int *,
class std::vector<int, class std::allocator<int> > >' 'void (const
class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > > &) throw()' elidable
          `-MaterializeTemporaryExpr 0x1023e7da8 <col:13, col:21>
'const class __gnu_cxx::__normal_iterator<int *, class std::vector<int,
class std::allocator<int> > >' lvalue
            `-ImplicitCastExpr 0x1023e7d90 <col:13, col:21> 'const
class __gnu_cxx::__normal_iterator<int *, class std::vector<int, class
std::allocator<int> > >' <NoOp>
              `-CXXMemberCallExpr 0x1023cd8a0 <col:13, col:21>
'iterator':'class __gnu_cxx::__normal_iterator<int *, class
std::vector<int, class std::allocator<int> > >'
                `-MemberExpr 0x1023cd870 <col:13, col:15> '<bound
member function type>' .begin 0x1023c66d0
                  `-DeclRefExpr 0x1023cd7d8 <col:13>
'std::vector<int>':'class std::vector<int, class std::allocator<int> >'
lvalue Var 0x1023bebc0 'v' 'std::vector<int>':'class std::vector<int,
class std::allocator<int> >'


We've looked at implementing these checks before and it is not a simple
problem. The issue here is that the iterators are value objects,
represented by LazyCompoundVals and SVals, and we do not have any way
of persisting/tracking values right now.



Let's start with mapping the results of callbacks to the AST nodes to
see if the modeling that we have now makes sense.



Thanks,

Anna.




My problem is that I'm unable to implement a checker that would need to
track iterator values, as temp_objects just seem to pop and disappear
without leaving a trace and without having any followable connection
between them.

Is this working as intended, and if so, how should I approach this? I
recall there being an open project for better modelling of C++
temporary objects, is that whose effect I'm seeing here?

Thanks!

--
Gábor Kozár -- ShdNx
[4]kozargabor at gmail.com

_______________________________________________

cfe-dev mailing list

[5]cfe-dev at cs.uiuc.edu

[6]http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

References

1. mailto:kozargabor at gmail.com
2. mailto:kozargabor at gmail.com
3. mailto:kozargabor at gmail.com
4. mailto:kozargabor at gmail.com
5. mailto:cfe-dev at cs.uiuc.edu
6. http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130917/341f501f/attachment.html>


More information about the cfe-dev mailing list