[cfe-dev] StaticAnalyzer: Implementing checks for std::string

Ted Kremenek kremenek at apple.com
Mon Aug 12 22:44:37 PDT 2013


Hi Jared,

I had some offline design discussions with Jordan and Anna, as this proposal raises some interesting design points.  I always saw synthesized function bodies as being a very flexible tool we could use for extending the analyzer’s functionality, including writing new checkers, but we hadn’t really thought that much about the design space.  I think what you propose seems like a very promising direction.

One thing that is really nice about using the synthesized bodies is that they are easier to get the path-sensitive reasoning in the checker correct. By creating fake bodies, the analyzer engine is responsible for reasoning about the branches, etc., and thus simulating the full path-sensitivity that way.  Complicated checkers, on the other hand, need to deal with a bunch of assumption logic, and consider fully-constrained versus under constrained conditions for bugs.  That’s likely not to go fully away with checkers written using synthesized bodies, but it may get much simpler.  Also it would be interesting to see if we could write full checkers using synthesized bodies, e.g. report the actual bug by calling a hook, but that’s something we can explore over time.

While we are excited about using fake bodies for this approach, we made a few important observations in our offline design discussion.

Most importantly, the BugReporter logic (the code which generates error reports) isn’t that hardened to deal with methods/functions that have invalid source locations, which is the case for all synthesized bodies.  That’s somewhat expected, and essentially exposes further work to be done in BugReporter.  Essentially, when reporting a bug, BugReporter needs to the diagnostics accordingly when we don’t have a proper source location.

For example, suppose we had a fake implementation of the function foo(), and we knew that foo() dereferenced its pointer argument, for example:

  int *p = 0;
  foo(p);

In this case, we would want to report a null dereference warning.  The problem is that the warning would trigger inside the body of foo(), which is synthesized, and has no valid source locations for the statements in that function body.  Until recently the analyzer would just crash when this happened, but Jordan put in a stop gap fix for the BugReporter to discard such bug reports.  That’s better than crashing, but obviously not what we want in the long term.  Instead, we’d prefer to see something like this:

  Dereference of null pull pointer (within call to function foo())

and have that diagnostic reported at the call to foo().  That’s completely doable, and necessary, but it hasn’t been implemented yet.  The upshot is that if you want to use BodyFarm to provide these fake implementations of std::string methods that will only help as much as simulating behavior but you won’t be able to report bugs *within* those functions.

More comments inline.

On Aug 5, 2013, at 11:04 AM, Jared Grubb <jared.grubb at gmail.com> wrote:

> Ted & Anna,
> 
> Thanks for your help. Over the weekend, I dug in and studied more about how clang and the analyzer work, and how the BodyFarm fits into it all. I was able to adjust BodyFarm to let me generate bodies for std::basic_string methods, and I have generated a few bodies just to prove I made it work. I know this doesnt sound like much, but it took me some study and trial-error, as I'm still learning how all this works. :)
> 
> Now I'm ready to start implementing the "real" fake bodies. Here's a rough idea of what I think this could look like:
> 
>     void basic_string::clear() {
> 	hook_invalidate_iterators();
> 	hook_set_content(0, "");  // size, dummy pointer
>     }
> 
>     basic_string& operator=(basic_string&& str)
>     {
> 	hook_invalidate_iterators();
> 	hook_set_content(str.size(), str.data());
>         str.clear();
>         return *this;
>     }
> 
>     size_type basic_string::size() const
>     {
>         return hook_get_size();
>     }
> 
>     const_pointer basic_string::data() const
>     {
>         return hook_get_data();
>     }
> 
> Where each of the "hook_" functions are basic primitives that would get trapped by the respective Checkers (eg, iterator validation checker, out-of-bounds checker). 

This is an interesting design.  I really like using the hooks to extend functionality.  It’s an unexplored space right now, but I think it is worth a shot.

A few caveats:

(1) This is all subject to further design refinement as we explore this design.
(2) There is a potential layering problem here between BodyFarm, which doesn’t know about your checker, and the checker itself.  Ideally the fake function bodies should probably come from code that also provides the hooks, so that it is all self-consistent.  Moreover, we’d like a design where the analyzer truly can be made extensible via this approach without the fake function bodies coming from one place.

That said, #2 is a layering problem that I don’t want to block your initial prototyping work.  We can figure that out as we go.  What I do ask is that you guard the synthesized string bodies under an analyzer configuration variable so that they aren’t synthesized unless the configuration option is passed to the analyzer.  This allows you to have a playground to try out this new approach without disrupting the current analyzer functionality.

> 
> Does that sound like a reasonable approach? It would mean creating fake CXXMethodDecl (or maybe FunctionDecl that explicitly take 'this') that dont really anchor into the actual source, but as far as I can tell, the analyzer core wont care.

It sounds like a great approach, but there are many details to figure out.  As I pointed out, the analyzer does care about the fact that the source locations are bogus, but we can also fix that as well over time.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130812/f5907916/attachment.html>


More information about the cfe-dev mailing list