[cfe-dev] StaticAnalyzer: Implementing checks for std::string

Tue Jul 23 10:04:08 PDT 2013

On Jul 20, 2013, at 4:32 PM, Jared Grubb <jared.grubb at gmail.com> wrote:

> I was looking at trying to implement a checker and thought a fun one would be to try to implement checkers around std::string. For example, I figured I could start with trying to detect out-of-bound access when the length of the string can be determined.

The existing (experimental) CString checker already performs a similar check. You can take a look at how it associates the length of a string with the region that represents the string. The main two reasons it's experimental is that it has not been sufficiently tested on real code and we felt that the diagnostics for out-of-bound are usually hard to understand. For example, often out-of-bound is due to an overflow or underflow; and it would be beneficial to have some explanation of these events to the user.

The CSting checker does not use BodyFarm (mainly, because it was written before it). The idea behind using the BodyFarm, would be to model the functions using the farm but you would still write the checks inside a checker.

> 
> One clang doc(*) suggested that there should be BodyFarm implementations of std::string functions.

This would allow the analyzer to have precise reasoning about these known functions, which would, for example, lead to less false positives.

> Would that be the best way to try to solve the OOB check problem? Or is it better to try to "emulate" interactions with std::string objects and try to track the size of the string? 
> 
> It seems that providing an actual implementation of the functions will be more accurate than an emulation, but then I'm not sure why that's not already visible from <string>? Is there some limitation in the static analyzer that keeps it from already having the source from <string>? Or will an emulation provide something richer than the raw <string> source could provide?

The analyzer only sees what is implemented in the header. Also, we do have limits on cross-function-analyzes. Specifically, we do not "inline" function calls that are too deep or too large. Another reason to model functions with body farm is that the analyzer does not always "understand" the invariants of the code as it is written in the library.

> 
> I'll probably have follow up questions, but my questions branch out from those basic approaches, so I figured I'd start there.
> 
> Jared
> 
> (*) http://clang-analyzer.llvm.org/open_projects.html
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20130723/c8e1fdfb/attachment.html>