[cfe-dev] StaticAnalyzer: Implementing checks for std::string

Mon Jul 29 10:13:50 PDT 2013

On Jul 28, 2013, at 11:58 AM, Jared Grubb <jared.grubb at gmail.com> wrote:

> 
> On Jul 23, 2013, at 10:04, Anna Zaks <ganna at apple.com> wrote:
> 
>> 
>> On Jul 20, 2013, at 4:32 PM, Jared Grubb <jared.grubb at gmail.com> wrote:
>> 
>>> I was looking at trying to implement a checker and thought a fun one would be to try to implement checkers around std::string. For example, I figured I could start with trying to detect out-of-bound access when the length of the string can be determined.
>> 
>> The existing (experimental) CString checker already performs a similar check. You can take a look at how it associates the length of a string with the region that represents the string. The two main reasons it is experimental are that it has not been sufficiently tested on real code and that we felt the out-of-bounds diagnostics are usually hard to understand. For example, an out-of-bounds access is often caused by an overflow or underflow, and it would be beneficial to explain these events to the user.
>> 
>> The CString checker does not use BodyFarm (mainly because it was written before BodyFarm existed). The idea behind using BodyFarm would be to model the functions using the farm, but you would still write the checks inside a checker.
>> 
>>> 
>>> One clang doc(*) suggested that there should be BodyFarm implementations of std::string functions.
>> 
>> This would allow the analyzer to reason precisely about these known functions, which would, for example, lead to fewer false positives.
> 
> Thanks, Anna.
> 
> That makes sense; also, I could see that you might be able to get much deeper analysis than simple emulation.
> 
>>> Would that be the best way to try to solve the OOB check problem? Or is it better to try to "emulate" interactions with std::string objects and try to track the size of the string? 
>>> 
>>> It seems that providing an actual implementation of the functions will be more accurate than an emulation, but then I'm not sure why that's not already visible from <string>? Is there some limitation in the static analyzer that keeps it from already having the source from <string>? Or will an emulation provide something richer than the raw <string> source could provide?
>> 
>> The analyzer only sees what is implemented in the header. Also, we do have limits on cross-function analysis. Specifically, we do not "inline" function calls that are too deep or too large. Another reason to model functions with BodyFarm is that the analyzer does not always "understand" the invariants of the code as it is written in the library.
> 
> Does BodyFarm get any freedom over the data layout of std::string? The STL doesn't specify how the objects are implemented. So, would a particular implementation of BodyFarm be restricted to work only against libc++? Or do you ignore the actual data model and just have it conjure the values itself via the analyzer?

The BodyFarm emulation should be generic and as simple as possible. It would probably be an abstraction of what the APIs actually do. We do not have to follow the existing data layout.
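
To make "an abstraction of what the APIs actually do" a bit more concrete, here is a rough sketch in plain C++. It is only an illustration of the idea, not BodyFarm code, and it does not use any analyzer API. All names in it (AbstractString, modelFromLiteral, modelAppend, checkAt, AccessVerdict) are hypothetical. The model ignores the real data layout entirely and tracks only whether the length is known and, if so, what it is. It models at() rather than operator[], since at() is the accessor with a defined bounds check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical abstract model of a std::string: no buffer, no layout,
// just the one property the out-of-bounds check cares about.
struct AbstractString {
    bool lengthKnown;    // false when the analysis cannot determine the length
    std::size_t length;  // meaningful only when lengthKnown is true
};

enum class AccessVerdict { InBounds, OutOfBounds, Unknown };

// Model of construction from a literal: the length is known exactly.
inline AbstractString modelFromLiteral(const std::string &literal) {
    return AbstractString{true, literal.size()};
}

// Model of a string coming from code the analysis cannot see.
inline AbstractString modelUnknown() {
    return AbstractString{false, 0};
}

// Model of append: the result length is known only if both inputs are known.
inline AbstractString modelAppend(const AbstractString &lhs,
                                  const AbstractString &rhs) {
    if (lhs.lengthKnown && rhs.lengthKnown)
        return AbstractString{true, lhs.length + rhs.length};
    return modelUnknown();
}

// The check itself, kept separate from the model: would s.at(index) be valid?
inline AccessVerdict checkAt(const AbstractString &s, std::size_t index) {
    if (!s.lengthKnown)
        return AccessVerdict::Unknown;  // stay silent rather than guess
    return index < s.length ? AccessVerdict::InBounds
                            : AccessVerdict::OutOfBounds;
}

// Small usage example exercising the model.
inline void runExamples() {
    AbstractString abc = modelFromLiteral("abc");
    assert(checkAt(abc, 2) == AccessVerdict::InBounds);
    assert(checkAt(abc, 3) == AccessVerdict::OutOfBounds);
    assert(checkAt(modelUnknown(), 0) == AccessVerdict::Unknown);
}
```

Keeping the model functions apart from checkAt mirrors the split described earlier in the thread: the modeling would live in the farm, while the check would still be written inside a checker.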

> 
> 
>>> I'll probably have follow up questions, but my questions branch out from those basic approaches, so I figured I'd start there.
>>> 
>>> Jared
>>> 
>>> (*) http://clang-analyzer.llvm.org/open_projects.html
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at cs.uiuc.edu
>>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
