[cfe-dev] StaticAnalyzer: Implementing checks for std::string

Ted Kremenek kremenek at apple.com
Mon Jul 29 10:26:37 PDT 2013


On Jul 29, 2013, at 10:13 AM, Anna Zaks <ganna at apple.com> wrote:

> 
> On Jul 28, 2013, at 11:58 AM, Jared Grubb <jared.grubb at gmail.com> wrote:
> 
>> 
>> On Jul 23, 2013, at 10:04, Anna Zaks <ganna at apple.com> wrote:
>> 
>>> 
>>> On Jul 20, 2013, at 4:32 PM, Jared Grubb <jared.grubb at gmail.com> wrote:
>>> 
>>>> I was looking at trying to implement a checker and thought a fun one would be to try to implement checkers around std::string. For example, I figured I could start with trying to detect out-of-bound access when the length of the string can be determined.
>>> 
>>> The existing (experimental) CString checker already performs a similar check. You can take a look at how it associates the length of a string with the region that represents the string. The two main reasons it's experimental are that it has not been sufficiently tested on real code and that we felt the diagnostics for out-of-bound accesses are usually hard to understand. For example, an out-of-bound access is often due to an overflow or underflow, and it would be beneficial to explain these events to the user.
>>> 
>>> The CString checker does not use BodyFarm (mainly because it was written before BodyFarm existed). The idea behind using BodyFarm would be to model the functions using the farm, but you would still write the checks inside a checker.
>>> 
>>>> 
>>>> One clang doc(*) suggested that there should be BodyFarm implementations of std::string functions.
>>> 
>>> This would allow the analyzer to reason precisely about these known functions, which would, for example, lead to fewer false positives.
>> 
>> Thanks, Anna.
>> 
>> That makes sense; also, I could see that you might be able to get much deeper analysis than simple emulation.
>> 
>>>> Would that be the best way to try to solve the OOB check problem? Or is it better to try to "emulate" interactions with std::string objects and try to track the size of the string? 
>>>> 
>>>> It seems that providing an actual implementation of the functions will be more accurate than an emulation, but then I'm not sure why that's not already visible from <string>? Is there some limitation in the static analyzer that keeps it from already having the source from <string>? Or will an emulation provide something richer than the raw <string> source could provide?
>>> 
>>> The analyzer only sees what is implemented in the header. Also, we do have limits on cross-function analysis. Specifically, we do not "inline" function calls that are too deep or too large. Another reason to model functions with BodyFarm is that the analyzer does not always "understand" the invariants of the code as it is written in the library.
>> 
>> Does BodyFarm get any freedom over the data layout of std::string? The STL doesn't specify how the objects are implemented. So, would a particular implementation of BodyFarm be restricted to work only against libc++? Or do you ignore the actual data model and just have it conjure the values itself via the analyzer?
> 
> The BodyFarm emulation should be generic and as simple as possible. It would probably be an abstraction of what the APIs actually do. We do not have to follow the existing data layout.

Right.  BodyFarm should just be viewed as a factory for faux implementations that model the implementations of functions/methods in service of the analyzer.  Like graphics in games (where reality is “faked” to look good), we can get away with something that has no bearing at all on the original implementation as long as it is self-consistent within the analyzer and makes the analyzer smarter.  The caveat is consistency with real code.  For example, suppose parts of the implementation of std::string are visible to the analyzer, say in methods in the headers.  If BodyFarm conjures up methods that don’t interact well with those visible implementations, then that could be a problem.
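
To make the "faux implementation" idea concrete, here is a minimal sketch in C++ of what such an abstraction might look like.  It is an illustration only, not BodyFarm's actual interface: the class name StringModel and the helper in_bounds() are hypothetical, and the sketch assumes that the only property worth modeling for an out-of-bound check is the length of the string.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical abstraction of std::string, for illustration only.
// It deliberately ignores the real data layout (no buffer, no capacity)
// and tracks just the one fact a bounds check needs: the length.
class StringModel {
public:
    explicit StringModel(std::size_t length = 0) : length_(length) {}

    std::size_t size() const { return length_; }
    bool empty() const { return length_ == 0; }

    // Mutators only update the tracked length; character values are not modeled.
    void push_back(char) { ++length_; }
    void append(const StringModel& other) { length_ += other.length_; }
    void clear() { length_ = 0; }
    void pop_back() {
        assert(length_ > 0 && "pop_back on an empty string");
        --length_;
    }

    // The question a checker would ask before an indexed access such as at().
    bool in_bounds(std::size_t index) const { return index < length_; }

private:
    std::size_t length_;
};
```

With this model, a string of length 3 reports in_bounds(2) as true and in_bounds(3) as false, and after clear() no index is in bounds.  The model is self-consistent because size(), empty() and in_bounds() all derive from the same tracked length, but it would not automatically stay consistent with any real std::string methods the analyzer can see in the headers.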