[cfe-dev] Static analysis output format

Mon Jul 7 13:21:14 PDT 2008

On Jul 4, 2008, at 2:41 PM, David Smith wrote:

> 	As we've been working through the list of results from static
> analysis for Adium it's become increasingly clear that the output
> format is introducing some complications. Specifically, each time we
> rerun (whether to use an updated version of checker, or to check
> against the latest source) it eliminates any metadata that we've built
> up around the results, such as which ones were false positives.
> 	Unfortunately, fixing this seems somewhat tricky. The main thing that
> would be necessary is a way of identifying results across runs. That
> way we can plug this into our automated testing system so each time we
> commit it can rerun and say "ok, these ones are known, these ones are
> known false positives, and these ones are new" rather than just
> "here's a list to re-evaluate".

I believe this is a necessary feature, and I think it is one that will  
take several iterations to get right.
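
A minimal sketch of the "known / known false positive / new" split
described above, assuming each result already has some stable
identifier. The function name `triage` and the idea of keeping the
triaged identifiers in plain sets are assumptions for illustration,
not anything scan-build provides today.

```python
def triage(current_ids, known_ids, false_positive_ids):
    """Split the identifiers from the latest run into three buckets.

    current_ids        -- identifiers of the issues reported by this run
    known_ids          -- identifiers already reviewed and accepted as real
    false_positive_ids -- identifiers already reviewed and marked as bogus
    """
    current = set(current_ids)
    false_positives = current & set(false_positive_ids)
    # An identifier marked as a false positive wins over "known".
    known = (current & set(known_ids)) - false_positives
    new = current - known - false_positives
    return {"known": known, "false_positives": false_positives, "new": new}
```

Only the "new" bucket would need a human to look at it after each
commit; issues that disappear from a run simply drop out of all three.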

> I'm not sure how to come up with some
> sort of identifier for issues though. Line numbers probably change too
> frequently to be reliable. I suppose a heuristic based on function
> name, issue type, file name, and approximate line number might be
> fairly accurate.

This seems like a very reasonable heuristic.  Even omitting the line
number might be fine for now.
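
One way the heuristic could look, as a rough sketch that leaves the
line number out entirely. The function name `issue_id`, the use of the
file's base name, and the choice of SHA-1 are all assumptions made for
the example.

```python
import hashlib
import os


def issue_id(file_path, function_name, issue_type):
    """Build an identifier that does not depend on line numbers.

    Only the base name of the file is used (an assumption), so that
    checking the tree out in a different directory keeps the same id.
    """
    parts = [os.path.basename(file_path), function_name, issue_type]
    # Join with a separator that cannot appear in the fields themselves,
    # so ("ab", "c") and ("a", "bc") do not produce the same key.
    key = "\x1f".join(parts)
    return hashlib.sha1(key.encode("utf-8")).hexdigest()
```

Two issues of the same type in the same function get the same
identifier under this scheme, which is where an approximate line
number (or an ordinal within the function) would have to come back in.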

BTW, some of this meta-data can easily be grepped right out of the  
HTML file.  This is exactly what scan-build does to build the  
index.html file.  For example:

$ grep BUG report-wEXcKk.html
<!-- BUGPATHLENGTH 2 -->
<!-- BUGLINE 15 -->
<!-- BUGFILE /Volumes/Data/Users/kremenek/Desktop/MyClass.m -->
<!-- BUGDESC Memory Leak -->
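
For a testing script, the same comments can be pulled into a
dictionary. This is only a sketch: it assumes every meta-data comment
has the form `<!-- BUGNAME value -->` as in the output above, and the
name `parse_bug_metadata` is made up for the example.

```python
import re

# Matches comments such as "<!-- BUGLINE 15 -->": a key starting with
# BUG, then whitespace, then the value up to the closing "-->".
BUG_COMMENT = re.compile(r"<!--\s*(BUG[A-Z]+)\s+(.*?)\s*-->")


def parse_bug_metadata(html_text):
    """Return the BUG* comments in a report as a {key: value} dict.

    Values are kept as strings; if a key appears more than once, the
    last occurrence wins.
    """
    return {key: value for key, value in BUG_COMMENT.findall(html_text)}
```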

We can easily include other meta-data, such as the function/method
name where the bug occurs, a cryptographic hash of the source file
(or function) that contained the bug, etc.
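
As an illustration of the hash idea, here is a small sketch that
hashes the text of a file or function. SHA-256 and the stripping of
trailing whitespace are assumptions chosen for the example; the
analyzer does not emit such a hash today.

```python
import hashlib


def content_hash(source_text):
    """Hash a file's or function's text, ignoring trailing whitespace.

    Stripping trailing whitespace per line (an assumption) keeps the
    hash stable across purely cosmetic edits; any other change to the
    text yields a different hash.
    """
    normalized = "\n".join(line.rstrip() for line in source_text.splitlines())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

If the hash for the function containing a bug is unchanged between two
runs, the earlier triage decision for that bug most likely still
applies.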

Aside from your own automatic testing tools, ideally, we want the HTML  
output that the tool (scan-build) produces to allow users to triage  
and navigate bugs across runs.  This is an important feature, but not  
immediately high on the priority list.  Much of the heavy lifting  
would probably be done in scan-build (which is currently written in  
Perl) where the summary HTML pages are generated.

Anyone with Perl and HTML knowledge is welcome to provide patches to
improve this aspect of the system; doing so requires essentially no
knowledge of how the analyzer works.  Any additional meta-data in the
report-XXXXX.html files that would be useful for building such
features into scan-build could be added on demand.

Moreover, scan-build can be completely rewritten to provide a more  
advanced system for triaging bugs if anyone is interested in  
undertaking such a project.

Ted