[cfe-dev] HTML checker diagnostics

Sun Jul 11 22:07:00 PDT 2010

Hi Andrew,

As you point out, there is a scalability issue both with emitting HTML files for large sources and with viewing them in a web browser.  Showing only the enclosing function has been suggested before, but nobody has looked into actually doing it.

The more general problem is that the scan-build/scan-view interface for viewing bugs can and should be greatly improved.  What is there is basically the same setup that I hooked up 2 years ago when the analyzer was just getting off the ground.  We could do so much more with a web interface for viewing and triaging reports, but I freely admit that I'm not the person with the expertise to lead this effort.

Ideally, scan-view should give you a fast and scalable interactive way to view your reports.  It should scale to a large number of bug reports, and should allow you to easily see all the related bugs within a source file when viewing that file (IMHO, this is one particularly strong point of the Xcode integration of the static analyzer).  The disclosure of information should also be incremental, so that it doesn't bring down the web browser and lets users focus on the content related to the bug.  The reason why scan-view was created (as opposed to users just opening the index.html file that is present in the analyzer results directory) was to provide a transition point to do this.

Concerning your suggestion of just showing the enclosing function, in my experience that often isn't really enough information for users to understand the bug.  Understanding a bug often requires looking at the related functions and methods in order to see the invariants at play.  To that end, I think it would be awesome if scan-view would only show the content for the related functions and methods, and then allow the user to interactively disclose more content as they need it.  Moreover, once we have inter-procedural analysis, bugs can easily cross function calls and/or source files, so the UI needs to be greatly enhanced in order for users to (possibly) follow the call chains, see the definitions of called functions, etc.

Further, it would be great if scan-build/ccc-analyzer got out of the business of emitting HTML reports at all.  Instead, simple digests of the bug can be emitted, and then scan-view should be able to generate the HTML content on-the-fly.  This is essentially what is done with the Xcode integration of the clang static analyzer; there the output format is an XML file which describes the bug, and it is up to Xcode to display it.  These XML files are far smaller than the syntax highlighted HTML reports.
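
To make the digest idea a bit more concrete, here is a rough Python sketch of how a viewer could turn a small bug digest into HTML on demand, showing only a few lines around the bug.  Everything in it is an assumption for illustration: the digest is a plain dictionary with hypothetical "line" and "description" fields (not the actual format Xcode consumes), and render_report is a made-up name, not an existing scan-view function.

```python
import html

def render_report(digest, source_lines, context=3):
    """Render one bug digest as an HTML fragment with a few lines of context.

    digest is a hypothetical dict with a 1-based "line" and a "description".
    """
    line = digest["line"]
    first = max(1, line - context)
    last = min(len(source_lines), line + context)
    rows = []
    for n in range(first, last + 1):
        # Mark the row that carries the bug so a stylesheet can highlight it.
        css = ' class="bug"' if n == line else ""
        rows.append('<tr%s><td>%d</td><td><pre>%s</pre></td></tr>'
                    % (css, n, html.escape(source_lines[n - 1])))
    return ("<h3>%s</h3>\n<table>\n%s\n</table>"
            % (html.escape(digest["description"]), "\n".join(rows)))
```

Because the HTML is produced only when a report is viewed, the amount of context is a viewer parameter rather than something baked into the analyzer output, which is what would make incremental disclosure possible.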

Finally, an added benefit of making scan-view more like a real full-blown web app is that it can possibly support features such as false positive filtering (e.g., filter reports already inspected on previous runs) or the ability to just compare report differences between runs of the analyzer.  This is something that would benefit many people.  There is a significant amount of work here, but it's something people expect of most commercial grade static analysis bug-finding tools.
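
As a rough illustration of comparing runs, the Python sketch below matches reports across two runs by a key and sorts them into fixed, new, and unchanged.  The digest fields ("file", "function", "type") and both function names are hypothetical, and the key is only one possible choice: it ignores line numbers so that unrelated edits above a bug don't make it look new, at the cost of collapsing two bugs of the same type in the same function.

```python
def report_key(digest):
    """Identity of a report across runs (hypothetical fields, no line number)."""
    return (digest["file"], digest["function"], digest["type"])

def compare_runs(old_reports, new_reports):
    """Classify reports from two analyzer runs as fixed, new, or unchanged."""
    old = {report_key(d): d for d in old_reports}
    new = {report_key(d): d for d in new_reports}
    return {
        "fixed": [old[k] for k in old if k not in new],
        "new": [new[k] for k in new if k not in old],
        "unchanged": [new[k] for k in new if k in old],
    }
```

Filtering previously inspected false positives could reuse the same key: store the keys a user has dismissed and drop matching reports from later runs.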

Going back to your question, if we aren't considering an overhaul of scan-view, I am a little leery of changing the HTML output to just include the function(s) that were touched by the reported paths, as without a way to disclose more information we may be omitting valuable information needed to understand a bug.  That said, I'm willing to consider it as an optional mode for scan-build.  We can then play with it, see how well it works in practice, and go from there.  As for the HTML rewriting, it may be possible to just do the rewrite on sections of the file at a time, emit the markup, and then process the next chunk.  I don't see why this wouldn't be possible, and it may be a good enough solution to mitigate the abysmal performance issues you are seeing.  Another solution is to possibly not use the rewriter at all, and process each line and emit the markup to the destination file directly.  IMHO, all of these solutions, however, just skirt the real solution of making scan-view more scalable for viewing large source files and triaging many bugs.
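
For what it's worth, the "process each line and emit the markup directly" approach could look something like the Python sketch below.  It is only meant to show the shape of the idea, not how clang's rewriter works: emit_html and the bug_lines parameter are hypothetical names, and it does no syntax highlighting or path annotation.  The point is that only one line is held in memory at a time, so memory use doesn't grow with the size of the source file.

```python
import html
import io

def emit_html(src, dst, bug_lines=frozenset()):
    """Stream source text from src to dst as an HTML table, one line at a time.

    src and dst are text file objects; bug_lines holds 1-based line numbers
    that should be marked so a stylesheet can highlight them.
    """
    dst.write("<table>\n")
    for n, text in enumerate(src, start=1):
        css = ' class="bug"' if n in bug_lines else ""
        # Escape and write each line immediately instead of buffering the file.
        dst.write('<tr%s><td>%d</td><td><pre>%s</pre></td></tr>\n'
                  % (css, n, html.escape(text.rstrip("\n"))))
    dst.write("</table>\n")
```

The same loop would work chunk by chunk if some markup needs to see more than one line at once.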

Ted

On Jul 7, 2010, at 8:44 PM, Andrew McGregor wrote:

> When running static analysis with HTML output, the report files and memory usage during report generation can be enormous.  For example, I have some files of around 15000 lines that cannot have their scan reports output on a 32 bit machine due to running out of address space, which seems a little ridiculous.
> 
> Wouldn't it be better if we only outputted HTML for the function(s) that were touched by the reported paths, rather than the whole file?  Or, is there some other way to reduce the memory consumption of the rewriter?
> 
> Unfortunately, I'm making no headway on how to do that as I don't really understand the intent of the rewriting code...
> 
> Andrew