[cfe-dev] [analyzer] Adding build bot for static analyzer reference results

Mon Sep 28 16:25:42 PDT 2015

Hi all,

We’re planning to add a public Apple build bot for the static analyzer to Green Dragon (http://lab.llvm.org:8080/green/). I’d like to get your feedback on our proposed approach. 

The goal of this bot is to catch unexpected analyzer regressions, crashes, and coverage loss by periodically running the analyzer on a suite of open-source benchmarks. The bot will compare the produced path diagnostics to reference results. If these do not match, we will e-mail the committers and a small set of interested people. (Let us know if you want to be notified on every failure.) We’d like to make it easy for the community to respond to analyzer regressions and update the reference results.
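
As a rough illustration of the comparison step, here is a minimal sketch of diffing one produced plist against its reference. It is not how CmpRuns.py is implemented. The function names are hypothetical, and the plist key names ("files", "diagnostics", "location", "line", "col", "description") are assumptions about the analyzer's plist layout.

```python
import plistlib


def load_diagnostics(plist_path):
    """Return the set of (file, line, column, description) tuples in one plist.

    Assumes a top-level "files" list and a "diagnostics" list whose entries
    carry a "description" and a "location" dict indexing into "files".
    """
    with open(plist_path, "rb") as f:
        data = plistlib.load(f)
    files = data.get("files", [])
    found = set()
    for diag in data.get("diagnostics", []):
        loc = diag["location"]
        found.add((files[loc["file"]], loc["line"], loc["col"],
                   diag["description"]))
    return found


def compare_runs(reference_path, new_path):
    """Return (added, removed) diagnostics relative to the reference."""
    ref = load_diagnostics(reference_path)
    new = load_diagnostics(new_path)
    return sorted(new - ref), sorted(ref - new)
```

A bot built along these lines would treat a non-empty "added" or "removed" list as a mismatch and send the notification.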

We currently have an Apple-internal static analyzer build bot and have found it helpful for catching mistakes that make it past the normal tests. The main downside is that the results need to be updated when new checks are added or the analyzer output changes.

We propose taking a “curl + cache” approach to benchmarks. That is, we won’t store the benchmarks themselves in a repository. Instead, the bots will download them from the projects’ websites and cache them locally. If we need to change the benchmarks (to get them to compile with newer versions of clang, for example), we will represent these changes as patch sets that will be applied to the downloaded version. Both these patch sets and the reference results will be checked into the llvm.org/zorg repository, so anyone with commit access will be able to update them. The bot will use the CmpRuns.py script (in clang’s utils/analyzer/) to compare the produced path diagnostic plists to the reference results.
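
To make the “curl + cache” idea concrete, here is a minimal sketch of the download-once-then-patch flow. This is only an illustration under stated assumptions, not the bot's actual code. The function names, the cache naming scheme, and the use of `patch -p1` are all hypothetical choices.

```python
import hashlib
import shutil
import subprocess
import urllib.request
from pathlib import Path


def fetch_cached(url, cache_dir):
    """Return a local copy of `url`, downloading it only on a cache miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on the URL so distinct benchmarks never collide.
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    name = url.rstrip("/").rsplit("/", 1)[-1]
    target = cache_dir / f"{digest}-{name}"
    if not target.exists():
        partial = target.with_name(target.name + ".part")
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        # Rename last so an interrupted download is never mistaken for a hit.
        partial.replace(target)
    return target


def apply_patches(source_dir, patches):
    """Apply each patch file, in the order given, to an unpacked benchmark.

    Assumes the standard `patch` tool is on PATH and the patches are -p1 style.
    """
    for patch_file in patches:
        subprocess.run(
            ["patch", "-p1", "-i", str(Path(patch_file).resolve())],
            cwd=source_dir,
            check=True,
        )
```

With this shape, only the patch files and the reference results need to live in version control. The benchmark tarballs stay on the projects’ own servers and in each bot's local cache.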

We’d very much appreciate feedback on this proposed approach. We’d also like to solicit suggestions for benchmarks; we hope to grow the suite over time. We think sqlite, postgresql, openssl, and Adium (for Objective-C coverage) are good initial benchmarks, but we’d like to add C++ benchmarks as well (perhaps LLVM?).

Devin Coughlin
Apple Program Analysis Team