[cfe-dev] Clang analyzer Google Summer of Code ideas/proposals

Samuel Harrington samuel.harrington at mines.sdsmt.edu
Thu Mar 25 09:18:04 PDT 2010


Hello,

I am interested in doing a project with Clang in the upcoming Google
Summer of Code. I am currently a sophomore at the South Dakota School
of Mines and Technology, and I have some C++, Perl, and JavaScript
programming experience. I have been interested in Clang and LLVM for a
while, and I've looked through some of the code before. I am most
interested in the analyzer component, though.


I have two possible project ideas I am interested in:


A) Bug database

Create a tool to store bugs and track changes over time.

This tool would use the XML analyzer output and the CIndex library to
correlate bugs across multiple runs. At a minimum, it would produce
diff-like output for a pair of runs. Ideally, it would also create and
update a database of all runs, recording a status for each bug
(uninspected, false positive, verified, or fixed). The tool would
generate reports on chosen subsets of the bugs, annotated with details
such as the first run in which each bug appeared and its current
status. The reports could be HTML output, reusing the existing
infrastructure, or be viewable in a GUI application.
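A rough sketch of the run-to-run correlation, written in Python since that is one of the implementation options mentioned below. It assumes each report has already been reduced (for example from the XML output, with the enclosing function resolved via CIndex) to a hypothetical `Bug` record, and that file, enclosing function, category, and description together form a stable fingerprint. Line numbers are left out of the key because unrelated edits shift them.

```python
from collections import Counter
from typing import NamedTuple


class Bug(NamedTuple):
    """Hypothetical record for one analyzer report, reduced to a few fields."""
    file: str
    function: str     # enclosing function (assumed to be resolved via CIndex)
    category: str
    description: str
    line: int         # kept for display, deliberately not part of the fingerprint


def fingerprint(bug: Bug) -> tuple:
    # Assumption: these four fields stay stable between runs, line numbers do not.
    return (bug.file, bug.function, bug.category, bug.description)


def diff_runs(old: list[Bug], new: list[Bug]) -> dict[str, list[tuple]]:
    """Classify fingerprints as fixed (old only), new (new only) or persistent."""
    old_keys = Counter(fingerprint(b) for b in old)
    new_keys = Counter(fingerprint(b) for b in new)
    # Counters (multisets) keep duplicate reports distinct instead of collapsing them.
    return {
        "fixed": sorted((old_keys - new_keys).elements()),
        "new": sorted((new_keys - old_keys).elements()),
        "persistent": sorted((old_keys & new_keys).elements()),
    }
```

With this keying, a null-dereference report that moves from line 10 to line 14 between runs is still classified as persistent, while a report present only in the older run shows up as fixed.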

The database could be XML, SQLite, or some plain-text format. I am
unsure whether this tool should be integrated into the clang binary,
be a separate executable, or even be written in a scripting language
like Python. However it is implemented, it would be integrated into
scan-build/scan-view.
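If SQLite were chosen, a minimal schema might look like the following sketch, using Python's built-in `sqlite3` module. The table and column names, and the idea of storing a precomputed fingerprint string per bug, are assumptions for illustration; the allowed status values are the four listed above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE runs (
    id      INTEGER PRIMARY KEY,
    started TEXT NOT NULL               -- e.g. timestamp of the scan-build run
);
CREATE TABLE bugs (
    id          INTEGER PRIMARY KEY,
    fingerprint TEXT NOT NULL UNIQUE,   -- stable key used to correlate runs
    status      TEXT NOT NULL DEFAULT 'uninspected'
                CHECK (status IN ('uninspected', 'false positive', 'verified', 'fixed')),
    first_run   INTEGER NOT NULL REFERENCES runs(id)
);
CREATE TABLE sightings (                -- which bugs were reported in which run
    bug_id INTEGER NOT NULL REFERENCES bugs(id),
    run_id INTEGER NOT NULL REFERENCES runs(id),
    PRIMARY KEY (bug_id, run_id)
);
"""


def record_run(db: sqlite3.Connection, started: str, fingerprints: list[str]) -> int:
    """Store one run; unseen fingerprints become 'uninspected' bugs first seen here."""
    run_id = db.execute("INSERT INTO runs (started) VALUES (?)", (started,)).lastrowid
    for fp in fingerprints:
        # Existing bugs keep their original first_run and status.
        db.execute(
            "INSERT OR IGNORE INTO bugs (fingerprint, first_run) VALUES (?, ?)",
            (fp, run_id),
        )
        (bug_id,) = db.execute(
            "SELECT id FROM bugs WHERE fingerprint = ?", (fp,)
        ).fetchone()
        db.execute("INSERT OR IGNORE INTO sightings VALUES (?, ?)", (bug_id, run_id))
    db.commit()
    return run_id
```

For example, after `db = sqlite3.connect(":memory:")` and `db.executescript(SCHEMA)`, recording one run with fingerprints `a, b` and a second with `b, c` leaves three bugs, each tagged with the run in which it first appeared, which is the information the reports described above would need.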

I am interested in this project because it would make using the
analyzer easier for larger projects. The diff output could be used to
find regressions or to confirm fixes. The database would help users
keep track of bugs and would provide statistics on bugs over time.


B) User-made checkers

This would add an easy extension mechanism to the analyzer for writing
simple domain-specific checks. I have a couple of ideas of how this
would look.


1) The first would be to read and use mygcc [1] rules to detect bugs.
I believe this would only provide simple flow-sensitive analysis, but
it looks useful nonetheless. This would require writing a pattern
matcher that matches AST nodes against a parsed text expression.
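To give a sense of the matcher's core, here is a minimal Python sketch over a hypothetical toy AST made of nested tuples. The `_` wildcard and `%name` binding syntax are assumptions and not mygcc's actual rule syntax; parsing the rule text and the flow-sensitive part of mygcc are both omitted.

```python
from typing import Any, Optional

# Hypothetical toy AST: a node is a tuple (kind, *children); leaves are strings.
# e.g. ("call", "lock", ("var", "m"))
Node = Any


def match(pattern: Node, node: Node, bindings: Optional[dict] = None) -> Optional[dict]:
    """Return variable bindings if `node` matches `pattern`, else None.

    Assumed pattern syntax (illustrative only):
      "_"      matches any subtree
      "%name"  matches any subtree and binds it; repeated uses must agree
    """
    bindings = {} if bindings is None else bindings
    if pattern == "_":
        return bindings
    if isinstance(pattern, str) and pattern.startswith("%"):
        if pattern in bindings:
            return bindings if bindings[pattern] == node else None
        return {**bindings, pattern: node}  # copy, so failed branches leave no trace
    if isinstance(pattern, tuple) and isinstance(node, tuple):
        if len(pattern) != len(node):
            return None
        for p, n in zip(pattern, node):
            bindings = match(p, n, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == node else None


def find_all(pattern: Node, tree: Node) -> list[dict]:
    """Pre-order walk of `tree`, collecting bindings for every matching subtree."""
    results = []
    found = match(pattern, tree)
    if found is not None:
        results.append(found)
    if isinstance(tree, tuple):
        for child in tree[1:]:  # tree[0] is the node kind
            results.extend(find_all(pattern, child))
    return results
```

For instance, matching `("call", "lock", "%x")` against a block that locks `m`, unlocks `m`, and then locks `n` yields one binding per `lock` call. A rule checker would then have to relate such matches along control-flow paths, which this sketch does not attempt.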


2) The second would be an interface to the analysis engines from a
scripting language, perhaps Python. This would be more complicated to
use than mygcc, but likely more useful. For example, a check that
open() receives a third argument whenever the O_CREAT flag is present
is very simple to write in a scripting language, but impossible to
express with mygcc rules [2].
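As an illustration of how short such a script could be, here is a sketch of the open() check. `CallExpr` is a hypothetical stand-in for whatever a Python binding would expose for a call expression, and the flags are inspected as source text only for brevity; a real check would presumably work on the evaluated flag value provided by the analyzer.

```python
from dataclasses import dataclass


@dataclass
class CallExpr:
    """Hypothetical stand-in for a call expression exposed by a Python binding."""
    callee: str
    args: list[str]              # argument source text, e.g. "O_RDWR | O_CREAT"
    location: str = "<unknown>"


def check_open_mode(call: CallExpr) -> list[str]:
    """Flag open() calls that pass O_CREAT but omit the mode argument."""
    if call.callee != "open" or len(call.args) < 2:
        return []
    # Simplification: split the textual flag expression on '|'.
    flags = {tok.strip() for tok in call.args[1].split("|")}
    if "O_CREAT" in flags and len(call.args) < 3:
        return [f"{call.location}: open() with O_CREAT requires a third (mode) argument"]
    return []
```

A binding would call a function like this for every call expression it visits and turn each returned string into an analyzer diagnostic.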

If I were to do this project, I would likely start with the second
idea and, if time permits, write a mygcc matcher in the scripting
language. Implementing mygcc rules in the scripting language would
also provide a good test of the interface's completeness.

I am interested in this because it would let the Clang analyzer be
easily extended with domain-specific checks. For example, specialized
locking rules could be checked using mygcc rules. A trickier example
[3] would be making sure that an llvm::StringRef is not assigned from a
std::string that goes out of scope before the StringRef does. This
would be possible with a scripting language binding, and easier than
modifying the Clang source. Checks of this kind are already being
implemented in Clang, but it is infeasible to embed specialized checks
for arbitrary projects in Clang itself. This project would be a way
around that problem.
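The StringRef example could be approximated along these lines. This sketch walks a hypothetical flat stream of scope and binding events for one function, rather than the analyzer's real path-sensitive state; the event names are invented for illustration. It assumes balanced scope entries and exits and ignores shadowed names.

```python
def dangling_refs(events: list[tuple]) -> list[str]:
    """Report StringRefs used after the std::string they point at left scope.

    Hypothetical event stream (assumed balanced enter/exit):
      ("enter",)              open a new scope
      ("exit",)               close the innermost scope
      ("string", name)        declare a std::string in the current scope
      ("bind", ref, target)   make StringRef `ref` point at string `target`
      ("use", ref)            read through `ref`
    """
    scopes: list[set[str]] = [set()]   # strings declared in each open scope
    alive: set[str] = set()            # strings currently in scope
    bound_to: dict[str, str] = {}      # StringRef name -> string it refers to
    reports = []
    for ev in events:
        kind = ev[0]
        if kind == "enter":
            scopes.append(set())
        elif kind == "exit":
            alive -= scopes.pop()      # strings of the closed scope are destroyed
        elif kind == "string":
            scopes[-1].add(ev[1])
            alive.add(ev[1])
        elif kind == "bind":
            bound_to[ev[1]] = ev[2]
        elif kind == "use":
            target = bound_to.get(ev[1])
            if target is not None and target not in alive:
                reports.append(f"{ev[1]} refers to '{target}', which is out of scope")
    return reports
```

For example, the events `enter, string s, bind r to s, exit, use r` produce one report for `r`, while the same use placed before `exit` produces none.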


3) The closest tool I have seen to #2 is Dehydra [4], which also aims
to allow user-defined bug-finding scripts. A complicating factor is
that its scripting language is JavaScript, and it may be infeasible to
provide a compatible interface. Nevertheless, I am including
replicating its interface here as a third possibility.


Sorry for the incredibly long email. :)

Is either of these proposals interesting? Any criticisms or ideas? All
comments and questions would be appreciated.

Thanks,
- Sam


[1] http://mygcc.free.fr/
Note: I don't remember how I found this. I believe it was through an
email on this list, but I can't find it.

[2] Example taken from the Clang source:
lib/Checker/UnixAPIChecker.cpp

[3] Example also taken from an existing Clang check:
lib/Checker/LLVMConventionsChecker.cpp, line 133

[4] https://developer.mozilla.org/en/Dehydra



More information about the cfe-dev mailing list