Hi Samuel,<br><br>I haven't thought through what the bug database or extension scripts would look like, so I can't comment on your proposals.<br><br>Personally, though, I would prefer improvements to the current core analysis engine: for example, improving the inter-procedural analysis, adding an integer overflow detector, adding a more powerful constraint manager, or adding C++ support.<br>

<br><div class="gmail_quote">2010/3/26 Samuel Harrington <span dir="ltr"><<a href="mailto:samuel.harrington@mines.sdsmt.edu">samuel.harrington@mines.sdsmt.edu</a>></span><br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

Hello,<br>

<br>

I am interested in doing a project with Clang in the upcoming Google<br>

Summer of Code. I am currently a sophomore at the South Dakota School<br>

of Mines and Technology, and I have some C++, Perl, and Javascript<br>

programming experience. I have been interested in Clang and LLVM for a<br>

while, and I've looked through some of the code before. I am most<br>

interested in the analyzer component though.<br>

<br>

<br>

I have two possible project ideas I am interested in:<br>

<br>

<br>

A) Bug database<br>

<br>

Create a tool to store bugs and track changes over time.<br>

<br>

This tool would use the XML analyzer output and the CIndex library to<br>

correlate bugs over multiple runs. The tool would provide, at a<br>

minimum, a diff-like output given a pair of runs. Ideally, this would<br>

create and update a database with all the runs, and statuses for all<br>

the bugs (uninspected, false positive, verified, fixed). The tool<br>

would provide reports with chosen subsets of the bugs and annotations<br>

such as first run present and current status. The reports could be<br>

HTML output, reusing the existing infrastructure, or be viewable in a<br>

GUI application.<br>
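<br>
As a rough sketch of the diff step only: assume each run has already been parsed into a list of records with hypothetical "file", "function", "type", and "line" fields (these names are illustrative, not the analyzer's actual output schema). Keying each bug on everything except the line number keeps unrelated edits that shift lines from showing up as new bugs.<br>

```python
from collections import Counter

def bug_key(bug):
    # Identify a bug by where it is and what it is, ignoring the line
    # number, so that edits which only shift lines do not look like
    # a fixed bug plus a new one. Field names are hypothetical.
    return (bug["file"], bug["function"], bug["type"])

def diff_runs(old_run, new_run):
    # Counters (multisets) keep duplicate reports in the same function.
    old = Counter(bug_key(b) for b in old_run)
    new = Counter(bug_key(b) for b in new_run)
    return {
        "fixed": sorted((old - new).elements()),
        "introduced": sorted((new - old).elements()),
        "persisting": sorted((old & new).elements()),
    }

# Made-up example data for two runs.
run1 = [
    {"file": "a.c", "function": "f", "type": "Null dereference", "line": 10},
    {"file": "b.c", "function": "g", "type": "Dead store", "line": 42},
]
run2 = [
    {"file": "a.c", "function": "f", "type": "Null dereference", "line": 14},
    {"file": "c.c", "function": "h", "type": "Memory leak", "line": 7},
]
report = diff_runs(run1, run2)
```

A real tool would presumably need a more robust key (for instance, one derived from CIndex information), since renaming a function would defeat this one.<br>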

<br>

The database could be XML, SQLite, or some plain-text format. I am<br>

unsure whether this tool should be integrated into the clang binary,<br>

be a separate executable, or even use a scripting language like<br>

Python. However it is implemented, it would be integrated into<br>

scan-build/scan-view.<br>

<br>

I am interested in this project because it would make using the<br>

analyzer easier for larger projects. The diff output could be used as<br>

a regression finder or fix checker. The database would allow users to<br>

keep track of bugs better, and to provide statistics of bugs over<br>

time.<br>

<br>

<br>

B) User-made checkers<br>

<br>

This would provide some sort of easy extension mechanism to the<br>

analyzer to allow simple domain-specific checks. I have a couple of<br>

ideas of how this would look.<br>

<br>

<br>

1) The first would be to read and use mygcc [1] rules to detect bugs.<br>

I believe this would only provide simple flow-sensitive<br>

analysis, but it looks useful nonetheless. This would require making a<br>

pattern matcher to match AST nodes based on a parsed text expression.<br>

<br>

<br>

2) The second would be an interface to the analysis engines from a<br>

scripting language, perhaps Python. This would be more complicated to<br>

use than mygcc, but likely more useful. For example, a check to make<br>

sure open() has a third argument if the O_CREAT flag is present is very<br>

simple given a scripting language, but impossible using mygcc rules<br>

[2].<br>
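<br>
To give a feel for what such a script might look like, here is a minimal sketch. The interface is purely hypothetical (no such Clang binding exists): it assumes the engine hands the script the callee name and a list of argument values, with an integer where the argument was resolved to a constant and None where it is unknown.<br>

```python
import os

def check_open_call(callee, args):
    """Return a warning string, or None if the call looks fine.

    `callee` and `args` stand in for whatever a real binding would
    expose; both are assumptions made for this illustration.
    """
    # Only two-argument calls to open() are of interest here.
    if callee != "open" or len(args) != 2:
        return None
    flags = args[1]
    # Warn only when the flags are a known constant containing O_CREAT.
    if isinstance(flags, int) and flags & os.O_CREAT:
        return "call to 'open' with O_CREAT requires a third argument"
    return None

warning = check_open_call("open", ["log.txt", os.O_WRONLY | os.O_CREAT])
```

Using the host's os.O_CREAT is a shortcut for the sketch; a real check would need the flag value of the target being analyzed.<br>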

<br>

If I were to do this project, I would likely try to do the second idea<br>

first, and if time permits, write a mygcc matcher in the scripting<br>

language. Implementing mygcc rules in the scripting language would<br>

provide a good test of the interface completeness.<br>

<br>

I am interested in this because the clang analyzer could be easily<br>

extended with domain-specific checks. For example, specialized locking<br>

rules could be checked using mygcc rules. A trickier example [3] would<br>

be to make sure an llvm::StringRef is not assigned a std::string that<br>

goes out of scope before it. This would be possible using a scripting<br>

language binding, and easier than modifying the Clang source. These<br>

types of checks are already being implemented in Clang, but it is<br>

infeasible for specialized checks for arbitrary given projects to be<br>

embedded. This project would be a way around the problem.<br>

<br>

<br>

3) The closest tool I have seen to #2 is Dehydra [4], which also has a<br>

goal of allowing user-defined bug finding scripts. A complicating<br>

factor is that the scripting language is Javascript, and it may be<br>

infeasible to provide a compatible interface. Nevertheless, I am<br>

including replicating the interface here as a third possibility.<br>

<br>

<br>

Sorry for the incredibly long email. :)<br>

<br>

Is either of these proposals interesting? Any criticisms, ideas? All<br>

comments and questions would be appreciated.<br>

<br>

Thanks,<br>

- Sam<br>

<br>

<br>

[1] <a href="http://mygcc.free.fr/" target="_blank">http://mygcc.free.fr/</a><br>

Note: I forget how I found this; I believe it was through an email on<br>

this list, but I can't find it.<br>

<br>

[2] example taken from Clang source<br>

lib/Checker/UnixAPIChecker.cpp<br>

<br>

[3] example again from an existing Clang check<br>

lib/Checker/LLVMConventionsChecker.cpp, line 133<br>

<br>

[4] <a href="https://developer.mozilla.org/en/Dehydra" target="_blank">https://developer.mozilla.org/en/Dehydra</a><br>


</blockquote></div><br>