<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><br></div><div><div>On Apr 1, 2012, at 5:56 PM, Mark McCurry wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><div><blockquote type="cite"><blockquote type="cite">- Come up with a framework for storing the intermediate results between the analyses of different translation units.<br></blockquote></blockquote><blockquote type="cite"><blockquote type="cite"> Most likely, this is the direction the current clang static analyzer will take at some point; however, this is a challenging problem.<br></blockquote></blockquote><blockquote type="cite">This, by itself, would make a good GSoC project.  A persistence framework for adding annotations to functions, so that the analyser could run in two passes, one collecting metadata and another applying it, would not be a massively complex project by itself, but would be hugely useful for a whole range of analyses.<br></blockquote><br>After letting the comments digest for a while, I am fairly sure that<br>my original proposal's scope should be expanded some.<br>It sounds like it would be feasible to take some steps in the<br>direction of what David mentioned, though I am not exactly sure which<br>ones.<br>I am very hesitant to generalize persistence past function (and static<br>data?) 
annotation, as I do want to have a coherent result at the end<br>of GSoC.<br><br>After reading through some portions of the documentation on<br>/clang/lib/StaticAnalyzer, I am not too sure where a persistence<br>framework would fit relative to the clang code base.<br>From what I understand, persistence work would need to be at least<br>within /clang/lib, but I am not entirely sure of that based upon my<br>current knowledge of how clang is structured.<br>Is it safe to assume that the callgraph/annotation analysis should<br>only be interfaced to clang through the libclang interface?<br></div></blockquote><div><br></div><div>For now, you could restrict it to the analyzer; it can be moved out if there are more users.</div><br><blockquote type="cite"><div><br><blockquote type="cite">One could imagine a simple database (sqlite ?) to record all this information, and perhaps a simple python script on the side to query it once the analysis is complete.<br></blockquote><br>Is there some standard way for recording this information or is sqlite<br>just what comes to mind?<br><br></div></blockquote><div><br></div>I think a database might be overkill in this case (we probably don't want to introduce a dependency on sqlite). You can just write out a formatted file. </div><div><br></div><div>There are several places where clang is doing serialization already. You might check if one of those mechanisms is something that suits your purposes. For example, libclang has the ability to create a .pch file (a serialized AST) and thus contains subroutines to serialize a DenseMap, which might be enough. 
Clang and the analyzer serialize diagnostic information into different formats.<br><br><blockquote type="cite"><div>As for the rest of the semantics proposed by Matthieu, I find them to<br>be excellent specs to work by.<br><br><blockquote type="cite">There is an attribute called "annotate" that enables you to embed string literals like:<br></blockquote><br>Well that eliminates the need for another attribute, assuming that<br>hijacking this does not create any issues.<br>Thanks Michael.<br><br><blockquote type="cite">You don't necessarily need the full AST, just having a call graph...<br></blockquote><br>I had assumed that the call graph information was not readily accessible.<br><br><blockquote type="cite">(clang has rudimentary call graph support already)<br></blockquote><br>Could you point me to where that support may be?<br>After reading through some of the internals, I started to find<br>warnings about the "gore of the internal analysis engine".<br></div></blockquote><br>See <a href="http://clang.llvm.org/doxygen/CallGraph_8h_source.html">http://clang.llvm.org/doxygen/CallGraph_8h_source.html</a></div><div><br></div><div><blockquote type="cite"><div><blockquote type="cite">Here is a rough algorithm for solving this, taking the "reentrant" annotation as an example.<br></blockquote><blockquote type="cite">1) Build a call graph (clang has rudimentary call graph support already).<br></blockquote><blockquote type="cite">2) For each node (representing a function) internally mark it with "reentrant", "non-reentrant", "don't know".<br></blockquote><blockquote type="cite">3) Iterate through the nodes in the graph and propagate the annotations:<br></blockquote><blockquote type="cite">      If at least one of the callees is "non-reentrant", the caller becomes "non-reentrant".<br></blockquote><blockquote type="cite">     If all of the callees are "reentrant", the caller becomes "reentrant".<br></blockquote><blockquote type="cite">      Validate: If a function becomes 
"non-reentrant" and it has a "reentrant" user annotation, raise a warning.<br></blockquote><blockquote type="cite">4) Repeat Step #3 until no change. (To optimize performance, you'd iterate in topological order starting from the callees.)<br></blockquote><br>That is fairly close to what I have intended to do, but seeing it<br>formalized so concisely makes me think that the scope of the project<br>should be expanded some.<br><br>So with all of that said, is it reasonable to extend this project into<br>some restricted two-pass persistence framework for clang's static<br>analysis that could have the previously described property checking as<br>the first use of the new functionality?<br></div></blockquote><div><br></div>I think so.</div><div><br><blockquote type="cite"><div>Hopefully this work can be built upon in future clang development.<br><br>--Mark<br></div></blockquote></div><br></body></html>