[cfe-dev] Adding more analysis algorithms to Clang?

Ted Kremenek kremenek at apple.com
Thu Oct 6 15:02:24 PDT 2011


Hi Guoping,

First, I appreciate your interest in pushing this code back to the open source tree.  I want to start with some high-level issues, as I think there needs to be some significant restructuring before we can really get into the details.

My main concern right now is the factoring of the code.  None of this analysis logic should be added to the CFG class itself, but instead built on top of it.  For example, the LiveVariables analysis builds on top of the CFG; it doesn't embed information inside it.  Instead, we should see a separate API entry point that, given a CFG, returns the analysis results.

Put another way, I should never expect to see analysis logic, like dominators, in CFG.h and CFG.cpp.

There are a variety of reasons for this.  The first is modularity and purity.  The CFG is just in the business of modeling control-flow.  No more, and no less.  That makes it API contract easy (or easier) to understand.

The second is just good software engineering.  If we add 10 new analyses, do they need to get embedded into the CFG?  Besides cluttering the API, that potentially means that some clients of the CFG need to pay the cost of having that logic their even if they don't use.  For example, the compiler itself won't use the dominance analysis in the CFG, but if the analysis gets buried in the CFG API then it potentially must get linked in.  Moreover, I should be able to extend the set of analyses possibly by providing new libraries or plugins; requiring that they are buried in the CFG doesn't allow this flexibility.

Before I am really going to review this patch in detail, you will need to split your logic out of the CFG, and provide new classes and entry points for clients to use.  That's absolutely a prerequisite before it can get checked in.

My other concern is that there is no testing logic.  We absolutely *must* have a way to test these analyses.  We have very high quality standards in Clang, both in terms of code quality but also in terms of functionality.  Without any tests, there is no reason to assume that these analyses really work at all.

What I propose is that you have similar testing to what we do with the LiveVariables analysis.  There is a static analysis check, LiveVariablesDumper (lib/StaticAnalyzer/Checkers/DebugCheckers.cpp) that simply dumps out the live variables information for a CFG.  That information can then be regression tested using FileCheck (there are many examples in the "test" directory of using FileCheck to examine output from a tool).  While I'm not proposing that your analyses be part of the static analyzer proper, we can use the analyzer to test your analyses in this way.  What I would then expect would be a set of tests for each of your analyses that regression tested the expected results of the analysis.  Providing a way to dump these analysis results is not only good for regression testing, but also important just for debugging the behavior of these analyses in general.

The regression testing is also a prerequisite for this code getting checked into the Clang repository.

Cheers,
Ted

On Oct 5, 2011, at 3:44 PM, Guoping Long wrote:

> Thanks Arjun for your feedback. I updated these files (attached).
> Ted, I am looking forward to your comments.
> 
> ------
> Guoping
> 
> 2011/10/5 Arjun Singri <arjunsingri at gmail.com>
> 1. Use "unsigned int" instead of just "unsigned"
> 2. Shouldnt       "CFGStack.pop_back_val(); " on line 45 be outside the "if (AllVisited)" statement?
> 3. The check on line 76 is unnecessary. If you are at line 76 then the check at line 71 was false which means that the check at 76 is true. So I guess replace "else if" with "else".
> 4. typo on line 52
> 
> Arjun
> 
> On Tue, Oct 4, 2011 at 8:34 PM, Guoping Long <longguoping at gmail.com> wrote:
> Dear Ted
> 
>    I added two control flow analysis algorithms: One builds the dominance tree for CFG. The other one finds all SCC components within the CFG. Attached is the patch. I integrated these algorithms within the CFG and CFGBlock classes. This is the first time I submit something, so I have some questions:
> 
> 1 While I would really love to contribute, I am not sure if the code quality is enough. Please let me know any problems with the code so I shall fix them.
> 2 These algorithms shall work on any CFG. I am not sure what kind of special test cases should be added. I do wrote an example (in tools/clang/examples) to illustrate how to use these two algorithms. Is it necessary to submit the example into the examples/ directory?
> 3 I tried my best to follow the code standards and comments. Also, in the comments, I listed the key reference papers which described thorough details of these algorithms. They are the state of the art algorithms as far as I know, and very efficient, although I am not sure my implementation is good or not.
> 
> I am looking forward to feedback. I shall fix problems until it meets the quality for actual commit.
> Thanks.
> 
> ------------
> Guoping
> 
> 
> 2011/10/3 Ted Kremenek <kremenek at apple.com>
> Hi Guoping,
> 
> Additions to libAnalysis have largely been demand-driven.  I think we are fine with general contributions to this library that meeting the following criteria:
> 
> (1) Meet the Clang coding conventions and quality expectations.
> 
> (2) Are testable ("make test"), and tests are included in clang/test.
> 
> (3) Performance is acceptable and measurable, or at least documented when algorithms are known to be expensive.
> 
> (4) APIs are well-documented, which can come in the form of good doxygen comments.
> 
> Source-to-source transformations are welcome, and the Clang codebase does contains tools that are source-to-source rewriters.  For example, libRewrite provides low-level functionality for doing textual rewrites of source files using Clang SourceLocations.  Other analysis algorithms are useful as well; we just prefer that they can be regression tested, and the APIs are well-documented (especially if they don't have any real clients in the codebase at present).
> 
> Cheers,
> Ted
> 
> On Oct 3, 2011, at 2:40 PM, Guoping Long wrote:
> 
> > Hi,
> >
> >    There are some program analysis algorithms in clang/lib/Analysis directory. I am wandering if there are more algorithms to come or not. Are there some people working on this? What kind of algorithms are welcome to be part of Clang?
> >
> >    I am asking this because I want to help on this part. But I am not sure if source-to-source transformation algorithms are welcome in this community, since many optimization work is done in the LLVM backend. Currently I have implemented to CFG analysis algorithms (building the dominance tree and finding the strong connected components in CFG). I plan to implement more data flow analysis or even pointer analysis algorithms. If these algorithms are useful to people in this community, I would love to submit a patch.
> >
> >   I am new to Clang and this community. Hopefully these questions are not naive to you.
> >
> >   Thanks.
> > --------
> > Guoping
> > _______________________________________________
> > cfe-dev mailing list
> > cfe-dev at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 
> 
> 
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
> 
> 
> 
> <CFG.cpp.patch><CFG.h.patch>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20111006/2410020a/attachment.html>


More information about the cfe-dev mailing list