[cfe-dev] [Patch] Update to Checker Development Manual

Anna Zaks ganna at apple.com
Mon Apr 15 11:28:53 PDT 2013

On Apr 14, 2013, at 4:11 PM, Sam Handler <samuel.handler+cfedev at gmail.com> wrote:

> Anna:
> On Thu, Apr 11, 2013 at 12:08 AM, Anna Zaks <ganna at apple.com> wrote
> +<p>These macros will cover a majority of use cases; however, they still have a
> +few limitations. They cannot be used inside namespaces (since they expand to
> +contain top-level namespace references), and the data types that they define
> +cannot be referenced from more than one file.
> I would just say "Note that the macros cannot be used inside namespaces (since they expand to contain top-level namespace references)." The macros can be referred to from multiple files if they are defined in a header that is included from all of them. It's just that each checker is usually contained in a single file.
> The comment above REGISTER_TRAIT_WITH_PROGRAMSTATE says that should not be used "for traits that must be accessible from more than one translation unit." I assume this is the case because the GDMIndex function that is generated is returning the address of a static variable, which would be different in different translation units.

I see. My bad!

> Also, addTransition always has arguments.
> I've changed it to suggest CheckerContext::getState instead of addTransition in cases where there are no state updates and the updated state (as returned by addTransition) in cases where there are state updates.

That does not seem right. A node consists of a ProgramState AND a ProgramPoint.

However I've realized that, here, my previous review comment is not correct. You CAN call addTransition with no arguments:
  ExplodedNode *addTransition(ProgramStateRef State = 0,
                              const ProgramPointTag *Tag = 0) {
    return addTransitionImpl(State ? State : getState(), false, 0, Tag);

In that case, the current ProgramState will be used. However, a new node will be created for the bug report. The node will be different from the previous node - it will be tagged to identify that it has been produced by the current checker. Sorry for the confusion!

> ----
> I think the current flow does not make much sense anymore. For example, "Idea for a Checker" should go before the sections you've added. I think the following order would be best:
> Getting Started
> Static Analyzer Overview
>   Interaction with Checkers
> Idea for a Checker
> Checker Registration
> Checker Skeleton
> Custom Program States
> Representing Values // With added introduction saying that we are going to store info about the values we are observing into the state..
> Bug Reports
> ..
> I was also wondering about the order, but reordering the document seemed a bit too ambitious for a first step. The order I was thinking of was:
>  - Getting Started
>  - Static Analyzer overview   // Without the details of how states are represented
>  - Idea for a checker   // Also merge the current "AST Visitors" section into this one.
Let's leave "AST Visitors" in the separate section. They are very different from the regular checkers. Someone learning about path sensitive checkers does not need to know about how to write an AST Visitor.
>  - Checker Registration
>  - Checker Skeleton
Maybe rename into Checker Callbacks? That's what the new content seems to be.

>  - Program State Example   // Not yet written. An example showing how the ExplodedGraph is constructed for a simple function, including both value constraints and custom program state, without describing the specific data structures used.
>  - Program State Details   // Builds on the previous section; describe the specifics of ExplodedGraph/ExplodedNode, components of ProgramState.
>  - Representing Values
>  - Custom Program States
>  - Bug Reporting
>  - Testing
>  - Hints

This roadmap looks good. I think it's better to start restructuring as we are adding new content. This goes live as soon as you commit it, so we want the flow to ALWAYS make sense.

Here are a couple of comments about the new patch. Not sure if Jordan wants to review this pre-commit as well.

+<p>All of these macros take as parameters the name to be used for the custom
"All of these macros take two parameters: the name to be used for the custom"

+category of state information, and the data type(s) to be used for storage. The
No comma before "and"

+  <li>Create a new checker implementation file, for example <tt>./lib/StaticAnalyzer/Checkers/SimpleStreamChecker.cpp</tt>.
+  <li>Include the following code (replacing SimpleStreamChecker with the name of the new checker).
This is a bit confusing. We probably do not want people to create "SimpleStreamChecker.cpp" as such a file already exists. Maybe we should just change the name to something else throughout the manual. For example, we could use MySimpleStreamChecker. In that case, we would also not need to tell the reader to change the name, as in bullet #2 above. What do you think?
We could also mention that the sample code used in the manual is based on the SimpleStreamChecker from the presentation.

-  There are two main decisions you need to make:
-  <ul>
-    <li> Which events the checker should be tracking. 
-    See <a href="http://clang.llvm.org/doxygen/classento_1_1CheckerDocumentation.html">CheckerDocumentation</a> 
-    for the list of available checker callbacks.</li>
-    <li> What data you want to store as part of the checker-specific program 
-    state. Try to minimize the checker state as much as possible. </li>
-  </ul>
I think it's very important to give people this high level roadmap, before diving into the example. Maybe this could be moved into a separate section in the very beginning. You can then link the first bullet to Checker Skeleton and the second one to Custom Program States. Maybe we could move this into the Idea for a Checker?

+  it to the Analyzer so that it can be displayed. This uses two classes:
"This uses two classes" -> "The following two classes are used to construct a bug report:"

+the bug in the code and the program's state when this location is reached. This
+is in the form of a node in the <tt>ExplodedGraph</tt>.

"the bug in the code " -> "the bug in the source code "
"and the program's state when this location is reached. This is in the form of a node" -> "and the <tt>ExplodedNode</tt> which encompasses the state of the program at the time the bug is encountered"

(Note the reason for this change is that ProgramState is a part of ExplodedNode.)

+<p>The node representing this context can be obtained one of two ways:

"The node" -> "The  <tt>ExplodedNode</tt>, which is a node in the <tt>ExplodedGraph</tt>"

+<li>If the state has not been updated, it can be obtained by calling the <a
This is wrong. See the comment above.

+would prevent the program could continuing. For example, leaking of a resource
The wording is off ("the program could continuing"). 

+<h2 id=additioninformation>Additional Sources of Information</h2>
Note that I've just updated the first paragraph of the page to include some of these links. I think it's OK to keep both, especially if we were to extend the section you've added.

> All of your other comments should be addressed; updated patch attached.
> Thanks once again for reviewing this.
> --Sam
> -- 
> ===============================================================================
> All opinions expressed in posts from this account are entirely my own, and not
> those of any present or former employer. Furthermore, I assert that all works
> contributed to the Clang project (1) were developed using no equipment,
> supplies, facility or trade secrets of any such employer, (2) were developed
> entirely on my own time, and (3) do not result from any work performed for any
> such employer.
> <checker_dev_manual.diff>

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20130415/948018a2/attachment.html>

More information about the cfe-commits mailing list