[cfe-commits] [PATCH] More Docs: Mastering the Clang AST

Sean Silva silvas at purdue.edu
Tue Jul 17 15:07:21 PDT 2012


I was wanting to write a similar document and waiting to find the time :)

IMO, the single most helpful thing you can produce is an equivalent of
this diagram <http://docs.python.org/py3k/library/ast.html#abstract-grammar>.
I can't tell you how immeasurably helpful that diagram was when I was
doing work with Python ASTs. It presents in a single compact diagram
*all* essential relationships in the AST. Granted, clang's AST is more
complicated, but the same principle applies.

Another thing that really, really helped me understand Python ASTs was
`ast.dump()`. Between the diagram and this, it's really easy to
quickly experiment. clang's `-ast-dump`/`-ast-dump-xml` is at least an
order of magnitude less useful than the output of `ast.dump()`, for
the following reasons:

1. lots of superfluous information; I don't care about the pointer
values when I'm trying to learn the structure of the AST. Same for
line/column number information. I understand that this dumping
functionality wasn't really designed with this in mind; it's just a
matter of writing a suitable dumper.
2. doesn't really dump everything in a sane, consistent, readable
format. e.g. `-ast-dump-xml` dumps statements as S-exps. Also, XML is
not very friendly.
3. It just doesn't really work. For example, the results are
disappointing for the first file I tried:

struct foo {
	int x;
	int y;
};
int main(int argc, char **argv)
{
	struct foo f;
}

`clang test.c -Xclang -ast-dump-xml` prints nothing and dies with a
linker error.
`clang test.c -fsyntax-only -Xclang -ast-dump-xml` prints nothing.
`clang -cc1 -ast-dump-xml test.c` prints nothing.
`clang -Xclang -ast-dump test.c` prints out some stuff, but it doesn't
show me the structure of any of the declarations; it just prints them
back as they were in the source file (quite unhelpful).

I think this needs to be improved, because fundamentally, learning the
AST (and I speak based on my experience "mastering the Python AST") is
an interactive process of writing a piece of code and seeing (and
understanding) the corresponding AST. Here is a rough algorithm that
got me up and running with the Python AST _very_ quickly:

1. See something that you don't quite understand on "the diagram".
2. Write a piece of code that generates that node (if the diagram is
well written and you are familiar with the language and things are
named sanely, this is usually pretty easy; for example, with the
Python one, with minor exceptions, there is a straightforward
correspondence with language constructs that are well documented in
the language reference).
3. From "the diagram", look at the fields that this node has.
4. Modify your piece of code in a way that you think will cause
the AST to change in a particular way.
5. Re-dump the AST and see if you were right.
6. If you were right, go to 1; else go to 4.

This requires:
1. The diagram.
2. An easy and complete way to visualize the AST that corresponds to
the diagram.
3. Iteration time should be very fast.

Another good thing about this methodology is that it is very flexible,
and can be applied "just in time" to strengthen your understanding of
a particular part as you are developing something that works with a
particular part of the AST.
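
To make steps 4 to 6 concrete, here is a minimal sketch of the
predict-and-check loop using Python's `ast` module. The helper names
`expr_dump` and `check_prediction` are made up for illustration; they
are not part of any library.

```python
import ast


def expr_dump(source):
    """Parse a single expression and return the dump of its root node."""
    # mode="eval" wraps the expression in an ast.Expression node;
    # its .body attribute is the expression node itself.
    return ast.dump(ast.parse(source, mode="eval").body)


def check_prediction(source, expected_fragment):
    """Step 5: re-dump the AST and report whether the prediction held."""
    dump = expr_dump(source)
    return expected_fragment in dump, dump


# Step 4: change "+" to "*" and predict that the op node becomes Mult().
ok, dump = check_prediction("5 * 8", "op=Mult()")
print(ok)    # True: the prediction held, so go back to step 1
print(dump)  # the full dump, to compare against "the diagram"
```

A substring check is crude, but it keeps the iteration time tiny,
which is the point of the exercise.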

Here is a simple example of how this plays out in Python:
>>> import ast
>>> ast.dump(ast.parse("5 + 8"))
'Module(body=[Expr(value=BinOp(left=Num(n=5), op=Add(), right=Num(n=8)))])'
>>> m = ast.parse("5 + 8")
>>> m.body[0].value.left.n
5

Notice the straightforward correspondence between the dump and the way
that the AST is manipulated by the program; also note how directly
this maps to "the diagram", giving leads on where to explore next.
Obviously, this is more complicated in C++ due to the program
structure, but nonetheless, this is a *very* effective way to learn
the AST.
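
For step 3 (looking at the fields a node has), the `ast` module can
also be asked directly. The sketch below assumes Python 3.8 or later,
where literals are parsed as `Constant(value=...)` rather than
`Num(n=...)`, so the operand is read through `.value` instead of `.n`.

```python
import ast

tree = ast.parse("5 + 8")
binop = tree.body[0].value

# Every node class lists its child fields in _fields, which matches
# the entry for that node in the abstract grammar.
print(type(binop).__name__, binop._fields)  # BinOp ('left', 'op', 'right')

# ast.iter_fields yields (field name, value) pairs for a single node.
for name, value in ast.iter_fields(binop):
    print(name, type(value).__name__)

# On Python 3.8+ the literal is a Constant node holding the value.
print(binop.left.value)  # 5

# Line/column attributes are left out of the dump unless requested.
print(ast.dump(binop, include_attributes=True))
```

Note that `ast.dump()` omits location information by default, which is
exactly the "no superfluous information" property I was asking for
from a clang dumper.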

About the document itself, I think one of the first things that needs
to be made clear is what you have when you have a "Clang AST"
(ASTContext). I would have a section "what is a clang AST" and make it
clear that you mean an ASTContext.

Another section that I think would be good to have is one showing how
you can get ahold of a clang AST. Off the top of my head, I know that
you can hook into ASTConsumer::HandleTranslationUnit; is this the only
way?

--Sean Silva

On Tue, Jul 17, 2012 at 5:39 AM, Manuel Klimek <klimek at google.com> wrote:
> Heya,
>
> first draft of a document many users of our tools have been requesting ;)
>
> This is a first draft where I basically dumped what I would have
> wished for when I started out, but I might be missing stuff or not
> knowing about similar documentation existing somewhere.
>
> If you think this is a good idea to get out, let me know what other
> topics you want to see covered in this document. To me it'll be
> required reading before starting the next document about the AST
> matchers, which require some basic knowledge about the Clang AST...
>
> Feedback welcome!
> /Manuel


