[cfe-dev] Clarification for term "AST"

Jordan Rose jordan_rose at apple.com
Fri Feb 22 10:09:51 PST 2013

Hi, Markus. One thing to take from this lack of response is that we're all very busy and just don't always respond to e-mails in a timely way. But another is probably what you first thought: making this change is not interesting to the majority of active Clang developers.

We don't have "token stream" as a first-class concept. C-family languages can get their tokens from many places: a file, a macro, a pretokenized header (essentially deprecated but still working, I think). Rather than deal with "token streams", we have PreprocessorLexer and its subclasses, which produce tokens from various kinds of input. The lexers do have different input sources, but these are character buffers (or, more precisely, byte buffers), not streams of any kind.
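As a rough illustration of "lexers over different inputs" as opposed to a single token stream, here is a minimal sketch in standard C++. Every name in it (Token, ToyLexer, BufferLexer, ReplayLexer, drain) is hypothetical and does not mirror Clang's actual class hierarchy or signatures; it only shows one interface producing tokens from a byte buffer and from an already-tokenized sequence such as a macro body.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy types; none of these are Clang's real classes.
struct Token {
  std::string text;
};

// Common interface: each kind of input hands out tokens one at a time.
class ToyLexer {
public:
  virtual ~ToyLexer() = default;
  // Writes the next token to `out`; returns false once the input is exhausted.
  virtual bool lex(Token &out) = 0;
};

// Produces tokens by scanning a byte buffer (for example, a file's contents).
// Tokens here are simply runs of non-space characters.
class BufferLexer : public ToyLexer {
public:
  explicit BufferLexer(std::string buffer) : buf_(std::move(buffer)) {}

  bool lex(Token &out) override {
    while (pos_ < buf_.size() && buf_[pos_] == ' ') ++pos_;  // skip spaces
    if (pos_ == buf_.size()) return false;
    std::size_t start = pos_;
    while (pos_ < buf_.size() && buf_[pos_] != ' ') ++pos_;
    out.text = buf_.substr(start, pos_ - start);
    return true;
  }

private:
  std::string buf_;
  std::size_t pos_ = 0;
};

// Replays tokens that already exist (for example, a stored macro body).
class ReplayLexer : public ToyLexer {
public:
  explicit ReplayLexer(std::vector<Token> tokens) : tokens_(std::move(tokens)) {}

  bool lex(Token &out) override {
    if (next_ == tokens_.size()) return false;
    out = tokens_[next_++];
    return true;
  }

private:
  std::vector<Token> tokens_;
  std::size_t next_ = 0;
};

// A consumer only sees the interface and does not care where tokens come from.
std::vector<std::string> drain(ToyLexer &lexer) {
  std::vector<std::string> out;
  Token tok;
  while (lexer.lex(tok)) out.push_back(tok.text);
  return out;
}
```

Draining a BufferLexer built from "int x ;" yields the three strings int, x and ; in order, and a ReplayLexer yields its stored tokens unchanged. The consumer never handles a "stream" object, only whichever lexer is currently active.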

Yes, our AST classes do not form a tree, nor are they strictly for syntax. At the same time, they do not exactly form a general graph either, and they are not strictly for semantics. Some examples:
- We preserve parentheses in our "AST" in the form of ParenExprs.
- We include implicit transformations like lvalue-to-rvalue conversions in the form of ImplicitCastExprs.
- A DeclRefExpr may refer to a Decl that encloses the current DeclContext, such as using a C++ class name in an implementation of one of its methods.
- Some nodes can be accessed through multiple "child" paths, such as the condition and the consequent in the GNU binary ?: extension expression. (These are wrapped in OpaqueValueExprs to prevent double traversal by naive tools.)
- Nodes do not have a common type. The Big Three are Decl, Type (and its wrapper QualType), and Stmt, but several other classes (CXXBaseSpecifier, CXXCtorInitializer, ObjCDictionaryElement...) may or may not be considered "AST nodes" yet are certainly important parts of the "AST".
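
To make the "multiple child paths" point concrete, here is a toy model in standard C++ of an expression like x ?: y, where x serves as both the condition and the result. It is only a sketch under stated assumptions: Node, makeNode, makeOpaque, countVisits, buildShared and buildWrapped are hypothetical names, and the layout is not Clang's actual representation. It shows why a naive recursive walker sees a shared node twice, and how an opaque wrapper whose source is not listed as a child avoids that.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy node; not a Clang class.
struct Node {
  std::string label;
  std::vector<std::shared_ptr<Node>> children;  // edges a naive walker follows
  std::shared_ptr<Node> source;  // set on opaque wrappers only; never walked
};

std::shared_ptr<Node> makeNode(std::string label,
                               std::vector<std::shared_ptr<Node>> children = {}) {
  auto n = std::make_shared<Node>();
  n->label = std::move(label);
  n->children = std::move(children);
  return n;
}

// Wrapper that refers to `source` without exposing it as a child.
std::shared_ptr<Node> makeOpaque(std::shared_ptr<Node> source) {
  auto n = makeNode("opaque");
  n->source = std::move(source);
  return n;
}

// Naive recursive walk: counts how often a node with `label` is reached.
int countVisits(const Node &n, const std::string &label) {
  int count = (n.label == label) ? 1 : 0;
  for (const auto &child : n.children) count += countVisits(*child, label);
  return count;
}

// x ?: y with x attached directly as both condition and result.
std::shared_ptr<Node> buildShared() {
  auto x = makeNode("x");
  return makeNode("?:", {x, x, makeNode("y")});
}

// Same expression: x appears once as a real child, and the condition and
// result refer to it through opaque wrappers.
std::shared_ptr<Node> buildWrapped() {
  auto x = makeNode("x");
  return makeNode("?:", {x, makeOpaque(x), makeOpaque(x), makeNode("y")});
}
```

In this model the naive walker reaches x twice in buildShared() and once in buildWrapped(). The structure is still a graph, since the wrappers point at x, but the child edges alone form a tree.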

What we have doesn't match up to a computer-science-theoretical notion of an AST or an ASG, but we don't have a better name. We're not really looking for one, either, since it doesn't seem to be causing trouble for anyone and no one has complained about it until now.

We have a CFG and it is called CFG. :-) (Actually, we have two—one for Clang, and one for LLVM's SSA-based IR. The former is only used for smart warnings and the static analyzer, though.)

Sorry, but I think this one is "not to be fixed".

On Feb 22, 2013, at 9:15 , Markus Elfring <Markus.Elfring at web.de> wrote:

>> http://llvm.org/bugs/show_bug.cgi?id=15254#c0
> How do you think about to distinguish terms (and their relationships) like the
> following a bit more in your application programming interfaces?
> - TS: token stream
> - AST: abstract syntax tree
> - ASG: abstract semantic graph
> - CFG: control flow graph
> Regards,
> Markus
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
