[cfe-dev] Clarification for term "AST"

Sat Feb 23 22:28:31 PST 2013

On 2/23/2013 11:56 AM, Larry Evans wrote:
> Now, where are those problems described and where is the rationale for 
> choosing the clang data structures to solve those problems described? 
> With that information, maybe Markus would have a better idea about why 
> his suggestions wouldn't work or maybe steer him to an alternative 
> solution.

Let me preface by saying that most of my use of Clang is to do things 
that are strictly orthogonal from actually compiling C++ code.

I can't say that I can think of a single time where I would ever want 
the raw parsed AST with no semantic analysis done on it. The kinds of 
queries you want from a C++ programs tend to start with things like 
"find all variables" or "find all functions", yet faithfully following 
the grammar (and not combining levels of the tree) would result in a lot 
of boilerplate code to do unwrapping, especially when you consider the 
contortions needed to represent something like "int (*foo(int))(double)" 
as a proper tree. From my standpoint as a user, anytime I have to 
duplicate Clang's reasoning of logic, it is a bug that Clang does not 
give me an API to do that.

Theoretical purity isn't a particularly useful design goal, since true 
purity is often (IMHO) rather pointless. C++11 defines 9 phases of 
translations, of which the first 6 can be described as lexing, the 7th 
"parse, semantic analysis, code generation, and optimization," the 8th 
linking, and the final one the runtime loader. Yet there is no practical 
reason to, for example, output the source program exactly as it is right 
after folding lines that end in a backslash. Indeed, preserving the 
ability to do this action would unnecessarily complicate later pipelines 
(you would need to attach cumbersome metadata to source files prior to 
actual tokenization, the first step of which happens in the subsequent 
phase).

Another example I can give is that the grammar represents precedence by 
explicitly doing the factoring itself, which means that you would need, 
in order to achieve theoretical purity, to wrap every multiplicative 
expression in a node which is an unary additive expression. Ignoring 
this does fail to achieve theoretical purity, since you are representing 
the AST in a manner not identical to the grammar; however, by 
sacrificing purity, you are making everybody's lives very much easier.

-- 
Joshua Cranmer
News submodule owner
DXR coauthor