[cfe-dev] Using Clang AST classes to generate LLVM IR for a compiler front-end

Mon Dec 24 11:31:23 PST 2012

> - High-level optimizations (out of the grasp of LLVM)

FYI clang does not really do many (any?) "high level optimizations".
Its AST representation is meant for representing the source code
exactly and is "immutable", so no transformations are done on it.

On Mon, Dec 24, 2012 at 11:28 AM, LP <lionel.parreaux at gmail.com> wrote:
> However, I'm wondering if the Clang API I'm looking for is stable enough, or
> if it may change too quickly or too radically...

The basic API of codegen'ing has a small and tight interface (note the
sparsity of include/clang/CodeGen/). It should be stable enough,
although as a heads-up it is planned to be renamed to IRGen soon
(along with many (private) classes that fall under its purview). The
difficult part is feeding it an AST that it will understand.

> So my question is: Would it be worth the trouble to learn and use the Clang
> AST library so that my front-end can use it, and would it be a good option
> for the future?

In short: it probably won't be worth your effort. Clang's AST is
extremely complicated (reflective of the complexity of C++).

Longer explanation: I have not heard of anybody directly creating
Clang ASTs for a foreign language, and I don't think that the AST API
is meant for doing that. If you want to directly build ASTs, it
should be relatively straightforward (at a high level) to just
instantiate the AST classes (in include/clang/AST) and link them
together in "appropriate ways". However, although I do not have superb
knowledge of the AST, my belief is that you will run into a lot of
"devil is in the details" problems due to unspoken invariants in the
AST, which make linking them together in "appropriate ways" very hard
to achieve.

The talk by Ronan Keryell at the latest dev meeting
<http://llvm.org/devmtg/2012-11/> may give you a better feel for the
current state of creating Clang's ASTs by any means other than Clang's
own Parse/Sema infrastructure.

> Or (this is ugly), should I rather generate plain C++ code and compile it
> with any C++ compiler (at least until I find something better to do)?

From briefly skimming your paper, this seems like by far the easiest
approach, at least for an initial prototype.
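
To give a rough idea of how little machinery the "generate plain C++"
route needs, here is a minimal sketch, assuming a toy expression
language. Expr, IntLit, Var, BinOp and emitFunction are hypothetical
names made up for illustration; they come neither from your paper nor
from Clang.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy AST for the source language (not a Clang API).
struct Expr {
    virtual ~Expr() = default;
    // Returns the C++ source text for this expression.
    virtual std::string emit() const = 0;
};

struct IntLit : Expr {
    long value;
    explicit IntLit(long v) : value(v) {}
    std::string emit() const override { return std::to_string(value); }
};

struct Var : Expr {
    std::string name;
    explicit Var(std::string n) : name(std::move(n)) {}
    std::string emit() const override { return name; }
};

struct BinOp : Expr {
    char op;
    std::unique_ptr<Expr> lhs, rhs;
    BinOp(char o, std::unique_ptr<Expr> l, std::unique_ptr<Expr> r)
        : op(o), lhs(std::move(l)), rhs(std::move(r)) {
        assert(lhs && rhs);  // both operands are required
    }
    // Parenthesize every operation so precedence never has to be tracked.
    std::string emit() const override {
        return "(" + lhs->emit() + " " + op + " " + rhs->emit() + ")";
    }
};

// Emits a one-parameter C++ function whose body returns `body`.
// Assumption for this sketch: every value in the toy language is a long.
std::string emitFunction(const std::string& name, const std::string& param,
                         const Expr& body) {
    return "long " + name + "(long " + param + ") { return " + body.emit() +
           "; }\n";
}
```

Building a BinOp tree for x * 2 + 1 and passing it to
emitFunction("f", "x", ...) yields a one-line C++ function definition
that any C++ compiler can take from there. Struct layout, calling
conventions and the like are then the C++ compiler's problem rather
than yours.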

Targeting LLVM IR, which is designed (and documented) for exactly the
purpose that you want (generating code), is probably the right way to
go if you want to make this language production-quality. However, if
you then want interoperability with C/C++, you will have to do
struct/class layout like a C/C++ compiler, have calling conventions
like a C/C++ compiler, exceptions, vtables, etc. This is generally
really complicated, so it may make sense to reuse parts of clang to do
that; however, there are currently no interfaces in clang designed
specifically for doing any of those things (e.g. look at what it takes
just to print out a class's layout in the function
DumpCXXRecordLayout() in clang/lib/AST/ASTRecordLayoutBuilder.cpp).
Exposing nice APIs for this stuff seems generally useful in a variety
of circumstances, so patches to improve this situation for your use
case would very likely be accepted. You may also want to ping David
Abrahams since a while back he was looking at isolating C++ ABI stuff
into a separate library (although I don't think anything came of it).
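
To make the struct layout point concrete, here is a rough sketch of the
simplest piece a front-end emitting LLVM IR would have to reproduce by
hand: placing each field at the next offset that satisfies its
alignment, then padding the total size to the struct's alignment. This
is the usual rule for plain C structs on common ABIs, stated here as an
assumption rather than as what clang does internally. It ignores
bit-fields, base classes, vtable pointers, packing attributes and empty
structs. FieldInfo, Layout, layoutStruct, Sample and sampleLayout are
hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Size and alignment of one field, as the target C ABI defines them.
struct FieldInfo {
    std::size_t size;
    std::size_t align;
};

// Result of laying out a struct: per-field offsets plus total size/alignment.
struct Layout {
    std::vector<std::size_t> offsets;
    std::size_t size = 0;
    std::size_t align = 1;
};

// Rounds n up to the next multiple of a.
std::size_t alignUp(std::size_t n, std::size_t a) {
    assert(a != 0);
    return (n + a - 1) / a * a;
}

// Assumed rule: fields are placed in declaration order, each at the next
// suitably aligned offset; the struct takes the largest field alignment.
Layout layoutStruct(const std::vector<FieldInfo>& fields) {
    Layout out;
    for (const FieldInfo& f : fields) {
        out.size = alignUp(out.size, f.align);  // insert padding before field
        out.offsets.push_back(out.size);
        out.size += f.size;
        if (f.align > out.align) out.align = f.align;
    }
    out.size = alignUp(out.size, out.align);  // tail padding
    return out;
}

// Reference struct laid out by the host C++ compiler, for comparison.
struct Sample {
    char a;
    int b;
    short c;
};

// Runs the hand-written layout on the same field list as Sample.
Layout sampleLayout() {
    return layoutStruct({{sizeof(char), alignof(char)},
                         {sizeof(int), alignof(int)},
                         {sizeof(short), alignof(short)}});
}
```

Comparing sampleLayout() against offsetof(Sample, ...) and
sizeof(Sample) is a cheap sanity check for the easy case. Everything
the sketch leaves out (bit-fields, inheritance, vtables, per-target
quirks) is where the real complexity lives, which is why reusing
clang's logic is attractive.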

-- Sean Silva