[cfe-dev] How to pretty print the abstract syntax tree for a C source file?

Ronan Keryell Ronan.Keryell at hpc-project.com
Thu Aug 25 16:30:26 PDT 2011


>>>>> On Tue, 23 Aug 2011 11:48:35 -0700, Simon <simonhf at gmail.com> said:

    >> In our Par4All compiler, we use another tool internally, PIPS,
    >> that can output the AST in HTML or in a textual format.

    >> There is a web service you can use to have an idea of what it can
    >> output: http://pips4u.org/doc/ir-navigator


    Simon> I tried it out. Very nice web interface. Close to what I want
    Simon> except for two things: 1. Is it possible to output file line
    Simon> number and character position so that it is also part of the
    Simon> tree?  2. Is it possible for the tree to contain info
    Simon> relating to whitespace and comments?

Thanks for asking! I have just realized that this web interface does not
display the full internal representation of a statement. :-(

For example, the line number (a field called "number" in the PIPS AST)
and the comments are not displayed.  We should not trust our PhD
students... :-)

We keep spacing information by storing it in the comments. A "comment"
here is everything around the statement (spaces, // or /* comments,
\n...), not only the comments themselves, so that most of the syntactic
context is captured.

The character position is lost, however. Of course it could be
stored...

(The curious can look at
http://www.cri.ensmp.fr/pips/newgen/ri.htdoc/#x1-270003.4
to see what we keep in statements.)
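
To make the idea of keeping the spacing in the comments more concrete,
here is a minimal Python sketch. It is not PIPS code: the Statement
record, its "offset" field (the character position that PIPS does not
store today) and the helper names are all hypothetical, and the toy
splitter only handles a flat list of ';'-terminated statements. For
simplicity it attaches only the text in front of each statement.

```python
import re
from dataclasses import dataclass

@dataclass
class Statement:
    number: int    # line number where the statement text starts
    offset: int    # character offset in the file (hypothetical extra field)
    comment: str   # everything in front: spaces, newlines, // and /* */ comments
    text: str      # the statement itself, up to and including ';'

# Leading "comment": any mix of whitespace, // comments and /* */ comments.
TRIVIA = re.compile(r"(?:\s+|//[^\n]*|/\*.*?\*/)*", re.DOTALL)

def split_statements(source):
    """Toy splitter: no braces, no string literals, no ';' inside comments."""
    statements, pos = [], 0
    while pos < len(source):
        trivia = TRIVIA.match(source, pos).group(0)
        start = pos + len(trivia)
        end = source.find(";", start)
        if end == -1:
            break
        line = source.count("\n", 0, start) + 1
        statements.append(Statement(line, start, trivia, source[start:end + 1]))
        pos = end + 1
    # Whatever follows the last statement is returned separately.
    return statements, source[pos:]

def unparse(statements, tail):
    """Rebuild the text by concatenating each comment and its statement."""
    return "".join(s.comment + s.text for s in statements) + tail
```

Because every character of the input ends up either in a "comment", in a
statement text or in the tail, unparse(*split_statements(src)) gives back
src unchanged, as long as nothing transforms the statements in between.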

    Simon> Ideally I'd like to be able to use a tool like PIPS to do
    Simon> so-called round trip parsing where the original source code
    Simon> can be rebuilt exactly from the intermediate
    Simon> representation.

It is difficult to achieve this with tools that have been conceived to
do source-to-source transformation and have a compact canonical internal
representation. Often some information is lost in translation.

For example, in PIPS, with a common internal representation for C and
Fortran, there is some loss of information. You can parse Fortran and
prettyprint C, for example. :-) It sounds crazy, but it is useful for
generating CUDA or OpenCL...

Often there are many different ways to express the same thing (for
example in Fortran declarations), so analysis tools have no use for
these variations and can work on a canonical internal representation.
But users can be disappointed when a simple prettyprint does not produce
the same text as the input.

There was also a recent discussion of this issue on the mailing list of
the ROSE compiler (another tool you could look at).

If you also want to take preprocessing and program transformations into
account, getting back a sensible source is undecidable in the general
case...

A long time ago, I saw a tool for Y2K refactoring that used 2 internal
representations: a canonical one for the abstract interpretation and a
concrete one for the syntactic details to be served back to the user.
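
A small Python sketch of what such a pair of representations could look
like, on toy C declarations. This is only my illustration, not the
design of that tool or of PIPS: the class and function names are
hypothetical, and the parser ignores pointers, arrays and initializers.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decl:
    """Canonical form: one record per variable, spelling details dropped."""
    ctype: str
    name: str

@dataclass
class ConcreteDecl:
    """Concrete form: the exact source text plus the canonical records."""
    text: str
    decls: tuple

def parse_decl(text):
    """Parse a toy declaration such as 'int  a,b ;'."""
    body = text.strip().rstrip(";")
    type_name, names = body.split(None, 1)
    decls = tuple(Decl(type_name, n.strip()) for n in names.split(","))
    return ConcreteDecl(text, decls)

def prettyprint(decls):
    """Regenerate text from the canonical form only: one declaration per line."""
    return "\n".join(f"{d.ctype} {d.name};" for d in decls)
```

Here "int  a,b ;" and the pair "int a;" / "int b;" give the same
canonical records, which is all an analysis needs, but prettyprinting
those records cannot tell which spelling the user wrote. Only the text
kept in the concrete form can be served back unchanged.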

But anyway, if it is just for parsing, the Clang parser should be able
to give you the information you want if you add some hooks...
-- 
  Ronan KERYELL                      |\/  Cell:   +33 613 143 766
  HPC Project                        |/)  Ronan.Keryell at hpc-project.com
  5201 Great America Parkway #3241   K    skype:keryell
  Santa Clara, CA 95054              |\   http://hpc-project.com
  USA                                | \


