[cfe-dev] Future of AST transformation

Thu Jul 19 01:36:35 PDT 2012

>>>>> On Wed, 18 Jul 2012 15:22:46 -0700, Richard Smith <richard at metafoo.co.uk> said:

    Richard> Since you said you're performing source to source
    Richard> transformations, AST transformation is unlikely to be a
    Richard> good path to follow. A source to source transformation tool
    Richard> should usually make very targeted modifications to the
    Richard> code, and it's not really practical to deduce what those
    Richard> changes should have been if all you have are ASTs from
    Richard> before and after (think about preserving whitespace,
    Richard> comments, macros, templates, ...).

It depends on what you want to do with source-to-source transformation.

If, for example, I use source-to-source transformations to turn a
sequential C/C++ program into a parallel program with multiple processes
communicating through MPI, OpenMP #pragma directives on each SMP node,
some calls to SIMD intrinsics and some parts in CUDA or OpenCL to address
heterogeneous computing, a simple textual transformation is just NOT a
good path to follow.

Another typical use case is high-level hardware synthesis from C/C++.

    Richard> The usual approach for clang-based source-to-source
    Richard> transformation tools is to use the AST to determine what
    Richard> changes should be made, then produce a list of
    Richard> modifications to be made to the original source file. See
    Richard> clang::Rewriter, clang::tooling::Replacement, and
    Richard> clang::tooling::RefactoringTool for some components which
    Richard> make it easier to build such tools.

Yes, but some people are interested in a less "usual approach". :-)
For heavy restructuring such as the parallelization described above,
AST transformations are the only way to go.
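
For comparison, the replacement-list approach described in the quoted
text can be sketched without Clang at all. This is only an illustration
of the idea: the names Edit and applyEdits are hypothetical, and the
struct is a simplified stand-in for what clang::tooling::Replacement
records, not Clang's actual API.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a source replacement:
// "replace `length` characters at `offset` with `text`".
struct Edit {
    std::size_t offset;
    std::size_t length;
    std::string text;
};

// Apply non-overlapping edits to the original text. Edits are applied
// from the highest offset down, so that the offsets of the remaining
// edits (computed against the original text) stay valid.
std::string applyEdits(std::string source, std::vector<Edit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Edit& x, const Edit& y) { return x.offset > y.offset; });
    for (const Edit& e : edits)
        source.replace(e.offset, e.length, e.text);
    return source;
}
```

Replacing only the two characters of "+=" in "a/* acc */ += b; // step"
with "= a +" leaves both comments untouched, which is why this works
well for targeted changes. It gives no help once the code has to be
restructured rather than patched.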

Keeping track of comments in the AST is a first step, even though
source-to-source transformation is intractable in the general case,
because you need to preserve the semantics of... comments. And that is
without even mentioning macros...

Just imagine how to translate
/* ... */
/* ... */ a/*  */
    +=
  /* ... */b  /* ... */
 ; //...

to
a = a + b;

Where do you want to keep or duplicate the comments? You have to
"understand" them to resynthesize their new layout and content. :-(

Even with simpler cases it is already a nightmare...

What we do right now in our AST-based tools is to keep spacing as
comments too, and of course we keep comments in the AST. In this way we
can do AST transformations that often keep the structure and the
comments of the code correct, for cases that are not too complex.
But of course, it is based on some heuristics...
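
A minimal sketch of that kind of heuristic, assuming a toy AST in which
each leaf carries the spacing and comments found around it. The types
Leaf and CompoundAdd and the function expand are hypothetical and only
illustrate the idea; they are not taken from our tools or from Clang.

```cpp
#include <string>

// Hypothetical leaf node: an identifier plus the "trivia" (spacing and
// comments) found immediately before and after it in the source.
struct Leaf {
    std::string leading;
    std::string name;
    std::string trailing;
};

// Hypothetical node for "lhs += rhs".
struct CompoundAdd {
    Leaf lhs;
    Leaf rhs;
};

// Print a leaf together with the trivia attached to it.
std::string print(const Leaf& l) { return l.leading + l.name + l.trailing; }

// Rewrite "lhs += rhs" as "lhs = lhs + rhs". The heuristic chosen here:
// the first occurrence of lhs keeps its trivia, the duplicated one is
// printed bare, and rhs keeps its trivia. Any trivia that belonged to
// the operator itself would need a policy of its own.
std::string expand(const CompoundAdd& n) {
    return print(n.lhs) + " = " + n.lhs.name + " + " + print(n.rhs) + ";";
}
```

With lhs carrying the leading trivia "/* acc */ " and rhs carrying the
trailing trivia " /* step */", expand yields
"/* acc */ a = a + b /* step */;". Deciding that the duplicated "a" gets
no comment is exactly the kind of arbitrary choice that makes this a
heuristic.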

The fundamental question is: why try to use Clang instead of other
source-to-source infrastructures? Well, for the robustness of the
current Clang infrastructure!

But we have to remember that this is an intractable issue in the general
case...
-- 
  Ronan KERYELL                            |\/  Phone:  +1 408 658 9453
  Wild Systems / Silkan                    |/)
  5201 Great America Parkway, Suite 320    K    Ronan.Keryell at wild-systems.com
  Santa Clara, CA 95054                    |\   skype:keryell
  USA                                      | \  http://wild-systems.com