[cfe-dev] C AST transformations / questionable use of AST serialization

Wed Jun 11 15:10:57 PDT 2014

Hi,

We've been building a tool (eventually to be released under a BSD license) that
lets programmers write custom program properties (complexity, semantic,
architectural, etc.) in a high-level DSL (embedded in Haskell at the moment).
We switched from a decent but ad hoc C99 parser to the Clang front end and
are very happy customers.  We use the libclang C interface via Haskell's FFI.

However, we lost one extremely useful capability in this transition.  We had
some really nice one-liners in our pre-clang days, e.g.,

  property1 = noUnreachableCode . removeDecls (hasPrefix "test_")

    -- Remove all the declarations for test code from the project, then
    -- test to see that there is no unreachable code.

  property2 = noUnreachableCode
            . removeDecls (hasPrefix "test_")
            . removeMembersFromStructs (hasPrefix "test_")

    -- Ditto, but we also remove structure members that are only there
    -- for testing purposes.

If you don't grok Haskell:
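  - The '.' above is function composition (like '|' in Unix, but read right
    to left: the rightmost function is applied first).
  - removeDecls and removeMembersFromStructs are higher-order functions that
    take a name predicate; hasPrefix builds such a predicate.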
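For readers unfamiliar with this style, here is a minimal sketch of what such functions might look like over a toy AST. The types (Decl, Field, Project) and the name stripTestCode are hypothetical, invented for illustration; this is not our actual implementation and not anything provided by libclang. noUnreachableCode is left out, since a real analysis would need a much richer AST.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical, much-simplified C AST for illustration only; a real one
-- (e.g. from our old parser) has many more constructors.
data Field = Field { fieldName :: String, fieldType :: String }
  deriving (Show, Eq)

data Decl
  = FunDecl    { declName :: String }
  | VarDecl    { declName :: String }
  | StructDecl { declName :: String, structFields :: [Field] }
  deriving (Show, Eq)

newtype Project = Project { projectDecls :: [Decl] }
  deriving (Show, Eq)

-- Build a name predicate: hasPrefix "test_" is True for "test_foo".
hasPrefix :: String -> String -> Bool
hasPrefix = isPrefixOf

-- Drop every top-level declaration whose name satisfies the predicate.
removeDecls :: (String -> Bool) -> Project -> Project
removeDecls p (Project ds) = Project (filter (not . p . declName) ds)

-- Drop struct members whose names satisfy the predicate;
-- non-struct declarations are returned unchanged.
removeMembersFromStructs :: (String -> Bool) -> Project -> Project
removeMembersFromStructs p (Project ds) = Project (map prune ds)
  where
    prune d@StructDecl{} =
      d { structFields = filter (not . p . fieldName) (structFields d) }
    prune d = d

-- The transformation half of property2 (hypothetical name); an analysis
-- such as noUnreachableCode would then consume the pruned Project.
stripTestCode :: Project -> Project
stripTestCode = removeDecls (hasPrefix "test_")
              . removeMembersFromStructs (hasPrefix "test_")
```

Because '.' composes right to left, struct members are pruned first and test_ declarations are dropped second; for these two passes the order does not matter, since neither one changes declaration names.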

With our switch to Clang, we have lost the ability to do quick and easy
wholesale project transformations like the removeDecls function above.  We also
need transformations that add to the code (e.g., inserting attributes).  The
output of these transformations (code slicing, mutations, extensions) may be
only for intermediate use; it is not necessarily produced for the sake of code
refactoring.

I'd really like to regain the ability to achieve such transformations.  As we explore
ways to do this, these are some of my thoughts:

  - These modules

       Refactoring.h - Framework for clang refactoring tools
       Rewriter.h - Code rewriting interface

    seem to be designed for applying changes to the source text and cannot
    readily be used to modify the AST (or the serialized form of the AST).

    Correct?

  - One approach I'm considering is to write a custom encoder/decoder for the
    serialized AST for our Haskell code.  I.e., porting the clang::serialization
    stuff to Haskell so that we can read and write .ast files.

    I saw a post to this list, some time ago, that discouraged this.
    But my question is not so much whether you (as C++ coders) think this is
    the *preferable* way, but rather:

      IF someone is really keen for a third-party (non-C++) tool to transform the AST:

       - Is the above replace-serialization approach even feasible?
       - Any warnings/suggestions if we did try this?
       - Are there alternative ways to do this that don't involve applying
         rewrites to the source and re-parsing?

Sorry for the long post.  Any insights or guidance would be very helpful!

- Mark Tullsen