[cfe-dev] C AST transformations / questionable use of AST serialization
Mark Tullsen
tullsen at galois.com
Wed Jun 11 15:10:57 PDT 2014
Hi,
We've been building a tool (eventually to be released under a BSD license) that
allows programmers to write custom program properties (complexity, semantic,
architectural, etc.) in a high-level DSL (embedded in Haskell at the moment).
We switched from a decent but ad hoc C99 parser to using the Clang front end and
are very happy customers. We are using the libclang C interface via FFI.
However, we lost one extremely useful capability in this transition. We had
some really nice one-liners in our pre-Clang days, e.g.,
property1 = noUnreachableCode . removeDecls (hasPrefix "test_")
-- Remove all the declarations for test code from the project, then
-- check that there is no unreachable code.
property2 = noUnreachableCode
          . removeDecls (hasPrefix "test_")
          . removeMembersFromStructs (hasPrefix "test_")
-- Ditto, but we also remove structure members that are only there
-- for testing purposes.
If you don't grok Haskell:
- The '.' above is function composition, so 'f . g' applies g first and then f
  (like '|' in Unix, but read right to left).
- removeDecls, removeMembersFromStructs, and hasPrefix are higher-order functions.
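To make the one-liners concrete, here is a minimal sketch of why they were so easy when the tool owned its own AST: each combinator is just a pure function from project to project. The types (Member, Decl, Project), the helper declName, and cleanTestCode are hypothetical, not the tool's real representation; only the names removeDecls, removeMembersFromStructs, and hasPrefix come from the examples above, and noUnreachableCode is omitted.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical toy model of a C project: a list of top-level declarations.
data Member = Member { memberName :: String } deriving (Eq, Show)

data Decl
  = FunDecl String              -- a function, identified by name
  | StructDecl String [Member]  -- a struct and its members
  deriving (Eq, Show)

type Project = [Decl]

declName :: Decl -> String
declName (FunDecl n)      = n
declName (StructDecl n _) = n

-- hasPrefix "test_" is itself a predicate, ready to pass to the combinators.
hasPrefix :: String -> String -> Bool
hasPrefix p = (p `isPrefixOf`)

-- Drop every top-level declaration whose name satisfies the predicate.
removeDecls :: (String -> Bool) -> Project -> Project
removeDecls p = filter (not . p . declName)

-- Drop struct members whose names satisfy the predicate; other decls are untouched.
removeMembersFromStructs :: (String -> Bool) -> Project -> Project
removeMembersFromStructs p = map strip
  where
    strip (StructDecl n ms) = StructDecl n (filter (not . p . memberName) ms)
    strip d                 = d

-- Right-to-left composition: removeDecls runs first.
cleanTestCode :: Project -> Project
cleanTestCode = removeMembersFromStructs (hasPrefix "test_")
              . removeDecls (hasPrefix "test_")

example :: Project
example =
  [ FunDecl "main"
  , FunDecl "test_helper"
  , StructDecl "config" [Member "size", Member "test_counter"]
  ]
-- cleanTestCode example == [FunDecl "main", StructDecl "config" [Member "size"]]
```

With Clang, the AST is owned by the C++ front end, so there is no equally direct way to apply such a function and hand the result to the next analysis.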
With our switch to Clang, we have lost the ability to do quick and easy
wholesale project transformations like the removeDecls function above. We also
need transformations that add to the code (e.g., inserting attributes). The
output of these transformations (code slicing, mutations, extensions) may be
only for intermediate use; it is not necessarily emitted as refactored source code.
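An additive transformation fits the same style. The following is a hypothetical sketch, assuming a toy Decl type that records attribute names as plain strings; addAttribute is an illustrative name, not part of Clang or of our tool.

```haskell
-- Hypothetical toy AST: a function declaration carrying attribute names
-- (e.g. "unused" standing for __attribute__((unused))).
data Decl = FunDecl { funName :: String, funAttrs :: [String] }
  deriving (Eq, Show)

-- Add an attribute to every function whose name satisfies the predicate,
-- skipping functions that already carry it, so the transform is idempotent.
addAttribute :: String -> (String -> Bool) -> [Decl] -> [Decl]
addAttribute attr p = map add
  where
    add d
      | p (funName d) && attr `notElem` funAttrs d = d { funAttrs = funAttrs d ++ [attr] }
      | otherwise                                  = d
```

Because the result is just another [Decl], it can be composed with removal transforms and fed to later analyses without ever being printed back to source.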
I'd really like to regain the ability to achieve such transformations. As we explore
ways to do this, these are some of my thoughts:
- These modules
Refactoring.h - Framework for clang refactoring tools
Rewriter.h - Code rewriting interface
seem to be designed for applying changes to the source and cannot
be readily used to modify the AST (nor the serialized form of the AST).
Correct?
- One approach I'm considering is to write a custom encoder/decoder for the
serialized AST for our Haskell code. I.e., porting the clang::serialization
stuff to Haskell so that we can read and write .ast files.
I saw a post to this list from long ago that discouraged this.
But my question is not so much whether you (as C++ coders) think this is the
*preferable* way, but rather, IF someone is really keen for a 3rd-party (non-C++)
tool to transform the AST:
- Is the above replace-serialization approach even feasible?
- Any warnings/suggestions if we did try this?
- Are there alternative ways to do this that don't involve applying
rewrites to the source and re-parsing?
Sorry for the long post. Any insights or guidance would be very helpful!
- Mark Tullsen