[cfe-dev] AST transformations

Alek Paunov alex at declera.com
Sun Mar 13 11:49:11 PDT 2011


Hi Vassil,

On 12.03.2011 18:41, Vassil Vassilev wrote:
> On 12.3.2011 г. 19:14, Siegfried Rohdewald wrote:
>> Vassil Vassilev<vasil.georgiev.vasilev at ...>   writes:
>>
>>>> Just a question in that direction: I am thinking about .ast ->   DB ->
>>>> Transformations ->   DB ->   .ast
>>>>
>>>> Is it possible/reasonable idea ?
>>> Sorry for the stupid question, but what does DB stand for?
>> Database. That lets you use a database schema for the AST.
> I would have guessed that, but it seemed a bit strange to import the ast in
> a database...
> I still don't understand. It is possible but the question is what would
> be the advantages of that? I guess you want to use a database schema for
> cascade deletion...

By .ast I mean the BC-encoded serialization produced by the ASTWriter 
(clang -emit-ast).

Over the past ten years, several projects have followed this (AST in 
DB) approach. Two successful examples:

  * JTransformer [1] - open source, Eclipse plugin, based on SWI-Prolog 
(the DB is a standard Prolog fact store + indexes)
  * SemmleCode/.QL [2] - closed source, compiles to SQL

For a proof-of-concept attempt, my proposal for the DB/query language 
would be Berkeley DBXML/XQuery, because:
  * XQuery naturally operates on subtrees.
  * Further, I think that some more specialized languages (like TXL or 
Stratego) can be compiled to XQuery.
  * In the JunGL paper [3], the author states that his language is close 
to XQuery (in terms of the necessary characteristics).
  * XQuery is a W3C standard with many implementations. I personally 
think that for very large code bases the right engine will be something 
based on Pathfinder [4].
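
To make the "operates on subtrees" point concrete, here is a minimal 
sketch in Python rather than XQuery, using only the standard library. 
The element and attribute names (FunctionDecl, CallExpr, callee) are 
invented for illustration and are not an actual Clang XML schema. The 
function corresponds roughly to an XQuery path such as 
$func//CallExpr/@callee.

```python
import xml.etree.ElementTree as ET

# Toy XML AST. Element and attribute names are hypothetical,
# not a real Clang export schema.
AST_XML = """
<TranslationUnit>
  <FunctionDecl name="f">
    <CompoundStmt>
      <CallExpr callee="g"/>
      <IfStmt>
        <CallExpr callee="h"/>
      </IfStmt>
    </CompoundStmt>
  </FunctionDecl>
  <FunctionDecl name="g">
    <CompoundStmt/>
  </FunctionDecl>
</TranslationUnit>
"""


def calls_in_function(root, func_name):
    """Return the callee names of all CallExpr nodes nested under a function."""
    func = root.find(f"./FunctionDecl[@name='{func_name}']")
    if func is None:
        return []
    # iter() walks the whole subtree in document order, at any depth.
    return [call.get("callee") for call in func.iter("CallExpr")]


root = ET.fromstring(AST_XML)
print(calls_in_function(root, "f"))
```

The query selects one subtree (the function) and then searches only 
inside it, which is the access pattern most transformations need.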

As Siegfried said, the DB schema (XML Schema/RELAX NG in the XML DB 
case) can help validate the DB import and/or the state of the trees 
after some transformation processing. This comes out of the box, but I 
am afraid that for full/sound validation we will need to write 
additional modules in XQuery (at least because of the need for semantic 
checks of the references between nodes).
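
As an illustration of the kind of reference check a grammar-based 
schema alone may not cover, here is a small Python sketch. It assumes a 
hypothetical encoding in which declarations carry an "id" attribute and 
DeclRefExpr nodes carry a "ref" attribute; neither is taken from an 
existing Clang schema. In the DB setting the same check would be an 
XQuery module run after import or after a transformation.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: declarations have "id", references have "ref".
AST_XML = """
<TranslationUnit>
  <VarDecl id="d1" name="x"/>
  <FunctionDecl id="d2" name="f">
    <DeclRefExpr ref="d1"/>
    <DeclRefExpr ref="d9"/>
  </FunctionDecl>
</TranslationUnit>
"""


def dangling_refs(root):
    """Return the 'ref' values that do not match any declared 'id'."""
    declared = {n.get("id") for n in root.iter() if n.get("id") is not None}
    return [
        n.get("ref")
        for n in root.iter("DeclRefExpr")
        if n.get("ref") not in declared
    ]


print(dangling_refs(ET.fromstring(AST_XML)))
```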

> And how you would do the transformations in the database? Can you give
> us more details on what you want to do?

I see two forms of transformations:
  * In-place, using XQuery Update
  * Projections using (often recursive) "constructor" functions, e.g. 
insert nodes your-module:ProjFunc1($base-node, $args) into $node
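
The difference between the two forms can be sketched in Python as 
follows. This is only an analogy under assumed node names (CallExpr, 
NullStmt, callee), not XQuery and not a Clang API: the first function 
mutates the stored tree, the second recursively constructs a new tree 
and leaves the input untouched.

```python
import xml.etree.ElementTree as ET


def rename_calls_in_place(root, old, new):
    """In-place form: modify matching nodes of the existing tree."""
    for call in root.iter("CallExpr"):
        if call.get("callee") == old:
            call.set("callee", new)


def project_without(node, dropped_tag):
    """Projection form: recursively build a copy without 'dropped_tag' nodes.

    Text content is ignored because this toy AST keeps data in attributes.
    """
    new_node = ET.Element(node.tag, dict(node.attrib))
    for child in node:
        if child.tag != dropped_tag:
            new_node.append(project_without(child, dropped_tag))
    return new_node


# Hypothetical toy tree, not a real Clang export.
root = ET.fromstring(
    '<CompoundStmt>'
    '<CallExpr callee="old_f"/><NullStmt/><CallExpr callee="g"/>'
    '</CompoundStmt>'
)
rename_calls_in_place(root, "old_f", "new_f")
projected = project_without(root, "NullStmt")
print(ET.tostring(projected, encoding="unicode"))
```

The projection form costs a copy, but the source tree stays available 
for further queries or for comparison with the result.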

Advantages:
  * A Stratego (or even "low level" :-) XQuery) transformation can be 
sketched by almost anyone in a few hours; an equivalent (let's say 
final) TreeTransform-based one will cost at least days, even for a 
developer well trained in LLVM/Clang.
  * Relatively easy (customized) unparsing and other query-based DB 
results, for example a stable C++ XML representation [*].
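
To give a feeling for how small a customized unparser over such a tree 
can be, here is a toy Python sketch. The node kinds and attributes 
(ReturnStmt, BinaryOperator with "op", DeclRefExpr with "name", 
IntegerLiteral with "value") are assumptions chosen for the example and 
cover only a tiny expression subset.

```python
import xml.etree.ElementTree as ET


def unparse(node):
    """Turn a toy XML AST node back into source text."""
    tag = node.tag
    if tag == "IntegerLiteral":
        return node.get("value")
    if tag == "DeclRefExpr":
        return node.get("name")
    if tag == "BinaryOperator":
        lhs, rhs = node  # exactly two operand children are expected
        return f"({unparse(lhs)} {node.get('op')} {unparse(rhs)})"
    if tag == "ReturnStmt":
        return f"return {unparse(node[0])};"
    raise ValueError(f"unhandled node kind: {tag}")


# Hypothetical input tree, not a real Clang export.
stmt = ET.fromstring(
    '<ReturnStmt>'
    '<BinaryOperator op="+">'
    '<DeclRefExpr name="x"/><IntegerLiteral value="1"/>'
    '</BinaryOperator>'
    '</ReturnStmt>'
)
print(unparse(stmt))
```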

What do you think?

Kind regards,
Alek

[*] Douglas Gregor often says that the XML export of the Clang AST needs 
to follow a standardized schema (stable, not so closely tied to the 
current Clang ASTs). I think this is a perfect goal, but it can be 
achieved and supported more easily via an XML->XML transformation of the 
native Clang X.Y schema (using XSLT or XQuery) than via a C tree -> XML 
transformation mixed into the XML printing phase (using C++).

[1] 
http://sewiki.iai.uni-bonn.de/research/jtransformer/api/java/pefs/2.9/java_pef_overview
[2] http://en.wikipedia.org/wiki/SemmleCode
[3] 
http://research.microsoft.com/pubs/79030/DPhil%20Thesis%20-%20Mathieu%20Verbaere.pdf
[4] http://www-db.informatik.uni-tuebingen.de/research/pathfinder


