[cfe-dev] code conversion challenge

Manuel Klimek klimek at google.com
Wed Feb 29 10:34:54 PST 2012


On Wed, Feb 29, 2012 at 9:16 AM, Philip Ashmore
<contact at philipashmore.com> wrote:
> On 29/02/12 16:29, Manuel Klimek wrote:
>> On Tue, Feb 28, 2012 at 10:12 AM, Philip Ashmore
>> <contact at philipashmore.com>  wrote:
>>> On 28/02/12 17:17, Sean Silva wrote:
>>>> It looks like what you want to do is to run a RecursiveASTVisitor over
>>>> the AST and essentially cherry-pick certain information off of it. It
>>>> may be a lot of work, but I think you could do it.
>>>>
>>>> Oh, except for the comment. That would be *much* more difficult. In
>>>> your example you seem to be including the comment as though it were a
>>>> statement in the body. How would your representation (which reminds me
>>>> of Prolog btw) represent:
>>>>
>>>>    int main(int argc, char * argv[])
>>>>    {
>>>>      return /* all ok */ 0;
>>>>    }
>>> Return(Comment("all ok"), Int(0i32))
>>> You could also add File("myfile.sbt"), Line(22) and Column(32) anywhere
>>> to track the source file.
>>> It all comes down to what you want to process and how.
>>>>
>>>> or
>>>>
>>>>    int main(int argc, char * argv[])
>>>>    {
>>>>      doSomethingWithALotOfArgs(argv[0], argv, argv+argc,
>>>>                                /*verbose=*/false);
>>>>      return 0;
>>>>    }
>>>>
>>>> Code like that last example is extremely common.
>>>>
>>>> For a more pathological example, consider
>>>>
>>>>    #define X(a,b) a##b
>>>>    int X(ma,/*pure evil*/in)(int argc, char * argv[])
>>>>    {
>>>>      doSomethingWithALotOfArgs(argv[0], argv, argv+argc,
>>>>                                /*verbose=*/false);
>>>>      return 0;
>>>>    }
>>> , Macro
>>>    ( name(X)
>>>    , Parameters(a, b)
>>>    , Body
>>>      ( Return(Concat(a, b))
>>>      )
>>>    )
>>> , Function
>>>    ( Name(X(ma, Comment("pure evil"), in))
>>>    , Body
>>>      ( Call(doSomethingWithALotOfArgs, Index(argv, 0), argv,
>>>             Add(argv, argc), Comment("verbose"), Bool(false))
>>>
>>> My parser doesn't distinguish between "built-in" symbols and those used
>>> in the code.
>>>> The conclusion is that to actually be useful, your representation
>>>> would want to make certain things "off limits" or purposefully not
>>>> representable. It's up to you to draw the line. Once you have that
>>>> line, you can then get what you want in a pretty straightforward way
>>>> from clang.
>>> I think the binary representation would be really useful as a
>>> pre-compiled header format where even macro expansion is deferred.
>>>
>>> I forgot to mention that the format is in-place-editable, and with a
>>> snapshotting filesystem (e.g. FUSE) you could efficiently modify it in
>>> place for one source file, make another snapshot and edit that, and
>>> then throw the snapshots away.
>>>
>>> It's going to be part of my v3c-storyboard SourceForge project, and
>>> being able to process C/C++ into this format would be a big plus.
>>>
>>> Things like extracting function prototypes, automatically determining
>>> the required include files, and source translation all become a lot
>>> easier this way, as the library has a ridiculously simple C/C++ API -
>>> it's all about calls, symbols and literals.
>>
>> Having done a few real world C++ code transformations recently, I
>> don't buy that a stripped down format will help a lot. Most of the
> The format is minimal but there are no limits as to what it can represent.
> I had in mind using it from a (graphical) user interface and being able
> to drill down into the structure representation to essentially "draw"
> the required operation.
>
> Given the questions I see regularly on cfe-dev from users, such an
> intuitive tool could prove to be very popular.
>> things you propose would need very C++ specific implementations - why
>> not just write tools against the clang AST for them?
> The problem with the AST is that the macros are already expanded.
> I'd like to let the user try out different macro definitions and see,
> interactively, how they affect the expanded macro and the generated AST.

Wouldn't you need to reparse for that? In C++ things can get
significantly different meanings from a few changed tokens in a macro.
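
To make that concrete, here is a minimal sketch. The macro and function
names are hypothetical and chosen only for illustration. The two macros
differ by a single pair of parentheses, yet the expanded code parses into
differently shaped expression trees, so the result of an earlier parse
cannot simply be patched up after the macro definition changes.

```cpp
// Hypothetical macros for illustration: they differ only by one pair of
// parentheses around the parameter.
#define SCALE_BARE(x)  x * 2
#define SCALE_PAREN(x) (x) * 2

// Expands to `1 + 3 * 2`. Multiplication binds tighter than addition, so
// the parsed tree is an addition whose right operand is `3 * 2`.
constexpr int scaled_bare() { return SCALE_BARE(1 + 3); }

// Expands to `(1 + 3) * 2`. The parsed tree is now a multiplication whose
// left operand is the parenthesized addition.
constexpr int scaled_paren() { return SCALE_PAREN(1 + 3); }

// Compile-time checks: same macro argument, different trees, different values.
static_assert(scaled_bare() == 7, "1 + 3 * 2 evaluates to 7");
static_assert(scaled_paren() == 8, "(1 + 3) * 2 evaluates to 8");
```

The argument tokens are identical in both calls. Only the macro body
changed, and the only way to find out what the expansion means is to run
the parser over it again.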

Cheers,
/Manuel

>>
>> Cheers,
>> /Manuel
>>>
>>>> --Sean Silva
>>>>
>>>> On Tue, Feb 28, 2012 at 2:21 AM, Philip Ashmore
>>>> <contact at philipashmore.com>  wrote:
>>>>
>>>>      On 28/02/12 07:01, Philip Ashmore wrote:
>>>>      >  Hi there.
>>>>      >
>>>>      >  Here's the problem:
>>>>      >  Given the source file with this content:
>>>>      >
>>>>      >       int main(int argc, char * argv[])
>>>>      >       {
>>>>      >         /* all ok */
>>>>      >         return 0;
>>>>      >       }
>>>>      >
>>>>      >  I want to convert it into something like this:
>>>>      >
>>>>      >       module
>>>>      >       ( function
>>>>      >         ( Name("main")
>>>>      >         , Returns("int")
>>>>      >         , Parameters
>>>>      >           ( Parameter(Type("int"), Name("argc")
>>>>      >           , Parameter(Type(Array(Pointer("char"), Size()), Name("argv")
>>>>      >           )
>>>>      >         , Body
>>>>      >           ( Comment("all ok")
>>>>      >           , Return(Int(0i32))
>>>>      >           )
>>>>      >         )
>>>>      >       )
>>>>      >
>>>>      >  This is a description format that has a binary representation
>>>>      >  that allows for easy depth-first and breadth-first traversal.
>>>>      >
>>>>      >  With it one can describe C/C++, make files, pre-processor macros
>>>>      >  etc. - the reader supplies the meaning to the "calls" like "module".
>>>>      >
>>>>      >  With it I hope to be able to describe things like interfaces and
>>>>      >  be able to automate the glue that allows it to be called from
>>>>      >  scripting languages, and much more.
>>>>      >
>>>>      >  I haven't even given this format a name, but I can convert the
>>>>      >  text above to and from the binary representation.
>>>>      >
>>>>      >  So that's the challenge - any takers?
>>>>      >
>>>>      >  Regards,
>>>>      >  Philip Ashmore
>>>>      >
>>>>      >  _______________________________________________
>>>>      >  cfe-dev mailing list
>>>>      >  cfe-dev at cs.uiuc.edu
>>>>      >  http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>>>      OK, maybe not this exact example - the parameters are missing ')', but
>>>>      you get the idea.
>>>>
>>>>      Philip
>>>>
>>>>
> Regards,
> Philip Ashmore



