[cfe-dev] AST Representation

Manuel Klimek via cfe-dev cfe-dev at lists.llvm.org
Mon Feb 6 09:30:35 PST 2017


On Mon, Feb 6, 2017 at 6:18 PM Gaetano Checinski <
gaetano.checinski at gmail.com> wrote:

> Your comment is very surprising to me.
> The points you make sound to me like advantages not disadvantages:
>
> > C++ AST can be created that is both useful and not overly specific to a
> single compiler.
> Maybe, but what more important is an AST you can build, analyse and
> transform easily.
>

Well, "useful" seems like the most important part to me :)


> > The IPR does not handle macros [...]
> Why should it ? - what would it be good for, besides having
> source-locations ? - how is this currently handled in clang ?
>  As far as i know clang's AST has no notion of macros.
>

Having full information about what source locations each part of the AST
was generated from would be enough; that's what clang provides.
Transformations for C++ code need to take that into account, though.


> I imagine it could be very handy to have multiple ASTs:
> One having only nodes for Preprocessor (ppAST) constructs and one for C++
> (cppAST).
> The Preprocessor could process ppAST and return cppAST alongside with a
> sourcemap.
> We could even go a step further and have separate ASTs for C++ with and
> without templates.
> This would make the codebase more functional and composable.
>
> To get a proper source-location we then just need to follow the sourcemaps.
>
> Dealing with real (immutable) ASTs  would make (de-)serialization, cloning
> and comparing an easy task.
> As far as i can see, the preprocessor is a big liability:
> The preprocessor is stateful and macros can transform the sourcefile in
> almost any imaginable way.
>
> As far as i can see the c++-parser could be paralyzed.
> However this is only possible if the parser is decoupled from the
> preprocessor.
>

Preprocessing is fast; parallelization is made hard by C++ (the language),
not by the preprocessor.


> > - does not mimic C++ language irregularities; general rules are used,
> rather than long lists of special cases
> We still can write an validator to be sure the AST conforms to a specific
> c++-standard.
> I think that having a nice and regular AST would simplify working with the
> AST.
> Writing visitors and pattern matchers for semantic analysis might become
> easier.
>

Yes, the question is what you do when you want to match some arcane detail
of the language, and the information is not there?


> > Generally, I would not trust a representation that I can't generate
> code from to be correct enough for tools.
> Well, my goal would definitely be to generate code from the AST. If
> information are missing in the representation then we need to improve it.
> IMHO the main message of the paper is that there is that you can build an
> AST representation that is more regular and still complete.
>

So, if somebody builds it and maintains it that's great (I'd find that very
cool), but it sound like a rather large project :)


> > - Unfortunately, a program cannot fully automate the generation of
> “skeletons.” If our aim is portability, we still need to (by hand)
> eliminate non-standard additions to the contents of header file.
>
> can you elaborate? - what do you mean by "skeletons" and  to which header
> files are you referring to?
>

It means that the authors admit that their representation cannot be
generated for system headers, so manual work to model system headers is
necessary. This is problematic at scale, unless I'm misunderstanding
something.

Don't get me wrong, I'd love it if it was possible to create a simplified
AST for C++ that's still useful (-> efficient to generate, memory
efficient, has all necessary information).
Somebody has to prove that it's possible by writing it, I guess. Time will
tell. Given the information I have today, I'm not holding my breath, and it
seems like there are higher priority things to spend time on.


> 2017-02-06 15:45 GMT+00:00 Manuel Klimek <klimek at google.com>:
>
> I haven't looked too deeply into it, but from talking to various clang
> developers, the common theme is disbelieve that a high-level C++ AST can be
> created that is both useful and not overly specific to a single compiler.
>
> From the paper referenced in the project, I see multiple things that make
> it seem not interesting to me from a point of refactoring and semantic
> analysis:
> - The IPR does not handle macros before their expansion in the
> preprocessor.
> - does not mimic C++ language irregularities; general rules are used,
> rather than long lists of special cases
> - Unfortunately, a program cannot fully automate the generation of
> “skeletons.” If our aim is portability, we still need to (by hand)
> eliminate non-standard additions to the contents of header file.
>
> Generally, I would not trust a representation that I can't generate code
> from to be correct enough for tools.
>
> On Wed, Feb 1, 2017 at 8:32 PM Reid Kleckner via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> Yes, the clang "abstract syntax tree" is often jokingly referred to as the
> "concrete syntax graph". We try to provide a generally useful
> representation, but being a fast production C++ compiler comes first.
> Clang's AST is very concrete. You have to know a lot about it to navigate
> it. There is no "Node" base class that you can use as a cursor to navigate
> around Decls, Exprs, Types, and TemplateArguments. Template instantiation,
> the closest thing I can think of to cloning, is done relatively manually
> with TreeTransform.
>
> I think it would be nice to revisit the design of clang's AST to simplify
> it, normalize it, and abstract it, but it is not a task to be taken
> lightly, and I don't expect it to happen in the near future.
>
> On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
> As the AST is not really a Tree as it seems to have circular references,
> working with the AST is sometimes a bit messy (eg. cloning).
>
> A while ago Stroustroup pointed me to Gabriel Dos Reis' work on a
> different approach to represent C++-AST:
> https://github.com/GabrielDosReis/ipr
> <https://mailtrack.io/trace/link/a5c184fad2cf94fbf0449fa233263072ebc34ea8?url=https%3A%2F%2Fgithub.com%2FGabrielDosReis%2Fipr&signature=9a19f2a0a41d5ceb>
>
> Did anyone try to integrate his work into clang or has an opinion to share
> ?
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
>
>
>
>
> <https://mailtrack.io/trace/link/29fe4deb755f2a860d95181bcf80131f16c45e93?url=https%3A%2F%2Fmailtrack.io%2F&signature=d1df2a3c7469f8d7>Sent
> with Mailtrack
> <https://mailtrack.io/install?source=signature&lang=en&referral=gaetano.checinski@gmail.com&idSignature=22>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20170206/91a9044c/attachment.html>


More information about the cfe-dev mailing list