[cfe-dev] AST Representation

Gaetano Checinski via cfe-dev cfe-dev at lists.llvm.org
Mon Feb 6 09:18:29 PST 2017


Your comment is very surprising to me.
The points you make sound to me like advantages, not disadvantages:

> C++ AST can be created that is both useful and not overly specific to a
single compiler.
Maybe, but what is more important is an AST you can build, analyse and
transform easily.

> The IPR does not handle macros [...]
Why should it? What would it be good for, besides having source locations?
How is this currently handled in clang?
As far as I know, clang's AST has no notion of macros.


I imagine it could be very handy to have multiple ASTs:
one with nodes only for preprocessor constructs (ppAST) and one for C++
(cppAST).
The preprocessor could process the ppAST and return a cppAST along with a
sourcemap.
We could even go a step further and have separate ASTs for C++ with and
without templates.
This would make the codebase more functional and composable.

To get a proper source location, we then just need to follow the sourcemaps.
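
A rough sketch of what "following the sourcemaps" could look like, assuming
each stage (preprocessing, template instantiation, ...) records which input
node every output node came from. All names here (NodeId, SourceMap, resolve)
are hypothetical and are not part of clang or the IPR:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names throughout; nothing here is clang or IPR API.
using NodeId = std::size_t;

struct SourceLoc {
    std::string file;
    unsigned line;
    unsigned column;
};

// One map per transformation stage: output node -> the input node it came from.
using SourceMap = std::unordered_map<NodeId, NodeId>;

// Walk the stages from last to first, then look up the location that was
// recorded for the resulting ppAST node. Returns nullopt if any link is missing.
std::optional<SourceLoc> resolve(
    NodeId node,
    const std::vector<SourceMap>& stages,  // ordered from first stage to last
    const std::unordered_map<NodeId, SourceLoc>& ppLocations)
{
    for (auto it = stages.rbegin(); it != stages.rend(); ++it) {
        auto found = it->find(node);
        if (found == it->end()) return std::nullopt;  // provenance was lost
        node = found->second;
    }
    auto loc = ppLocations.find(node);
    if (loc == ppLocations.end()) return std::nullopt;
    return loc->second;
}
```

The lookup cost grows with the number of stages, which seems acceptable if
locations are only needed for diagnostics and refactoring output.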

Dealing with real (immutable) ASTs would make (de-)serialization, cloning
and comparison easy.
As far as I can see, the preprocessor is a big liability:
it is stateful, and macros can transform the source file in almost any
imaginable way.
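
To make the point about immutable ASTs above more concrete, here is a minimal
sketch, assuming nodes own only their subtrees (no parent or cross links).
The Node type and the helpers makeNode, sameTree, serialize and withChild are
hypothetical, not clang's or the IPR's representation. Only serialization is
shown; deserialization would be the matching recursive parser.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node type for illustration only.
struct Node;
using NodePtr = std::shared_ptr<const Node>;

struct Node {
    std::string kind;               // e.g. "BinaryOp", "IntLiteral"
    std::string value;              // operator or literal text, may be empty
    std::vector<NodePtr> children;  // owned subtrees only, so no cycles
};

NodePtr makeNode(std::string kind, std::string value,
                 std::vector<NodePtr> children = {}) {
    return std::make_shared<Node>(
        Node{std::move(kind), std::move(value), std::move(children)});
}

// Structural comparison is a plain recursion; shared subtrees short-circuit.
bool sameTree(const NodePtr& a, const NodePtr& b) {
    if (a == b) return true;
    if (!a || !b) return false;
    if (a->kind != b->kind || a->value != b->value ||
        a->children.size() != b->children.size()) return false;
    for (std::size_t i = 0; i < a->children.size(); ++i)
        if (!sameTree(a->children[i], b->children[i])) return false;
    return true;
}

// Serialization to an S-expression is another plain recursion.
std::string serialize(const NodePtr& n) {
    std::string out = "(" + n->kind;
    if (!n->value.empty()) out += " " + n->value;
    for (const auto& c : n->children) out += " " + serialize(c);
    return out + ")";
}

// A "transformation" builds a new parent and shares the untouched children.
NodePtr withChild(const NodePtr& n, std::size_t index, NodePtr replacement) {
    assert(index < n->children.size());
    std::vector<NodePtr> kids = n->children;
    kids[index] = std::move(replacement);
    return makeNode(n->kind, n->value, std::move(kids));
}
```

With this shape, "cloning" is just copying a NodePtr, since nobody can mutate
the tree behind it.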

As far as I can see, the C++ parser could be parallelized.
However, this is only possible if the parser is decoupled from the
preprocessor.


> - does not mimic C++ language irregularities; general rules are used,
rather than long lists of special cases
We can still write a validator to make sure the AST conforms to a specific
C++ standard.
I think that having a nice and regular AST would simplify working with it.
Writing visitors and pattern matchers for semantic analysis might become
easier.
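
As an illustration of the validator idea, here is a small sketch, assuming a
uniform node shape so that one generic walk serves every analysis. The Node
type, the node kinds and the tiny rule table are made up for the example; a
real validator would need far more rules:

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical uniform node; kinds and rules below are illustrative only.
struct Node {
    std::string kind;
    std::vector<Node> children;
};

// One generic pre-order walk works for every node, because all share a shape.
void walk(const Node& n, const std::function<void(const Node&)>& visit) {
    visit(n);
    for (const Node& c : n.children) walk(c, visit);
}

enum class Standard { Cpp03, Cpp11 };

// Returns the kinds found in the tree that the chosen standard does not allow.
std::vector<std::string> validate(const Node& root, Standard standard) {
    // Lambdas and range-based for loops were introduced in C++11.
    static const std::set<std::string> cpp11Only = {"Lambda", "RangeFor"};
    std::vector<std::string> errors;
    walk(root, [&](const Node& n) {
        if (standard == Standard::Cpp03 && cpp11Only.count(n.kind))
            errors.push_back(n.kind);
    });
    return errors;
}
```

The irregularities of a particular standard then live in the rule table
rather than in the shape of the AST itself.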



> Generally, I would not trust a representation that I can't generate code
from to be correct enough for tools.
Well, my goal would definitely be to generate code from the AST. If
information is missing in the representation, then we need to improve it.
IMHO the main message of the paper is that you can build an AST
representation that is more regular and still complete.


> - Unfortunately, a program cannot fully automate the generation of
“skeletons.” If our aim is portability, we still need to (by hand)
eliminate non-standard additions to the contents of header file.

Can you elaborate? What do you mean by "skeletons", and which header files
are you referring to?

2017-02-06 15:45 GMT+00:00 Manuel Klimek <klimek at google.com>:

> I haven't looked too deeply into it, but from talking to various clang
> developers, the common theme is disbelief that a high-level C++ AST can be
> created that is both useful and not overly specific to a single compiler.
>
> From the paper referenced in the project, I see multiple things that make
> it seem not interesting to me from a point of refactoring and semantic
> analysis:
> - The IPR does not handle macros before their expansion in the
> preprocessor.
> - does not mimic C++ language irregularities; general rules are used,
> rather than long lists of special cases
> - Unfortunately, a program cannot fully automate the generation of
> “skeletons.” If our aim is portability, we still need to (by hand)
> eliminate non-standard additions to the contents of header file.
>
> Generally, I would not trust a representation that I can't generate code
> from to be correct enough for tools.
>
> On Wed, Feb 1, 2017 at 8:32 PM Reid Kleckner via cfe-dev <
> cfe-dev at lists.llvm.org> wrote:
>
>> Yes, the clang "abstract syntax tree" is often jokingly referred to as
>> the "concrete syntax graph". We try to provide a generally useful
>> representation, but being a fast production C++ compiler comes first.
>> Clang's AST is very concrete. You have to know a lot about it to navigate
>> it. There is no "Node" base class that you can use as a cursor to navigate
>> around Decls, Exprs, Types, and TemplateArguments. Template instantiation,
>> the closest thing I can think of to cloning, is done relatively manually
>> with TreeTransform.
>>
>> I think it would be nice to revisit the design of clang's AST to simplify
>> it, normalize it, and abstract it, but it is not a task to be taken
>> lightly, and I don't expect it to happen in the near future.
>>
>> On Wed, Feb 1, 2017 at 8:44 AM, Gaetano Checinski via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>> As the AST is not really a tree (it seems to have circular references),
>> working with it is sometimes a bit messy (e.g. cloning).
>>
>> A while ago Stroustrup pointed me to Gabriel Dos Reis' work on a
>> different approach to representing the C++ AST:
>> https://github.com/GabrielDosReis/ipr
>>
>> Has anyone tried to integrate his work into clang, or does anyone have an
>> opinion to share?
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>
>



