[PATCH] D61637: [Syntax] Introduce syntax trees

Ilya Biryukov via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 8 02:35:49 PDT 2019


ilya-biryukov added inline comments.


================
Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:23
+/// token buffers, source manager, etc.
+class Corpus {
+public:
----------------
sammccall wrote:
> I think plain SyntaxArena might be a better name here :-/
> Corpus refers to texts (the use in dex is by analogy, as we call symbols "documents" from search).
> 
Went with Arena


================
Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:26
+  Corpus(SourceManager &SourceMgr, const LangOptions &LangOpts,
+         TokenBuffer MainFile);
+
----------------
sammccall wrote:
> MainFile is presumably the whole TU, name might need a tweak.
> Can it be empty?
> The relationship between Corpus and TokenBuffer seems a little weird. Why is it needed?
Operations on the trees sometimes need to know anything about underlying tokens - they have access to `TokenBuffer` that produced them.

More specifically, this can be used to map between spelled and expanded tokens and check the mappings are possible.


================
Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:38
+  std::pair<FileID, llvm::ArrayRef<syntax::Token>>
+  tokenizeBuffer(std::unique_ptr<llvm::MemoryBuffer> Buffer);
+
----------------
sammccall wrote:
> Are you planning to have a way to add tokens directly? Having to turn them into text and re-lex them seems like it might be inconvenient.
The tokens have source locations and refer to a text in some buffer. `tokenizeBuffer` makes is easier, not harder, to mock tokens.


================
Comment at: clang/include/clang/Tooling/Syntax/Corpus.h:40
+
+  /// Construct a new syntax node of a specified kind. The memory for a node is
+  /// owned by the corpus and will be freed when the corpus is destroyed.
----------------
sammccall wrote:
> Now there's two ways to do this: `new (C.allocator()) T(...)` or `C.construct<T>(...)`. Do we need both?
> 
> (If we do, is the syntax `new (C) T(...)` more natural?)
I think `C.construct<T>()` read better than `new (C) T(...)`. Not a big fan of placement new exprs.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:34
+
+/// Build a syntax tree for the main file.
+TranslationUnit *buildSyntaxTree(Corpus &C,
----------------
sammccall wrote:
> for a translation unit? or for only decls within the main file?
For a translation unit. We will add versions that built for a subtree of the AST later.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree.h:40
+/// node.
+void traverse(Node *N, llvm::function_ref<void(Node *)> Visit);
+void traverse(const Node *N, llvm::function_ref<void(const Node *)> Visit);
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > I've been burned with adding these APIs without use cases.
> > 
> > It seems likely you want a way to:
> >  - skip traversal of children
> >  - abort the traversal entirely
> Not having an option to abort traversal protects us against timing attacks...
> 
> Agree with both, will address in this patch.
> 
Removed from the public API, we seem to have different ideas on how it should look like and I'd prefer to focus on storage model in this patch.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree/Cascade.h:1
+//===- Tree.h - cascade of the syntax tree --------------------*- C++ -*-=====//
+//
----------------
sammccall wrote:
> sammccall wrote:
> > sammccall wrote:
> > > this is Cascade.h, not tree.h
> > why "cascade"?
> The Tree/ subdirectory seems superfluous - why are these separate from Syntax/?
Cascade defines a few base nodes: a composite node (`TreeNode`) and a leaf node that holds tokens.
I'd really like to isolate them from language-specific nodes, so language-specific nodes live in a separate file (`Nodes.h`).
However, they need to see the definition of a composite node, hence the split.

Users are advised to use an umbrella header, `Tree.h`. The extra directory is to minimize the number of headers in the top-level directory, having too many is confusing.


================
Comment at: clang/include/clang/Tooling/Syntax/Tree/Cascade.h:74
+/// A composite tree node that has children.
+class TreeNode : public Node {
+public:
----------------
sammccall wrote:
> This use of "tree node" to mean specifically internal node seems confusing - is it common?
I don't think it's common, can use `CompositeNode` - seems like a better alternative


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61637/new/

https://reviews.llvm.org/D61637





More information about the cfe-commits mailing list