[PATCH] D61637: [Syntax] Introduce syntax trees

Ilya Biryukov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Fri Jun 21 05:08:52 PDT 2019


ilya-biryukov marked 13 inline comments as done.
ilya-biryukov added a comment.

This is ready for another round now.



================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:37
+  /// and \p Last are added as children of the new node.
+  void learnNode(SourceLocation Fist, SourceLocation Last,
+                 syntax::TreeNode *New);
----------------
sammccall wrote:
> sammccall wrote:
> > maybe formNode, formToken, formRoot()?
> if syntax nodes strictly nest and we form left-to-right and bottom up, then why are there ever pending nodes that aren't in the range? Is it because we don't aggregate them as early as possible?
> 
> (edit: after offline discussion, there are precise invariants here that could be documented and asserted)
Sorry, missed this comment and went with `expectToken()` and `expectNode()`, root is now built on `consume()`. Can change to `form*`, not a big deal.

I still need to write good docs about the invariants here, so leaving this open.



================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:48
+  /// Consume the root node.
+  syntax::TranslationUnit *root() && {
+    assert(Root);
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > any particular reason learnRoot() and root() are different functions?
> > 
> > If learnRoot() returned TranslationUnit*, then we avoid the need for the caller to know about the dependency, it would track the state itself.
> `learnRoot` is called inside ast visitor when processing `TranslationUnitDecl` and `root()` is used to consume the result.
> 
> I guess we could just delay `learnRoot` until `consume()` is called, shouldn't be a big deal. I'll do this in the next iteration.
`consume()` now builds the root node.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:92
+  bool TraverseCompoundStmt(CompoundStmt *S) {
+    Builder.learnTokenNode(S->getLBracLoc(), tok::l_brace);
+    Builder.learnTokenNode(S->getRBracLoc(), tok::r_brace);
----------------
sammccall wrote:
> So explicitly claiming the primitive tokens, but implicitly claiming the subtrees, seems like a weird mix.
> Having both explicit might be nicer:
>  - It seems somewhat likely we want to record/tag their semantics (maybe in the child itself, or even the low bits of its pointer?), rather than having accessors scan around looking for something likely.
>  - currently when expected subtrees fail to parse, their tokens get (implicitly) wrapped up in Recovery nodes. They're good targets for heuristic parsing, but this probably means we should record what an unexplained range of tokens is supposed to be for.
> 
> Thinking of something like:
> ```
> builder.expectToken(l_brace, S->getLBracLoc());
> builder.expectTree(Statement, S->getBody());
> builder.expectToken(r_brace, S->getRBracLoc());
> ```
> where the builder would react to non-null AST nodes by mapping the associated syntax node, and null AST nodes by trying to heuristically parse the tokens in between LBracLoc and RBracLoc.
> 
> But lots of unanswered questions here: body is a list of statements, how does that work? What if LBracLoc or RBracLoc is missing? etc.
That's exactly the design we have one, with a limitation that `expect*` have to be called in left-to-right and bottom-up manner. Also, you can only build a tree that ends at the tokens that were consumed.

This is actually a reasonable interface to build a tree from an actual parser, but might feel weird for a ast-to-syntax transformation. I need to figure out a way to write good docs about it, but there's a separate comment for that. Marking this as done (although the questions you mentioned at the end are still there)


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:135
+
+void syntax::TreeBuilder::learnNodeImpl(const syntax::Token *Begin,
+                                        const syntax::Token *End,
----------------
sammccall wrote:
> this function needs some high-level implementation comments
Done. Also renamed it to `consumeNode`. 

The docs are very short, though, might need a revamp for clarity.


================
Comment at: clang/lib/Tooling/Syntax/BuildTree.cpp:164
+  };
+  for (auto It = FirstChild; It != NodesInProgress.end(); ++It) {
+    // Add non-coverred ranges as recovery nodes.
----------------
sammccall wrote:
> why can It not point to a node that spans/is past End?
> 
> (edit after offline discussion: it's an important invariant that we're always consuming a suffix of the pending nodes)
This is now spelled out in the documentation for `foldChildren`.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61637/new/

https://reviews.llvm.org/D61637





More information about the llvm-commits mailing list