[PATCH] D128411: [syntax] Introduce a BaseToken class.

Fri Jun 24 12:28:20 PDT 2022

hokein added inline comments.

================
Comment at: clang/include/clang/Tooling/Syntax/TokenManager.h:33
+  /// The syntax-tree Leaf node holds a Key.
+  using Key = const void *;
+  /// Gets the text of token identified by the key.
----------------
ilya-biryukov wrote:
> I have just realized that we were discussing having opaque index as a key, but there may also be tokens in the syntax tree that are not backed by the `TokenBuffer`.
> Those can be synthesized, e.g. imagine someone wants to change `!(A && B)` to `!A || !B`. They will need to synthesize at least the `||` token as it's not in the source code. There is a way to do this now and it prohibits the use of an index to the `TokenBuffer`.
> 
> So having the opaque pointer is probably ok for now, it should enable the pseudo-parser to build syntax trees.
> We might want to add an operation to synthesize tokens into the `TokenManager` at some point, but that's a discussion for another day.

> Those can be synthesized, e.g. imagine someone wants to change !(A && B) to !A || !B. They will need to synthesize at least the || token as it's not in the source code. There is a way to do this now and it prohibits the use of an index to the TokenBuffer.

Yes, this is the exact reason why the Key is an opaque pointer, my first attempt was to use an index integer, but failed -- we already have some APIs doing this stuff (see `createLeaf` in BuildTree.h), the token can be a synthesized token backed up by the SourceManager...

Personally, I don't like the Key to be an opaque pointer as well, but considering the effort, it seems to be the best approach so far -- it enables the pseudoparser to build syntax trees with a different Token implementation while keeping the rest syntax stuff unchanged.

> We might want to add an operation to synthesize tokens into the TokenManager at some point, but that's a discussion for another day.

Agree, we will encounter this in the future, but we're still far away from there (the layering mutation/syntax-tree is not perfect at the moment, mutation still depends on the TokenBuffer). And our initial application of syntax-tree in pseudoparser focuses on the read use-case, we should be fine now.

================
Comment at: clang/lib/Tooling/Syntax/Tree.cpp:271
+      // FIXME: can be fixed by adding an tok::Kind in the Leaf node.
+      // assert(cast<Leaf>(C).getToken()->kind() == L->getDelimiterTokenKind());
     }
----------------
ilya-biryukov wrote:
> Maybe add `TokenManager::getKind(Key)` right away and remove this FIXME.
> This should as simple as `cast<syntax::Token>(T)->Kind`, right? Or am I missing some complications?
Yeah, the main problem is that we don't have the `TokenManager` object in the `syntax::Node`, we somehow need to pass it (e.g. a function parameter), which seems a heavy change. I'm not sure it is worth for this small assert.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128411/new/

https://reviews.llvm.org/D128411