[PATCH] D128411: [syntax] Introduce a BaseToken class.

Fri Jul 8 02:24:00 PDT 2022

hokein added inline comments.

================
Comment at: clang/include/clang/Tooling/Syntax/TokenManager.h:37
+
+  // FIXME: add an interface for getting token kind.
+};
----------------
sammccall wrote:
> I wouldn't want to prejudge this: this is a very basic attribute similar to kind/role, and we may want to store it in Leaf and avoid the virtual stuff.
> 
> There's certainly enough space, e.g. of the current 16-bit `kind` use the top bit to denote leaf-or-not and the bottom 15 bits to store kind-or-tokenkind.
This sounds good. Adjust the FIXME.

================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:463
+/// It tracks the underlying token buffers, source manager, etc.
+class SyntaxTokenManager : public TokenManager {
+public:
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > conceptually this is just "TokenBuffer implements TokenManager"
> > 
> > The main reason I can see not to actually write that is to avoid the dependency from TokenBuffer (tokens library) to TokenManager (syntax library). But here you've added that dependency anyway.
> > 
> > So I think we'd be better either with `TokenBuffer : TokenManager` or moving this class to its own header.
> I would argue they should be separate concepts.
> - `TokenBuffer` is about storing tokens and mapping between expanded and spelled token streams.
> - `SyntaxTokenManager` is about implementing relevant certain operations on `syntax::Token` and hiding the actual token type.
> Conceptually those things are different and decoupling them allows for more flexibility and allows reasoning about them independently. In particular, one could imagine having a `SyntaxTokenManager` without a `TokenBuffer` if we do not need to deal with two streams of tokens.
> 
> I would suggest keeping `SyntaxTokenManager` as a separate class.
> I would suggest keeping SyntaxTokenManager as a separate class.

+1. Moved to a separate file.

================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:465
+public:
+  SyntaxTokenManager(SourceManager &SourceMgr, const LangOptions &LangOpts,
+                     const TokenBuffer &Tokens)
----------------
sammccall wrote:
> no need to take SourceManager, TokenBuffer already includes it.
> LangOpts is unused here.
> no need to take SourceManager, TokenBuffer already includes it.

The SourceMgr can be mutated by the class (it stores the underlying tokens for `ExtraTokens`), while the SourceManager in TokenBuffer is immutable.

> LangOpts is unused here.
It is used in SyntaxTokenManager::lexBuffer.

================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:475
+    assert(Token);
+    // Handle 'eof' separately, calling text() on it produces an empty string.
+    if (Token->kind() == tok::eof)
----------------
sammccall wrote:
> Empty string seems like the correct return value here to me.
> If you want a special case for dump, I think that belongs in dump().
> 
> If this is because we currently provide no way to get the token kind, then this should be a FIXME
Yeah, this special case is for the Leaf node dump. Added a FIXME. 

(I even double whether we need this special case at all, do we really want to build a Leaf node for eof token?)

================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:504
+  /// IDs and storage for additional tokenized files.
+  llvm::DenseMap<FileID, std::vector<Token>> ExtraTokens;
+};
----------------
sammccall wrote:
> aren't we trying to store these on the syntax arena?
> We never do lookups into this map, maybe lexbuffer just allocates storage on the arena('s allocator) instead of using this map?
> aren't we trying to store these on the syntax arena?

We could do that, but I'd try to avoid that. Now the allocator of syntax::Arena is a storage for syntax-tree nodes only. Allocation for token-related stuff is on the `SyntaxTokenManager`.

> We never do lookups into this map, maybe lexbuffer just allocates storage on the arena('s allocator) instead of using this map?

This map is moved from the syntax::Arena. 

You're right, there is no usage of the key at the moment, and the only use-case is to create a syntax leaf node that not backed by the source code (for refactoring usecase), it is unclear whether we will use it in the future. I will keep it as-is (this is not the scope of this patch).

================
Comment at: clang/unittests/Tooling/Syntax/TreeTestBase.cpp:116
     syntax::TranslationUnit *&Root;
+    std::unique_ptr<syntax::SyntaxTokenManager> &TM;
     std::unique_ptr<syntax::TokenBuffer> &TB;
----------------
ilya-biryukov wrote:
> NIT: it´s not breaking anything now, but I suggest putting SyntaxTokenManager after TokenBuffer.
> The reason is that it´s the right destruction order, TokenManager has references to TokenBuffer, so it could potentially access it in destructor some time in the future (e.g. imagine asserting something on tokens).
> 
> Not that it actually breaks today, but seems like a potential surprising bug in the future if we happen to refactor code in a certain way. 
good point!

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128411/new/

https://reviews.llvm.org/D128411