[PATCH] D128411: [syntax] Introduce a BaseToken class.
Haojian Wu via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Fri Jul 8 02:24:00 PDT 2022
hokein added inline comments.
================
Comment at: clang/include/clang/Tooling/Syntax/TokenManager.h:37
+
+ // FIXME: add an interface for getting token kind.
+};
----------------
sammccall wrote:
> I wouldn't want to prejudge this: this is a very basic attribute similar to kind/role, and we may want to store it in Leaf and avoid the virtual stuff.
>
> There's certainly enough space, e.g. of the current 16-bit `kind` use the top bit to denote leaf-or-not and the bottom 15 bits to store kind-or-tokenkind.
This sounds good. Adjust the FIXME.
================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:463
+/// It tracks the underlying token buffers, source manager, etc.
+class SyntaxTokenManager : public TokenManager {
+public:
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > conceptually this is just "TokenBuffer implements TokenManager"
> >
> > The main reason I can see not to actually write that is to avoid the dependency from TokenBuffer (tokens library) to TokenManager (syntax library). But here you've added that dependency anyway.
> >
> > So I think we'd be better either with `TokenBuffer : TokenManager` or moving this class to its own header.
> I would argue they should be separate concepts.
> - `TokenBuffer` is about storing tokens and mapping between expanded and spelled token streams.
> - `SyntaxTokenManager` is about implementing relevant certain operations on `syntax::Token` and hiding the actual token type.
> Conceptually those things are different and decoupling them allows for more flexibility and allows reasoning about them independently. In particular, one could imagine having a `SyntaxTokenManager` without a `TokenBuffer` if we do not need to deal with two streams of tokens.
>
> I would suggest keeping `SyntaxTokenManager` as a separate class.
> I would suggest keeping SyntaxTokenManager as a separate class.
+1. Moved to a separate file.
================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:465
+public:
+ SyntaxTokenManager(SourceManager &SourceMgr, const LangOptions &LangOpts,
+ const TokenBuffer &Tokens)
----------------
sammccall wrote:
> no need to take SourceManager, TokenBuffer already includes it.
> LangOpts is unused here.
> no need to take SourceManager, TokenBuffer already includes it.
The SourceMgr can be mutated by the class (it stores the underlying tokens for `ExtraTokens`), while the SourceManager in TokenBuffer is immutable.
> LangOpts is unused here.
It is used in SyntaxTokenManager::lexBuffer.
================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:475
+ assert(Token);
+ // Handle 'eof' separately, calling text() on it produces an empty string.
+ if (Token->kind() == tok::eof)
----------------
sammccall wrote:
> Empty string seems like the correct return value here to me.
> If you want a special case for dump, I think that belongs in dump().
>
> If this is because we currently provide no way to get the token kind, then this should be a FIXME
Yeah, this special case is for the Leaf node dump. Added a FIXME.
(I even double whether we need this special case at all, do we really want to build a Leaf node for eof token?)
================
Comment at: clang/include/clang/Tooling/Syntax/Tokens.h:504
+ /// IDs and storage for additional tokenized files.
+ llvm::DenseMap<FileID, std::vector<Token>> ExtraTokens;
+};
----------------
sammccall wrote:
> aren't we trying to store these on the syntax arena?
> We never do lookups into this map, maybe lexbuffer just allocates storage on the arena('s allocator) instead of using this map?
> aren't we trying to store these on the syntax arena?
We could do that, but I'd try to avoid that. Now the allocator of syntax::Arena is a storage for syntax-tree nodes only. Allocation for token-related stuff is on the `SyntaxTokenManager`.
> We never do lookups into this map, maybe lexbuffer just allocates storage on the arena('s allocator) instead of using this map?
This map is moved from the syntax::Arena.
You're right, there is no usage of the key at the moment, and the only use-case is to create a syntax leaf node that not backed by the source code (for refactoring usecase), it is unclear whether we will use it in the future. I will keep it as-is (this is not the scope of this patch).
================
Comment at: clang/unittests/Tooling/Syntax/TreeTestBase.cpp:116
syntax::TranslationUnit *&Root;
+ std::unique_ptr<syntax::SyntaxTokenManager> &TM;
std::unique_ptr<syntax::TokenBuffer> &TB;
----------------
ilya-biryukov wrote:
> NIT: it´s not breaking anything now, but I suggest putting SyntaxTokenManager after TokenBuffer.
> The reason is that it´s the right destruction order, TokenManager has references to TokenBuffer, so it could potentially access it in destructor some time in the future (e.g. imagine asserting something on tokens).
>
> Not that it actually breaks today, but seems like a potential surprising bug in the future if we happen to refactor code in a certain way.
good point!
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D128411/new/
https://reviews.llvm.org/D128411
More information about the cfe-commits
mailing list