[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

Tue May 7 07:45:39 PDT 2019

sammccall added inline comments.

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:216
+                    llvm::dbgs()
+                        << "$[collect-tokens]: "
+                        << syntax::Token(T).dumpForTests(SourceMgr) << "\n"
----------------
what's "$[collect-tokens]"?
Maybe just "Token: "?

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:224
+
+TokenBuffer TokenCollector::consume() && {
+  TokenBuffer B(SourceMgr);
----------------
this function is now >100 lines long. Can we split it up?

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:225
+TokenBuffer TokenCollector::consume() && {
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {
----------------
this could be a member, which would help splitting up

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:226
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {
+    auto FID =
----------------
Is there a reason we do this at the end instead of as tokens are collected?

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:243
+  llvm::DenseMap<FileID, unsigned> NextSpelled;
+  auto ConsumeSpelledUntil = [&](TokenBuffer::MarkedFile &File,
+                                 SourceLocation L) {
----------------
consumespelleduntil and fillgapuntil could be methods, I think

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:277
+  for (unsigned I = 0; I < Expanded.size() - 1; ++I) {
+    auto L = Expanded[I].location();
+    if (L.isFileID()) {
----------------
the body here could be a method too, I think
i.e. for each (expanded token), process it?

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:322
+
+  // Some files might have unaccounted spelled tokens at the end.
+  for (auto &F : B.Files) {
----------------
and similarly this section

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+      OS << llvm::formatv(
+          "    ['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+          PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,
----------------
sammccall wrote:
> As predicted :-) I think these `_<index>` suffixes are a maintenance hazard.
> 
> In practice, the assertions are likely to add them by copy-pasting the test output.
> 
> They guard against a class of error that doesn't seem very likely, and in practice they don't even really prevent it (because of the copy/paste issue).
> 
> I'd suggest just dropping them, and living with the test assertions being slightly ambiguous.
> 
> Failing that, some slightly trickier formatting could give more context:
> 
> `A B C D E F` --> `A B  ... E F`
> 
> With special cases for smaller numbers of tokens. I don't like the irregularity of that approach though.
(this one is still open)

================
Comment at: clang/unittests/Tooling/Syntax/TokensTest.cpp:622
+
+// FIXME: add tests for mapping spelled tokens into offsets.
+
----------------
sammccall wrote:
> please fix :-)
(still missing?)

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887