[PATCH] D59887: [Syntax] Introduce TokenBuffer, start clangToolingSyntax library

Thu May 9 09:53:32 PDT 2019

ilya-biryukov added inline comments.

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:130
+      OS << llvm::formatv(
+          "    ['{0}'_{1}, '{2}'_{3}) => ['{4}'_{5}, '{6}'_{7})\n",
+          PrintToken(File.SpelledTokens[M.BeginSpelled]), M.BeginSpelled,
----------------
ilya-biryukov wrote:
> sammccall wrote:
> > sammccall wrote:
> > > As predicted :-) I think these `_<index>` suffixes are a maintenance hazard.
> > > 
> > > In practice, the assertions are likely to add them by copy-pasting the test output.
> > > 
> > > They guard against a class of error that doesn't seem very likely, and in practice they don't even really prevent it (because of the copy/paste issue).
> > > 
> > > I'd suggest just dropping them, and living with the test assertions being slightly ambiguous.
> > > 
> > > Failing that, some slightly trickier formatting could give more context:
> > > 
> > > `A B C D E F` --> `A B  ... E F`
> > > 
> > > With special cases for smaller numbers of tokens. I don't like the irregularity of that approach though.
> > (this one is still open)
> Will make sure to do something about it before submitting
As mentioned in the offline conversations, we both agree that we don't want to have ambiguous test-cases.
The proposed alternative was putting the counters of tokens **of the same kind** in the same token stream, with a goal of making updating test-cases simpler (now one would need to update only indicies of the tokens they changed). 

After playing around with the idea, I think this complicates the dumping function too much. The easiest way to update the test is to run it and copy-paste the expected output.
So I'd keep as is and avoiding the extra complexity if that's ok

================
Comment at: clang/lib/Tooling/Syntax/Tokens.cpp:226
+  TokenBuffer B(SourceMgr);
+  for (unsigned I = 0; I < Expanded.size(); ++I) {
+    auto FID =
----------------
sammccall wrote:
> Is there a reason we do this at the end instead of as tokens are collected?
No strong reason, just wanted to keep the preprocessor callback minimal

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59887/new/

https://reviews.llvm.org/D59887