[PATCH] D59885: [Lex] Allow to consume tokens while preprocessing

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed May 8 11:45:34 PDT 2019


rsmith added inline comments.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:870-900
+  TokenSource Source;
   do {
+    Source = TokenSource();
+
     switch (CurLexerKind) {
     case CLK_Lexer:
+      Source.InDirective = CurLexer->ParsingPreprocessorDirective;
----------------
ilya-biryukov wrote:
> rsmith wrote:
> > This is a lot of extra stuff to be doing in the main `Lex` loop. Adding one (perfectly-predicted) branch on `OnToken` seems like it should be fine, but this seems like a bit more added complexity than I'd prefer. I'd like some performance measurements of `-cc1 -Eonly` to see if this makes any real difference.
> Happy to do the measurements. Do you have a good collection of input files for this in mind?
I think any large header should be fine for testing; I don't expect this to scale worse in the presence of lots of macro expansion or anything like that.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:896
     case CLK_LexAfterModuleImport:
-      ReturnedToken = LexAfterModuleImport(Result);
+      Source.InDirective = true;
+
----------------
ilya-biryukov wrote:
> rsmith wrote:
> > This isn't a directive; these are normal tokens.
> Sorry if it's a silly question, just wanted to clarify I'm not missing anything here.
> 
> These tokens are the name of the module,  "import-suffix" and the semicolon that follows it?
> And they end up being used to build the AST for `ImportDecl`?
Right. LexAfterModuleImport is a pass-through lexing layer that takes special action on the ObjC `@import module.name;` and C++20 `import "header";` declarations, which are just normal phase 4 tokens.


================
Comment at: clang/lib/Lex/Preprocessor.cpp:956-957
   --LexLevel;
+  if (OnToken)
+    OnToken(Result, Source);
 }
----------------
ilya-biryukov wrote:
> ilya-biryukov wrote:
> > ilya-biryukov wrote:
> > > rsmith wrote:
> > > > This seems like it's going to be extremely hard to use. If you just want the expanded token stream, then removing all the other calls to `OnToken` and only calling it here when `LexLevel == 0` will give you that.
> > > I've tried `LexLevel == 0 && IsNewToken` and it **almost** works.
> > > The only trouble left is dealing with the delayed parsing. E.g. I'm getting the callbacks for the same tokens multiple times in cases like parsing of method bodies.
> > > 
> > > Any ideas on how to best tackle this?
> > > 
> > > The example is:
> > > ```
> > > struct X {
> > >   int method() { return 10; }
> > >   int a = 10;
> > > };
> > > ```
> > > 
> > > Seeing multiple callbacks for `{ return 10; }` and `= 10;`, want to see only the first one.
> > It almost works now, see the update revision.
> > Almost no tokens get reported by the callback twice now, except the following case.
> > Input:
> > ```
> > struct X {
> >   int method() { return method(); }
> > };
> > ```
> > After the body of the class, we get a an extra callback for `)` token from one of the functions under `ParsedLexedMethodDef`.
> > This seems to be happening only for parenthesis.
> More precisely, this happens from a following sequence of calls:
> 
> ```
> Lex() {LexLevel=0}
> CachingLex() {LexLevel=1} // which does not have a token in `CachedTokens`, so it recurses into ...
> Lex() {LexLevel=1}        // which results in a call to ...
> CurTokenLexer->Lex()
> ```
> 
> At this point `CurTokenLexer` returns a token of a pre-lexed method definition, but the current implementation forgets that this was the case by the time the token is processed at the the top of the callstack (`Lex() {LexLevel=0}`).
> So the token ends up being reported by a callback.
Yeah, the parser messes with the token stream to support C++'s out-of-order parsing rule (among other things). Ignoring that here seems right. You might also want to make sure that calls to `EnterToken` are handled properly. The extra paren likely comes from such a call.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D59885/new/

https://reviews.llvm.org/D59885





More information about the cfe-commits mailing list