[PATCH] D156277: [Parser][ObjC] Stop parsing on eof

John McCall via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Jul 27 11:03:28 PDT 2023


rjmccall added inline comments.


================
Comment at: clang/lib/Parse/ParseObjc.cpp:749
+      if (!Tok.is(tok::eof))
+        ConsumeToken();
       break;
----------------
danix800 wrote:
> rjmccall wrote:
> > aaron.ballman wrote:
> > > rjmccall wrote:
> > > > aaron.ballman wrote:
> > > > > aaron.ballman wrote:
> > > > > > danix800 wrote:
> > > > > > > tbaeder wrote:
> > > > > > > > Why is there a `ConsumeToken()` call at all here? The token is already being consumed in line 729.
> > > > > > > Didn't notice this, thanks for reminding!
> > > > > > I have the same question as @tbaeder -- what token is this intending to consume? CC @rjmccall for Obj-C expertise
> > > > > OH! This is consuming the identifier for the implementation/interface name itself. e.g.,
> > > > > ```
> > > > > @interface Frobble
> > > > > ```
> > > > > The consume on line 709 gets the `@`, the consume on line 729 gets the `interface`, and the consume on line 749 is getting the `Frobble`. That makes sense to me now.
> > > > > 
> > > > I don't think any language expertise is required here — just seems like a straightforward bug on an error path that's probably not exercised all that often.  Maybe somebody moved the `ConsumeToken` and forgot to fix this case or something.
> > > What concerns me about this fix is that we don't typically check whether the token is EOF or not before consuming; that's usually an anti-pattern, isn't it? Wouldn't it make sense for this to use `SkipUntil(tok::identifier)` instead?
> > Okay, so now I can bring a little language expertise to bear. :)
> > 
> > We're in the middle of parsing an ObjC block (e.g. `@interface`), and we see `@interface` or `@implementation`, which starts a new block.  You can never nest these ObjC blocks, so the parser is reasonably assuming that the second `@keyword` is an attempt to start a new block and the user just forgot to terminate the last block with `@end`.  Unfortunately, the actual recovery done by the parser doesn't seem to match the diagnostic and the fixit — it's trying to swallow `@interface Foo` (or whatever) and then continue the loop as if it were part of the current block, which is definitely not the right thing to do.
> > 
> > The right way to recover here is to act like we actually saw `@end` and break out of the loop, leaving `Tok` on the `@` so that the parser will pick up parsing `@interface` normally after we return.  To do that, we just need to get the ObjC keyword by peeking at the next token instead of consuming.
> > 
> > Also, we should take this recovery path on every `@` keyword that's only allowed at the top level (so `@class`, `@compatibility_alias`, `@interface`, `@implementation`, and `@protocol`).
> It's really great to learn things here! I don't know two much about ObjC. I seached google trying to find some standard or specs for ObjC but only docs like tutorials teaching how to use it can be found, so I might not be able to give a good enough fix for this issue. I'll give it a try though.
You should be able to follow the guidance here without needing to know much more about ObjC, just understanding how the parser works.  The key is that you need to delay consuming tokens until you're certain you're going to commit to parsing this `@`-directive as part of the current declaration.

Start with the line `SourceLocation AtLoc = ConsumeToken();`  Instead of consuming the `@` and then looking at `Tok` to see what the keyword is, you can get the location of `Tok` without consuming it, then use `NextToken()` to peek ahead to the next token to see the keyword.

Be sure to sink the right number of `ConsumeToken` calls down onto all of the paths that *aren't* bailing out.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D156277/new/

https://reviews.llvm.org/D156277



More information about the cfe-commits mailing list