[cfe-dev] "Fixes" for two crashes, rant on Tok.getIdentifierInfo() and two more bugs

Chris Lattner clattner at apple.com
Thu Dec 27 12:11:18 PST 2007

On Dec 26, 2007, at 10:44 AM, Nico Weber wrote:
> the crash I reported and fixed earlier ( http://lists.cs.uiuc.edu/pipermail/cfe-dev/2007-December/000745.html 
>  ) happened because `Tok.getIdentifierInfo()` sometimes returns 0.

Right. Tokens that are not "pp-identifiers" in the lexer do not have  
an identifier pointer.  This includes tokens like numbers (1), strings  
("foo"), etc.
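
To make the null case concrete, here is a minimal sketch. The Token,
TokenKind and IdentifierInfo types below are simplified, hypothetical
stand-ins, not the real clang classes, and identifierNameOrEmpty is an
invented helper. It only illustrates that a caller must check the
pointer before dereferencing it:

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified stand-ins for clang's IdentifierInfo and Token.
struct IdentifierInfo {
  std::string Name;
};

enum class TokenKind { Identifier, NumericConstant, StringLiteral };

class Token {
  TokenKind Kind;
  IdentifierInfo *II; // null unless the token is an identifier
public:
  explicit Token(TokenKind K, IdentifierInfo *Info = nullptr)
      : Kind(K), II(Info) {}
  TokenKind getKind() const { return Kind; }
  IdentifierInfo *getIdentifierInfo() const { return II; }
};

// Returns the identifier's name, or an empty string for tokens that carry
// no identifier pointer (numbers, strings, and so on).
std::string identifierNameOrEmpty(const Token &Tok) {
  if (const IdentifierInfo *II = Tok.getIdentifierInfo())
    return II->Name;
  return "";
}
```

In this sketch a Token built with TokenKind::NumericConstant returns a
null pointer from getIdentifierInfo(), so the helper yields "".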

> These conditions are not clearly documented in Token.h, and even if  
> they were documented, functions that may or may not return 0 are  
> generally error-prone. So I grepped clang for calls to  
> `getIdentifierInfo()`. I found two places where this function was  
> not handled correctly. Tests to reproduce the crashes and makeshift  
> patches are attached (Someone familiar with the code needs to look  
> at the FIXMEs in the patch. Problems were related to ObjC's @try/ 
> @catch and ObjC2 @interface prefixes).

Nice!  Your patch looks exactly right, I applied it here (after  
tweaking the expected-error stuff):

> (Why is it a good idea to treat stuff like @try as two tokens  
> instead of one?)

The answer is that things like @ /*comment*/ try are legal, sadly  
enough.  However, it seems that we could probably do something in the  
lexer (when it sees the "@") to handle this.  I'll see what I can do  
about this when I have time.
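
As a rough illustration of what "doing something in the lexer" could
mean, the toy function below skips whitespace and /* ... */ comments
after an '@' and returns the identifier that follows. The name
lexObjCAtKeyword is hypothetical and this is not how clang's lexer is
written. It also ignores // comments and other details:

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not clang's lexer: given text positioned at '@',
// skip whitespace and /* ... */ comments, then return the identifier that
// follows (for example "try"). Returns "" if no identifier follows.
std::string lexObjCAtKeyword(const std::string &Src, std::size_t Pos = 0) {
  if (Pos >= Src.size() || Src[Pos] != '@')
    return "";
  ++Pos;
  for (;;) {
    while (Pos < Src.size() &&
           std::isspace(static_cast<unsigned char>(Src[Pos])))
      ++Pos;
    if (Src.compare(Pos, 2, "/*") != 0)
      break; // no further comment to skip
    std::size_t End = Src.find("*/", Pos + 2);
    if (End == std::string::npos)
      return ""; // unterminated comment
    Pos = End + 2;
  }
  // An identifier must not start with a digit.
  if (Pos < Src.size() && std::isdigit(static_cast<unsigned char>(Src[Pos])))
    return "";
  std::size_t Start = Pos;
  while (Pos < Src.size() &&
         (std::isalnum(static_cast<unsigned char>(Src[Pos])) ||
          Src[Pos] == '_'))
    ++Pos;
  return Src.substr(Start, Pos - Start);
}
```

With this sketch, both "@try" and "@ /*comment*/ try" yield "try", so a
lexer built this way could hand the parser a single combined token.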

> Furthermore, I'd suggest to at least use an assert if you know that  
> `getIdentifierInfo()` can't return 0 and rely on it. Doing an  
> `assert(Tok.getIdentifierInfo() && "foo always has ident info")`  
> serves as good documentation.

Well, in theory the code should only call and dereference  
getIdentifierInfo if it already knows the pointer is non-null.  If  
that isn't clear from the context of the call in the code, adding an  
assert makes sense.

> In the following places it was not immediately clear to me why the  
> code is valid and `getIdentifierInfo` can't possibly return 0 (line  
> numbers relative to rev 45360):
> Lex/MacroExpander.cpp:
> line 324

This is safe because previous code verified that the macro arguments  
are identifiers.

#define A(1)

should be rejected earlier.  Adding an assert would make sense.
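
The kind of earlier check meant here can be sketched as follows. The
Token struct and readMacroParameters are hypothetical simplifications,
not the actual Preprocessor code. The point is only that a parameter
list is rejected as soon as a non-identifier token shows up, so later
code may rely on every parameter having identifier info:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical token model for a macro parameter list.
enum class TokenKind { Identifier, NumericConstant };

struct Token {
  TokenKind Kind;
  std::string Spelling;
};

// Collects the parameter names into Names. Returns false at the first
// token that is not an identifier, as in "#define A(1)".
bool readMacroParameters(const std::vector<Token> &Params,
                         std::vector<std::string> &Names) {
  Names.clear();
  for (const Token &Tok : Params) {
    if (Tok.Kind != TokenKind::Identifier)
      return false; // a real preprocessor would emit a diagnostic here
    Names.push_back(Tok.Spelling);
  }
  return true;
}
```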

> Lex/Preprocessor.cpp:
> 2222
> 2253
> 2329

The calls to ReadMacroName verify that the name is an identifier.

> Parse/ParseDecl.cpp:
> 101

I'm not sure about this.  That call is only reachable if  
"Tok.is(tok::identifier) || isDeclarationSpecifier()". It is unclear  
to me that all declspecs have identifiers.  Steve?

> 1467

   assert(Tok.is(tok::kw_typeof) && "Not a typeof specifier");
   const IdentifierInfo *BuiltinII = Tok.getIdentifierInfo();

The assertion verifies that the token is a keyword, which has an  
identifier ptr.  This code is trying to preserve __typeof__ vs typeof  
in a diagnostic.

> Parse/ParseExpr.cpp:
> 216

ParseExpressionWithLeadingIdentifier is only called with an identifier  
as IdTok.

> 247

Likewise for ParseAssignmentExprWithLeadingIdentifier.

> (785)

This is only called with these 4 keywords as the current token:
   case tok::kw___builtin_va_arg:
   case tok::kw___builtin_offsetof:
   case tok::kw___builtin_choose_expr:
   case tok::kw___builtin_types_compatible_p:

> Parse/Parser.cpp:
> 377 (one of the bugs, fixed with the patch)

> Parse/ParseObjc.cpp:
> 304
> 325 (but only because of strange indentation because of tabs instead  
> of spaces -- fixed in the attached patch as well)
> (476)
> 1130 (one of the bugs, fixed with the patch)
> 1164 (one of the bugs, fixed with the patch)
> 1235
> Even better than adding asserts in these lines is to catch this  
> problem with the compiler (for example, by putting  
> `getIdentifierInfo()` in a subclass and never letting it return 0.  
> Then you _have_ to check for the right token type to call the  
> method), but that's a bit of work :-P

This would also require the Token class to be polymorphic, which is a  
non-starter.  Another potential solution would be to make  
getIdentifierInfo() always assert that the pointer is non-null.  This  
would require callers to call Tok.hasIdentifierInfo() if they don't  
know it is valid or to add a getIdentifierInfoOrNull() method.
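
A minimal sketch of that alternative, using simplified stand-in types
rather than the real clang Token: hasIdentifierInfo() and
getIdentifierInfoOrNull() are the names suggested above, and everything
else is assumed for illustration. No virtual functions are needed:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for clang's IdentifierInfo.
struct IdentifierInfo {
  std::string Name;
};

// Simplified, non-polymorphic token.
class Token {
  IdentifierInfo *II = nullptr;
public:
  Token() = default;
  explicit Token(IdentifierInfo *Info) : II(Info) {}

  // Cheap query for callers that do not know the token kind.
  bool hasIdentifierInfo() const { return II != nullptr; }

  // Asserting accessor: only valid when the token has identifier info.
  IdentifierInfo *getIdentifierInfo() const {
    assert(II && "token has no identifier info");
    return II;
  }

  // Explicit escape hatch for callers that want to handle null themselves.
  IdentifierInfo *getIdentifierInfoOrNull() const { return II; }
};
```

A caller that is unsure would then write
"if (Tok.hasIdentifierInfo()) use(Tok.getIdentifierInfo());" (use being
a placeholder), while a misuse trips the assert in debug builds instead
of crashing later on a null dereference.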

> An unrelated crash that I found on the way is:
>    int main()
>    {
>      id a;
>      [a bla:0 6:7];
>    }
> (crashes somewhere in sema, something like this should be put in  
> test/Parse/objc-messaging-1.m)
> And here's an inconsistency with gcc:
>    int @interface bla ;  // ?? this is valid objc?
>    @end
> I have no idea what this code is supposed to do, but it doesn't warn  
> with clang but doesn't even compile with gcc.

I'll let Steve and Fariborz chime in on these.

> ps: I also converted a few tabs to spaces

Thanks!  It would make it easier to review the patch if you kept the  
mechanical pieces separate from the changes that require review, but I  
appreciate the patch.

As an aside, things will probably pick up in early January, since many  
people are out for the holidays.
