[cfe-dev] Clang 3.2 assertion failure reading AST files: TokenID != tok::identifier && "Already at tok::identifier

Tom Honermann thonermann at coverity.com
Tue Feb 26 14:15:45 PST 2013


The following code causes Clang (3.2 on Linux) to fail an assertion
when deserializing an AST from a serialized AST (PCH) file.  Note that
the struct's name (__is_void) matches a Clang keyword.

struct __is_void {
   int val;
} a = { 42 };

$ clang --version
clang version 3.2 (tags/RELEASE_32/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix

$ clang -c __is_void.cpp
<no error, object file is generated successfully>

$ clang -emit-ast __is_void.cpp
<no error, AST file is generated successfully>

$ clang -c __is_void.ast
clang: include/clang/Basic/IdentifierTable.h:168: void clang::IdentifierInfo::RevertTokenIDToIdentifier(): Assertion `TokenID != tok::identifier && "Already at tok::identifier"' failed.
clang: error: unable to execute command: Segmentation fault (core dumped)
clang: error: clang frontend command failed due to signal (use -v to see invocation)
clang version 3.2 (tags/RELEASE_32/final)
Target: x86_64-unknown-linux-gnu
Thread model: posix
clang: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
clang: note: diagnostic msg: Error generating preprocessed source(s) - no preprocessable inputs.

This assertion failure (with a different test case) was previously 
reported here:
   http://llvm.org/bugs/show_bug.cgi?id=13020
   Bug 13020 - Clang 3.1 assertion failures reading and writing AST files

The assertion failure occurs here:

include/clang/Basic/IdentifierTable.h:
167   void RevertTokenIDToIdentifier() {
168     assert(TokenID != tok::identifier && "Already at tok::identifier");
169     TokenID = tok::identifier;
170     RevertedTokenID = true;
171   }

The function is called from the AST deserialization code here:

lib/Serialization/ASTReader.cpp:
  461 IdentifierInfo *ASTIdentifierLookupTrait::ReadData(const internal_key_type& k,
  462                                                    const unsigned char* d,
  463                                                    unsigned DataLen) {
  ...
  487   unsigned Bits = ReadUnalignedLE16(d);
  ...
  490   bool HasRevertedTokenIDToIdentifier = Bits & 0x01;
  ...
  502   // Build the IdentifierInfo itself and link the identifier ID with
  503   // the new IdentifierInfo.
  504   IdentifierInfo *II = KnownII;
  505   if (!II) {
  506     II = &Reader.getIdentifierTable().getOwn(StringRef(k.first, k.second));
  507     KnownII = II;
  508   }
  509   Reader.markIdentifierUpToDate(II);
  510   II->setIsFromAST();
  511
  512   // Set or check the various bits in the IdentifierInfo structure.
  513   // Token IDs are read-only.
  514   if (HasRevertedTokenIDToIdentifier)
  515     II->RevertTokenIDToIdentifier();
  ...
  550 }

At line 515, the code is attempting to restore the RevertedTokenID field 
for the IdentifierInfo instance by calling RevertTokenIDToIdentifier(), 
but the code then asserts because the token kind (TokenID) already 
equals tok::identifier.

The corresponding serialization code is here:

lib/Serialization/ASTWriter.cpp:
class ASTIdentifierTableTrait {
....
  void EmitData(raw_ostream& Out, IdentifierInfo* II,
                IdentID ID, unsigned) {
....
    uint32_t Bits = (uint32_t)II->getObjCOrBuiltinID();
....
    Bits = (Bits << 1) | unsigned(II->hasRevertedTokenIDToIdentifier());
....
    clang::io::Emit16(Out, Bits);
....
  }
};
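
The shift-and-or on the writer side and the `Bits & 0x01` test on the reader side are two halves of the same idiom. Below is a minimal standalone sketch of that round trip. The helper names (packBits, unpackReverted, unpackID) are hypothetical, and the two-field layout is an assumption made for illustration: the elided lines in both excerpts may pack further flags, so this is not Clang's actual on-disk layout.

```cpp
#include <cstdint>

// Hypothetical, simplified layout: only a builtin ID and the "reverted" flag.
// The real writer and reader may pack additional flags in the elided lines.
constexpr std::uint16_t packBits(std::uint32_t objCOrBuiltinID, bool reverted) {
  // Shift the existing bits left by one and store the flag in the low bit.
  return static_cast<std::uint16_t>((objCOrBuiltinID << 1) | unsigned(reverted));
}

// Reader side: the flag is recovered by masking the low bit.
constexpr bool unpackReverted(std::uint16_t bits) { return (bits & 0x01) != 0; }

// The remaining bits are recovered by shifting the flag back out.
constexpr std::uint32_t unpackID(std::uint16_t bits) { return bits >> 1; }
```

The point of the sketch is only that the flag survives the round trip unchanged: if hasRevertedTokenIDToIdentifier() was true when the AST file was written, the reader will see the bit set and call RevertTokenIDToIdentifier(), whatever state the in-memory IdentifierInfo is in at that moment.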

Lines 1131 and 1132 below revert the token ID and set the token kind to
tok::identifier when a keyword is used as a struct name.  I suspect this
is what sets the stage for the later assertion failure when deserializing
the AST, but I haven't debugged further.

lib/Parse/ParseDeclCXX.cpp:
1049 void Parser::ParseClassSpecifier(tok::TokenKind TagTokKind,
1050                                  SourceLocation StartLoc, DeclSpec &DS,
1051                                  const ParsedTemplateInfo &TemplateInfo,
1052                                  AccessSpecifier AS,
1053                                  bool EnteringContext, DeclSpecContext DSC) {
....
1107   if (TagType == DeclSpec::TST_struct &&
1108       !Tok.is(tok::identifier) &&
1109       Tok.getIdentifierInfo() &&
1110       (Tok.is(tok::kw___is_arithmetic) ||
....
1125        Tok.is(tok::kw___is_void))) {
1126     // GNU libstdc++ 4.2 and libc++ use certain intrinsic names as the
1127     // name of struct templates, but some are keywords in GCC >= 4.3
1128     // and Clang. Therefore, when we see the token sequence "struct
1129     // X", make X into a normal identifier rather than a keyword, to
1130     // allow libstdc++ 4.2 and libc++ to work properly.
1131     Tok.getIdentifierInfo()->RevertTokenIDToIdentifier();
1132     Tok.setKind(tok::identifier);
1133   }
....
1501 }

The problem might also be that the IdentifierInfo constructor 
initializes TokenID to tok::identifier by default:

lib/Basic/IdentifierTable.cpp:
IdentifierInfo::IdentifierInfo() {
  TokenID = tok::identifier;
  ..
}
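
To make the suspected sequence concrete, here is a minimal standalone model of the two call paths. TokKind and MockIdentifierInfo are hypothetical stand-ins, not Clang's classes, and the reader path rests on an unverified assumption: that the entry the reader works with still carries the constructor's default TokenID when the "reverted" bit is applied. The mock returns false where the real function would assert.

```cpp
// Hypothetical stand-ins for clang::tok::TokenKind and clang::IdentifierInfo.
enum class TokKind { identifier, keyword };

struct MockIdentifierInfo {
  TokKind TokenID = TokKind::identifier;  // same default as the constructor above
  bool RevertedTokenID = false;

  // Returns false where the real RevertTokenIDToIdentifier() would assert.
  constexpr bool revertTokenIDToIdentifier() {
    if (TokenID == TokKind::identifier)
      return false;  // precondition violated: already an identifier
    TokenID = TokKind::identifier;
    RevertedTokenID = true;
    return true;
  }
};

// Parser path: the entry is still a keyword when "struct __is_void" is seen,
// so the precondition holds and the revert succeeds.
constexpr bool parserPathSucceeds() {
  MockIdentifierInfo II{};
  II.TokenID = TokKind::keyword;
  return II.revertTokenIDToIdentifier();
}

// Reader path (assumed): a default-initialized entry plus a set "reverted" bit.
constexpr bool readerPathSucceeds() {
  MockIdentifierInfo II{};  // TokenID is still TokKind::identifier
  return II.revertTokenIDToIdentifier();
}
```

Under that assumption the parser path succeeds and the reader path trips the precondition, which matches the observed behavior: compiling the source works, while loading the AST file asserts.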

It isn't clear to me what the preferred fix for this would be.  Options 
include:

1) Remove the assert.

2) Change the default initialization of TokenID in the IdentifierInfo 
constructor from tok::identifier to tok::unknown and force all instances 
to be explicitly initialized.

3) Modify ASTIdentifierLookupTrait::ReadData() above to force the 
TokenID value to something other than tok::identifier before calling 
RevertTokenIDToIdentifier().

4) Others?
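
Options 1 and 3 can be compared on a small standalone model. This is only a sketch under stated assumptions: Kind, IdentSketch, and the three functions are hypothetical names, Kind::unknown stands in for "something other than tok::identifier", and none of this is proposed as the actual patch.

```cpp
// Hypothetical stand-ins; not Clang's types.
enum class Kind { unknown, identifier, keyword };

struct IdentSketch {
  Kind TokenID = Kind::identifier;
  bool RevertedTokenID = false;
};

// Stand-in for the existing function: returns false where the assert would fire.
constexpr bool revertChecked(IdentSketch &II) {
  if (II.TokenID == Kind::identifier)
    return false;
  II.TokenID = Kind::identifier;
  II.RevertedTokenID = true;
  return true;
}

// Option 1 analogue: the precondition check is simply dropped.
constexpr bool revertUnchecked(IdentSketch &II) {
  II.TokenID = Kind::identifier;
  II.RevertedTokenID = true;
  return true;
}

// Option 3 analogue: the reader first forces a non-identifier kind so that
// the existing precondition holds, then calls the checked function.
constexpr bool readerRestore(IdentSketch &II) {
  if (II.TokenID == Kind::identifier)
    II.TokenID = Kind::unknown;  // hypothetical placeholder value
  return revertChecked(II);
}

constexpr bool option1Works() {
  IdentSketch II{};
  return revertUnchecked(II) && II.RevertedTokenID;
}

constexpr bool option3Works() {
  IdentSketch II{};
  return readerRestore(II) && II.RevertedTokenID &&
         II.TokenID == Kind::identifier;
}
```

In this model both variants leave the entry in the same final state (TokenID is identifier, RevertedTokenID is true). The difference is where the special case lives: option 1 weakens the check for every caller, while option 3 keeps the assert for the parser and confines the workaround to the reader.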

Tom.



