[cfe-dev] Feedback requested on MS compatibility work (bug 11789)

Aaron Wishnick aaron.s.wishnick at gmail.com
Sat Mar 3 13:11:06 PST 2012


I'm working on bug 11789, which is an MS compatibility bug. I'm new to the clang project, so I could really use some guidance. Here's the background on what I'm doing:

MSVC does some wacky things to make __FUNCTION__ act as if it were a macro like __FILE__, rather than a predefined expression. In particular, pasting L##__FUNCTION__ produces a wide string literal equivalent to __FUNCTION__, rather than the identifier L__FUNCTION__. Here's an example of what needs to happen in the preprocessor:

#define __STR2WSTR(str) L##str
#define _STR2WSTR(str) __STR2WSTR(str)
// Expands to __LPREFIX( __FUNCTION__)
_STR2WSTR(__FUNCTION__)
// Expands to  L__FUNCTION__
__STR2WSTR(__FUNCTION__)

#define __FOO_PREFIX(str) FOO##str
#define _FOO_PREFIX(str) __FOO_PREFIX(str)
// Expands to FOO__FSTREXP __FUNCTION__
_FOO_PREFIX(__FUNCTION__)
// Expands to FOO__FUNCTION__
__FOO_PREFIX(__FUNCTION__)

#define __FOO_SUFFIX(str) str##FOO
#define _FOO_SUFFIX(str) __FOO_SUFFIX(str)
// Expands to  __FUNCTION__ FOO (note the leading space before __FUNCTION__)
_FOO_SUFFIX(__FUNCTION__)
// Expands to __FUNCTION__FOO
__FOO_SUFFIX(__FUNCTION__)

So basically, as I see it, there are three rules:
1. L##__FUNCTION__ expands to __LPREFIX( __FUNCTION__)
2. any_token##__FUNCTION__ expands to any_token__FSTREXP __FUNCTION__ . Note __FUNCTION__ gets a space on both sides
3. __FUNCTION__##any_token expands to  __FUNCTION__ any_token. Again note the space on both sides.
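
To make the three rules concrete, here is a minimal string-level sketch of them. pasteMSVC is a hypothetical helper invented for illustration; it is not part of clang and works on plain strings rather than tokens. The exact whitespace is my reading of the observed MSVC output above and may not be exact.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not clang API): models the text produced by
// LHS ## RHS under the three MSVC rules listed above.
std::string pasteMSVC(const std::string &LHS, const std::string &RHS) {
  const std::string Func = "__FUNCTION__";
  if (RHS == Func) {
    // Rule 1: L##__FUNCTION__ is the most specific case.
    if (LHS == "L")
      return "__LPREFIX( __FUNCTION__)";
    // Rule 2: any other token on the left-hand side.
    return LHS + "__FSTREXP __FUNCTION__";
  }
  // Rule 3: __FUNCTION__ on the left-hand side (assumed leading space,
  // matching the _FOO_SUFFIX example).
  if (LHS == Func)
    return " __FUNCTION__ " + RHS;
  // Anything else is an ordinary paste.
  return LHS + RHS;
}
```

Checking the right-hand side first means __FUNCTION__##__FUNCTION__ falls under rule 2, which matches the ordering used in the patch below.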

I've begun writing the implementation in the preprocessor, without yet worrying about implementing the __LPREFIX and __FSTREXP weirdness in the frontend. However, there's one complication that my implementation isn't getting right: __FUNCTION__ is supposed to follow the rules for macro expansion. Note how "__STR2WSTR(__FUNCTION__)" just expands to "L__FUNCTION__". It's as if the first time __FUNCTION__ is expanded, it gets expanded to an identical token, but with some special "expanded" flag set, and then once the "expanded" __FUNCTION__ is encountered, the special rules apply. In my implementation so far, both "__STR2WSTR(__FUNCTION__)" and "_STR2WSTR(__FUNCTION__)" expand to "__LPREFIX( __FUNCTION__)".
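
Here is a rough sketch of the two-stage behaviour I have in mind, assuming the "expanded" flag model is right. Everything in it (Tok, preExpand, pasteL, direct, indirect) is a hypothetical name made up for illustration; clang's real Token and TokenLexer classes look nothing like this. It relies on the standard rule that a macro argument that is an operand of ## is substituted without being expanded first.

```cpp
#include <cassert>
#include <string>

// Hypothetical token model, not clang's Token class.
struct Tok {
  std::string Text;
  bool Expanded; // true once __FUNCTION__ has gone through "expansion"
};

// Stage 1: models argument pre-expansion. __FUNCTION__ "expands" to an
// identical token, but with the Expanded flag set.
Tok preExpand(Tok T) {
  if (T.Text == "__FUNCTION__")
    T.Expanded = true;
  return T;
}

// Stage 2: models L ## T. The special rule only fires for a flagged token.
std::string pasteL(const Tok &T) {
  if (T.Text == "__FUNCTION__" && T.Expanded)
    return "__LPREFIX( __FUNCTION__)";
  return "L" + T.Text;
}

// __STR2WSTR(x): x is an operand of ##, so it is pasted unexpanded.
std::string direct(const Tok &Arg) { return pasteL(Arg); }

// _STR2WSTR(x): x is expanded first, then handed to __STR2WSTR.
std::string indirect(const Tok &Arg) { return direct(preExpand(Arg)); }
```

With this model, direct(Tok{"__FUNCTION__", false}) gives "L__FUNCTION__" and indirect(Tok{"__FUNCTION__", false}) gives "__LPREFIX( __FUNCTION__)", which is the distinction my patch currently misses.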

I'm going to attach my patch to show what I've done so far, because maybe I'm totally headed in the wrong direction. I'd appreciate it if anybody more familiar with the preprocessor could point me in the right direction on how to handle the two-stage expansion of __FUNCTION__, and could review my code so far and let me know if it looks reasonable. Thanks very much! My patch is below:

Index: include/clang/Basic/TokenKinds.def
===================================================================
--- include/clang/Basic/TokenKinds.def	(revision 151696)
+++ include/clang/Basic/TokenKinds.def	(working copy)
@@ -417,6 +417,7 @@
 KEYWORD(__thiscall                  , KEYALL)
 KEYWORD(__forceinline               , KEYALL)
 KEYWORD(__unaligned                 , KEYMS)
+KEYWORD(__LPREFIX                   , KEYMS)
 
 // OpenCL-specific keywords
 KEYWORD(__kernel                    , KEYOPENCL)
Index: include/clang/Lex/TokenLexer.h
===================================================================
--- include/clang/Lex/TokenLexer.h	(revision 151696)
+++ include/clang/Lex/TokenLexer.h	(working copy)
@@ -91,6 +91,12 @@
   /// should not be subject to further macro expansion.
   bool DisableMacroExpansion : 1;
 
+  /// MSLPrefix - This is true when we are outputting __LPREFIX(__FUNCTION__)
+  /// for Microsoft compatibility mode
+  bool MSLPrefix : 1;
+  unsigned MSLPrefixState;
+  SourceRange MSLPrefixRange;
+
   TokenLexer(const TokenLexer&);  // DO NOT IMPLEMENT
   void operator=(const TokenLexer&); // DO NOT IMPLEMENT
 public:
@@ -168,6 +174,12 @@
   /// first token on the next line.
   void HandleMicrosoftCommentPaste(Token &Tok);
 
+  /// HandleMicrosoftLPrefix - In Microsoft compatibility mode, L##__FUNCTION__
+  /// pastes to __LPREFIX( __FUNCTION__). This means it turns into multiple
+  /// tokens. When MSLPrefix is true, we output this stream of tokens. If
+  /// this returns true, the caller should immediately return the token.
+  bool HandleMicrosoftLPrefix(Token &Tok);
+
   /// \brief If \arg loc is a FileID and points inside the current macro
   /// definition, returns the appropriate source location pointing at the
   /// macro expansion source location entry.
Index: lib/Lex/TokenLexer.cpp
===================================================================
--- lib/Lex/TokenLexer.cpp	(revision 151696)
+++ lib/Lex/TokenLexer.cpp	(working copy)
@@ -32,6 +32,9 @@
   ActualArgs = Actuals;
   CurToken = 0;
 
+  MSLPrefix = false;
+  MSLPrefixState = 0;
+
   ExpandLocStart = Tok.getLocation();
   ExpandLocEnd = ELEnd;
   AtStartOfLine = Tok.isAtStartOfLine();
@@ -357,6 +360,10 @@
 /// Lex - Lex and return a token from this macro stream.
 ///
 void TokenLexer::Lex(Token &Tok) {
+  // Handle the Microsoft __LPREFIX extension, if the MSLPrefix flag is set.
+  if (MSLPrefix && HandleMicrosoftLPrefix(Tok))
+    return;
+
   // Lexing off the end of the macro, pop this macro off the expansion stack.
   if (isAtEnd()) {
     // If this is a macro (not a token stream), mark the macro enabled now
@@ -486,6 +493,57 @@
     // Trim excess space.
     Buffer.resize(LHSLen+RHSLen);
 
+    // If Microsoft extensions are enabled, special-case token pasting
+    // __FUNCTION__. L##__FUNCTION__ is the most specific; it turns
+    // into __LPREFIX( __FUNCTION__ ). Anything else, say,
+    // QWERTY##__FUNCTION__ turns into QWERTY__FSTREXP __FUNCTION__
+    // In fact, even __FUNCTION__##__FUNCTION__ observes this rule.
+    // When __FUNCTION__ is the first token, it gets a space after it.
+    bool ForceRawIdentifier = false;
+    if (PP.getLangOptions().MicrosoftExt) {
+      const bool isLHSFunction = LHSLen == 12
+                               && memcmp("__FUNCTION__", &Buffer[0], 12) == 0;
+      const bool isRHSFunction = RHSLen == 12
+                               && memcmp("__FUNCTION__", &Buffer[LHSLen], 12) == 0;
+
+      if (isRHSFunction && LHSLen == 1 && Buffer[0] == 'L') {
+        // __LPREFIX( __FUNCTION__)
+        // For this expansion, we don't return a single token; it actually
+        // turns into four tokens. Paste the __LPREFIX token here, and set
+        // MSLPrefix to true. In Lex(), when this flag is set, we'll
+        // continue outputting this series of tokens rather than
+        // continuing to lex.
+        MSLPrefix = true;
+        MSLPrefixState = 0;
+        MSLPrefixRange.setBegin( Tok.getLocation() );
+        MSLPrefixRange.setEnd( RHS.getLocation() );
+
+        Tok.startToken();
+        Tok.setKind(tok::kw___LPREFIX);
+        Tok.setLength(9);
+        PP.CreateString( "__LPREFIX", 9, Tok,
+                         MSLPrefixRange.getBegin(),
+                         MSLPrefixRange.getEnd() );
+
+        ++CurToken;
+        return false;
+      } else if (isRHSFunction) {
+        // A##__FUNCTION__ -> A__FSTREXP __FUNCTION__
+        Buffer.resize(LHSLen+22);
+        memcpy(&Buffer[LHSLen], "__FSTREXP __FUNCTION__", 22);
+        ForceRawIdentifier = true;
+      } else if (isLHSFunction) {
+        // __FUNCTION__##A -> __FUNCTION__ A
+        Buffer.resize(LHSLen+RHSLen+1);
+        memmove(&Buffer[LHSLen+1], &Buffer[LHSLen], RHSLen);
+        Buffer[LHSLen]= ' ';
+        ForceRawIdentifier = true;
+      }
+    }
+
+    // We'll need this size later
+    unsigned ResultLen = Buffer.size();
+
     // Plop the pasted result (including the trailing newline and null) into a
     // scratch buffer where we can lex it.
     Token ResultTokTmp;
@@ -501,7 +559,8 @@
     // Lex the resultant pasted token into Result.
     Token Result;
 
-    if (Tok.isAnyIdentifier() && RHS.isAnyIdentifier()) {
+    if ((Tok.isAnyIdentifier() && RHS.isAnyIdentifier())
+        || ForceRawIdentifier) {
       // Common paste case: identifier+identifier = identifier.  Avoid creating
       // a lexer and other overhead.
       PP.IncrementPasteCounter(true);
@@ -509,7 +568,7 @@
       Result.setKind(tok::raw_identifier);
       Result.setRawIdentifierData(ResultTokStrPtr);
       Result.setLocation(ResultTokLoc);
-      Result.setLength(LHSLen+RHSLen);
+      Result.setLength(ResultLen);
     } else {
       PP.IncrementPasteCounter(false);
 
@@ -528,7 +587,7 @@
       // Make a lexer object so that we lex and expand the paste result.
       Lexer TL(SourceMgr.getLocForStartOfFile(LocFileID),
                PP.getLangOptions(), ScratchBufStart,
-               ResultTokStrPtr, ResultTokStrPtr+LHSLen+RHSLen);
+               ResultTokStrPtr, ResultTokStrPtr+ResultLen);
 
       // Lex a token in raw mode.  This way it won't look up identifiers
       // automatically, lexing off the end will return an eof token, and
@@ -646,6 +705,53 @@
   PP.HandleMicrosoftCommentPaste(Tok);
 }
 
+/// HandleMicrosoftLPrefix - In Microsoft compatibility mode, L##__FUNCTION__
+/// pastes to __LPREFIX( __FUNCTION__). This means it turns into multiple
+/// tokens. When MSLPrefix is true, we output this stream of tokens. If
+/// this returns true, the caller should immediately return the token.
+bool TokenLexer::HandleMicrosoftLPrefix(Token &Tok) {
+  assert(MSLPrefix && "Expected MSLPrefix to be set");
+
+  SourceManager &SM = PP.getSourceManager();
+  SourceLocation StartLoc = MSLPrefixRange.getBegin(),
+                 EndLoc = MSLPrefixRange.getEnd();
+  switch (MSLPrefixState++) {
+  case 0:
+    Tok.startToken();
+    Tok.setKind(tok::l_paren);
+    Tok.setLength(1);
+    PP.CreateString( "(", 1, Tok, StartLoc, EndLoc );      
+    return true;
+  case 1:
+    // We put a space before __FUNCTION__ to get __LPREFIX( __FUNCTION__ )
+    // as Microsoft's compiler does.
+    Tok.startToken();
+    Tok.setKind(tok::kw___FUNCTION__);
+    Tok.setLength(13);
+    PP.CreateString( " __FUNCTION__", 13, Tok, StartLoc, EndLoc );      
+    return true;
+  case 2:
+    Tok.startToken();
+    Tok.setKind(tok::r_paren);
+    Tok.setLength(1);
+    PP.CreateString( ")", 1, Tok, StartLoc, EndLoc );      
+    return true;
+  case 3:
+    // Once we've pasted __LPREFIX( __FUNCTION__), look for a token
+    // paste (##) operator. If we find one, skip it. This for example:
+    // #define __ENDTEST(str1, str2) L##str1##str2
+    // #define _ENDTEST(str1, str2) __ENDTEST(str1, str2)
+    // #define ENDTEST _ENDTEST(__FUNCTION__, JUNCTION)
+    // ENDTEST should expand to __LPREFIX( __FUNCTION__)JUNCTION
+    if (!isAtEnd() && Tokens[CurToken].is(tok::hashhash)) ++CurToken;
+    MSLPrefix = false;
+    return false;
+  default:
+    assert(false);
+    return false;
+  }
+}
+
 /// \brief If \arg loc is a file ID and points inside the current macro
 /// definition, returns the appropriate source location pointing at the
 /// macro expansion source location entry, otherwise it returns an invalid
Index: lib/Lex/PPMacroExpansion.cpp
===================================================================
--- lib/Lex/PPMacroExpansion.cpp	(revision 151696)
+++ lib/Lex/PPMacroExpansion.cpp	(working copy)
@@ -101,7 +101,7 @@
   Ident__has_warning      = RegisterBuiltinMacro(*this, "__has_warning");
 
   // Microsoft Extensions.
-  if (Features.MicrosoftExt) 
+  if (Features.MicrosoftExt)
     Ident__pragma = RegisterBuiltinMacro(*this, "__pragma");
   else
     Ident__pragma = 0;
Index: lib/CodeGen/CGCXX.cpp
===================================================================
--- lib/CodeGen/CGCXX.cpp	(revision 151696)
+++ lib/CodeGen/CGCXX.cpp	(working copy)
@@ -278,7 +278,6 @@
                                       CXXDtorType dtorType,
                                       const CGFunctionInfo *fnInfo) {
   GlobalDecl GD(dtor, dtorType);
-
   StringRef name = getMangledName(GD);
   if (llvm::GlobalValue *existing = GetGlobalValue(name))
     return existing;
Index: lib/Parse/ParseDecl.cpp
===================================================================
--- lib/Parse/ParseDecl.cpp	(revision 151696)
+++ lib/Parse/ParseDecl.cpp	(working copy)
@@ -2043,7 +2043,7 @@
 
     // Microsoft single token adornments.
     case tok::kw___forceinline:
-      // FIXME: Add handling here!
+      DS.SetFunctionSpecInline(Loc, PrevSpec, DiagID);
       break;
 
     case tok::kw___ptr64:




