[llvm] r341000 - [MS Demangler] Fix several crashes and demangling bugs.

Zachary Turner via llvm-commits llvm-commits at lists.llvm.org
Wed Aug 29 16:56:09 PDT 2018


Author: zturner
Date: Wed Aug 29 16:56:09 2018
New Revision: 341000

URL: http://llvm.org/viewvc/llvm-project?rev=341000&view=rev
Log:
[MS Demangler] Fix several crashes and demangling bugs.

These bugs were found with a Python script that crawled the entire
Chromium build directory tree, demangling every symbol in every
object file.  At the start, the tool printed:

  Processed 27443 object files.
  2926377/2936108 symbols successfully demangled (99.6686%)
  9731 symbols could not be demangled (0.3314%)
  14589 files crashed while demangling (53.1611%)

After this patch, it prints:

  Processed 27443 object files.
  41295518/41295617 symbols successfully demangled (99.9998%)
  99 symbols could not be demangled (0.0002%)
  0 files crashed while demangling (0.0000%)

The issues fixed in this patch are:

  * Ignore empty parameter packs.  Previously, when we encountered
    the mangling for an empty parameter pack ($S, $$V or $$$V), we
    added a null node to the AST.  Since we don't print these anyway,
    we now skip them entirely and add nothing to the AST.  This fixes
    some of the crashes.

  * Account for "incorrect" string literal manglings.  Apparently
    an older version of clang would not truncate mangled string
    literals to 32 bytes of encoded character data.  The demangling
    code, however, allocated a 32-byte buffer on the assumption that
    it would never encounter more than this, and overran the buffer.
    We now demangle up to 128 bytes of data, since the buggy clang
    would encode up to 32 *characters* of data, and a character can
    be up to 4 bytes wide (char32_t).  A literal is now marked as
    truncated whenever its declared byte size exceeds the number of
    bytes actually decoded.

  * Extended support for demangling init-fini stubs.  If you had
    something like
      struct Foo {
        static vector<string> S;
      };
    the compiler would generate init-fini stubs (such as a dynamic
    atexit destructor) whose mangled names encode the *variable*
    itself, including its storage class and type.  We didn't handle
    this, but now we print a readable name.  This is actually an
    improvement over undname, which fails to demangle these symbols
    at all.

  * Fixed one case of static this adjustment.  We weren't handling
    the 'G' and 'H' function class codes (private thunks with a
    static this adjustment, near and far), so we didn't recognize
    the mangling.  These are now handled.

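  * Fixed a back-referencing problem.  The components of a member
    pointer's class parent (for example, in a member pointer used as
    a template argument) should be considered for back-referencing;
    the class parent is now demangled as a type name rather than a
    symbol name.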
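The empty-parameter-pack fix in the first bullet comes down to skipping the pack markers before any list slot is allocated. The following is a minimal, self-contained sketch of that control flow. `parseTemplateArgs` and `consumeFront` are hypothetical stand-ins, not LLVM's API, and the sketch assumes every real argument is a single primitive type code (`H` = int, `D` = char, `N` = double).

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper: strips Prefix from the front of S if it is there.
bool consumeFront(std::string_view &S, std::string_view Prefix) {
  if (S.substr(0, Prefix.size()) != Prefix)
    return false;
  S.remove_prefix(Prefix.size());
  return true;
}

// Simplified sketch of a template argument list terminated by '@'.
// Assumption: every real argument is a single primitive type code.
// Empty pack markers are skipped before anything is appended, so no
// null entry can ever be produced.
bool parseTemplateArgs(std::string_view &S, std::vector<std::string> &Out) {
  while (!S.empty() && S.front() != '@') {
    if (consumeFront(S, "$S") || consumeFront(S, "$$V") ||
        consumeFront(S, "$$$V"))
      continue; // Empty parameter pack: contributes nothing.
    char C = S.front();
    S.remove_prefix(1);
    switch (C) {
    case 'H':
      Out.push_back("int");
      break;
    case 'D':
      Out.push_back("char");
      break;
    case 'N':
      Out.push_back("double");
      break;
    default:
      return false; // Outside the scope of this sketch.
    }
  }
  return consumeFront(S, "@");
}
```

In the patched demangler, the equivalent `continue` runs before `++Count` and before a `NodeList` is allocated. That ordering is what removes the null node from the AST.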
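The string literal bullet relies on two rules. First, the buffer must hold 32 characters of up to 4 bytes each. Second, truncation is judged against the bytes actually decoded. In the new ms-string-literals test, `CG` declares a byte size of 0x26 = 38 (`C` = 2 and `G` = 6 in the A-P hex-digit encoding). The first literal supplies all 38 bytes: 18 UTF-16 code units of "lookAheadHardBreak" plus a 2-byte terminator. The second supplies only 32 bytes and so demangles with a trailing `...`.

The sketch below is a simplified illustration, not the demangler's code. `DecodedLiteral` and `decodeLiteralBytes` are hypothetical, and the input is assumed to be already-decoded bytes rather than `?$AA`-style escapes. The explicit capacity check in the loop is an extra safeguard of this sketch; the hunk does not show one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// 32 characters of up to 4 bytes each (char32_t) = 128 bytes.
constexpr unsigned MaxStringByteLength = 32 * 4;

// Hypothetical result type for this sketch.
struct DecodedLiteral {
  uint8_t Bytes[MaxStringByteLength] = {};
  unsigned BytesDecoded = 0;
  bool IsTruncated = false;
};

// Simplified: each byte of Encoded stands for one already-decoded byte,
// and DeclaredSize is the byte count stated in the mangled name.
// Decoding stops at '@' or when the buffer is full.
DecodedLiteral decodeLiteralBytes(std::string_view Encoded,
                                  unsigned DeclaredSize) {
  DecodedLiteral R;
  std::size_t I = 0;
  while (I < Encoded.size() && Encoded[I] != '@' &&
         R.BytesDecoded < MaxStringByteLength)
    R.Bytes[R.BytesDecoded++] = static_cast<uint8_t>(Encoded[I++]);
  // As in the patch, truncation is judged against the bytes actually
  // decoded rather than a fixed 32-byte threshold.
  R.IsTruncated = DeclaredSize > R.BytesDecoded;
  return R;
}
```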
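For the init-fini stub bullet, the patched `DynamicStructorIdentifierNode::output` picks one of two shapes. If the stub wraps a variable, the name opens with a backtick. Otherwise it opens with a single quote. Both shapes close with two quotes. The following string-based sketch mirrors only that formatting. `StubName` and `formatStub` are hypothetical names, not part of LLVM.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical, string-based stand-in for DynamicStructorIdentifierNode.
struct StubName {
  bool IsDestructor = false;
  std::optional<std::string> Variable; // Set when the stub is for a variable.
  std::string Name;                    // Used otherwise.
};

std::string formatStub(const StubName &S) {
  std::string Out = S.IsDestructor ? "`dynamic atexit destructor for "
                                   : "`dynamic initializer for ";
  // Variable stubs open with a backtick, name-only stubs with a quote;
  // both close with two quotes.
  if (S.Variable)
    Out += "`" + *S.Variable + "''";
  else
    Out += "'" + S.Name + "''";
  return Out;
}
```

With `IsDestructor` set and `Name` = "Foo", this produces the existing ms-operators expectation, `` `dynamic atexit destructor for 'Foo'' ``. Passing the demangled `XPathLexer::_decisionToDFA` variable as `Variable` yields the backtick form used by the new test.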
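The static this-adjustment bullet adds the `G` and `H` codes. In the new ms-operators test, `??_EBase@@G3AEPAXI@Z` has code `G` followed by `3`, and the expected output shows `adjustor{4}`. That matches the MS convention that a single digit `0`-`9` encodes the values 1-10, while larger values are hex digits `A`-`P` terminated by `@`.

The sketch below models only the codes visible in the hunk. The flag values are made up, and `classifyFunctionCode`, `demangleNumberSketch` and `staticThisAdjustment` are hypothetical helpers. Negative offsets (a leading `?`) are deliberately not handled.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// Illustrative bit flags; names echo the patch, values are made up.
enum FuncFlags : unsigned {
  FC_Private = 1u << 0,
  FC_Protected = 1u << 1,
  FC_Virtual = 1u << 2,
  FC_StaticThisAdjust = 1u << 3,
  FC_Far = 1u << 4,
};

// Only the codes visible in the hunk are modelled.
std::optional<unsigned> classifyFunctionCode(char C) {
  switch (C) {
  case 'F':
    return FC_Private | FC_Virtual;
  case 'G':
    return FC_Private | FC_StaticThisAdjust;
  case 'H':
    return FC_Private | FC_StaticThisAdjust | FC_Far;
  case 'I':
    return FC_Protected;
  default:
    return std::nullopt;
  }
}

// Simplified MS-style unsigned number: '0'-'9' encode 1-10; otherwise
// hex digits 'A'-'P' (0-15) terminated by '@'. No '?' (negative) support.
std::optional<uint64_t> demangleNumberSketch(std::string_view &S) {
  if (S.empty())
    return std::nullopt;
  if (S.front() >= '0' && S.front() <= '9') {
    uint64_t V = S.front() - '0' + 1;
    S.remove_prefix(1);
    return V;
  }
  uint64_t V = 0;
  while (!S.empty() && S.front() >= 'A' && S.front() <= 'P') {
    V = V * 16 + (S.front() - 'A');
    S.remove_prefix(1);
  }
  if (S.empty() || S.front() != '@')
    return std::nullopt;
  S.remove_prefix(1);
  return V;
}

// Returns the static this-adjustment following a G/H code, if any.
std::optional<uint64_t> staticThisAdjustment(std::string_view S) {
  if (S.empty())
    return std::nullopt;
  std::optional<unsigned> FC = classifyFunctionCode(S.front());
  if (!FC || !(*FC & FC_StaticThisAdjust))
    return std::nullopt;
  S.remove_prefix(1);
  return demangleNumberSketch(S);
}
```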
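The back-referencing bullet depends on the MS name back-reference table. Simple names are memoized as they are parsed, and a bare digit `0`-`9` in name position refers to an earlier entry. In the existing ms-back-references test, `PEAV85@` demangles to `class llvm::raw_pwrite_stream *` because entries 8 and 5 were filled in earlier in the same symbol.

The following is a simplified, hypothetical model of that table. It does not reproduce the difference between LLVM's `demangleFullyQualifiedTypeName` and `demangleFullyQualifiedSymbolName`; it only shows why a component that is skipped or mis-memoized shifts every later digit.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical, simplified model of MS name back-references: the first
// ten distinct simple names are memoized, and a digit in name position
// refers back to one of them.
class BackRefTable {
  std::vector<std::string> Names;

public:
  void memorize(const std::string &N) {
    if (Names.size() >= 10)
      return;
    for (const std::string &E : Names)
      if (E == N)
        return;
    Names.push_back(N);
  }
  std::optional<std::string> lookup(char Digit) const {
    std::size_t I = static_cast<std::size_t>(Digit - '0');
    if (I >= Names.size())
      return std::nullopt;
    return Names[I];
  }
};

// Parses e.g. "PassManager@legacy@llvm@@": components are innermost-first,
// each simple name ends in '@', back-references are bare digits, and a
// final '@' ends the qualified name. Returns "llvm::legacy::PassManager".
std::optional<std::string> parseQualifiedName(std::string_view &S,
                                              BackRefTable &T) {
  std::vector<std::string> Parts;
  while (true) {
    if (S.empty())
      return std::nullopt;
    if (S.front() == '@') {
      S.remove_prefix(1);
      break;
    }
    if (S.front() >= '0' && S.front() <= '9') {
      std::optional<std::string> Ref = T.lookup(S.front());
      if (!Ref)
        return std::nullopt;
      Parts.push_back(*Ref);
      S.remove_prefix(1);
      continue;
    }
    std::size_t End = S.find('@');
    if (End == std::string_view::npos)
      return std::nullopt;
    std::string Name(S.substr(0, End));
    T.memorize(Name);
    Parts.push_back(Name);
    S.remove_prefix(End + 1);
  }
  if (Parts.empty())
    return std::nullopt;
  std::string Out;
  for (auto It = Parts.rbegin(); It != Parts.rend(); ++It) {
    if (!Out.empty())
      Out += "::";
    Out += *It;
  }
  return Out;
}
```

For example, after parsing `PassManager@legacy@llvm@@` with a fresh table, `raw_pwrite_stream@2@` resolves to `llvm::raw_pwrite_stream`, and so does the pure back-reference `32@`.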

The remaining 99 symbols that can't be demangled are all
compiler-generated, and undname can't demangle them either.

Modified:
    llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp
    llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.cpp
    llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.h
    llvm/trunk/test/Demangle/ms-back-references.test
    llvm/trunk/test/Demangle/ms-operators.test
    llvm/trunk/test/Demangle/ms-string-literals.test

Modified: llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp (original)
+++ llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp Wed Aug 29 16:56:09 2018
@@ -350,8 +350,8 @@ private:
   VariableSymbolNode *
   demangleRttiBaseClassDescriptorNode(ArenaAllocator &Arena,
                                       StringView &MangledName);
-  FunctionSymbolNode *demangleDynamicStructorFunction(StringView &MangledName,
-                                                      bool IsDestructor);
+  FunctionSymbolNode *demangleInitFiniStub(StringView &MangledName,
+                                           bool IsDestructor);
 
   NamedIdentifierNode *demangleSimpleName(StringView &MangledName,
                                           bool Memorize);
@@ -520,16 +520,35 @@ Demangler::demangleRttiBaseClassDescript
   return VSN;
 }
 
-FunctionSymbolNode *
-Demangler::demangleDynamicStructorFunction(StringView &MangledName,
-                                           bool IsDestructor) {
+FunctionSymbolNode *Demangler::demangleInitFiniStub(StringView &MangledName,
+                                                    bool IsDestructor) {
   DynamicStructorIdentifierNode *DSIN =
       Arena.alloc<DynamicStructorIdentifierNode>();
   DSIN->IsDestructor = IsDestructor;
-  DSIN->Name = demangleFullyQualifiedTypeName(MangledName);
-  QualifiedNameNode *QNN = synthesizeQualifiedName(Arena, DSIN);
-  FunctionSymbolNode *FSN = demangleFunctionEncoding(MangledName);
-  FSN->Name = QNN;
+
+  // What follows is a main symbol name. This may include namespaces or class
+  // back references.
+  QualifiedNameNode *QN = demangleFullyQualifiedSymbolName(MangledName);
+
+  SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
+  FunctionSymbolNode *FSN = nullptr;
+  Symbol->Name = QN;
+
+  if (Symbol->kind() == NodeKind::VariableSymbol) {
+    DSIN->Variable = static_cast<VariableSymbolNode *>(Symbol);
+    if (!MangledName.consumeFront('@')) {
+      Error = true;
+      return nullptr;
+    }
+
+    FSN = demangleFunctionEncoding(MangledName);
+    FSN->Name = synthesizeQualifiedName(Arena, DSIN);
+  } else {
+    FSN = static_cast<FunctionSymbolNode *>(Symbol);
+    DSIN->Name = Symbol->Name;
+    FSN->Name = synthesizeQualifiedName(Arena, DSIN);
+  }
+
   return FSN;
 }
 
@@ -569,9 +588,9 @@ SymbolNode *Demangler::demangleSpecialIn
   case SpecialIntrinsicKind::RttiBaseClassDescriptor:
     return demangleRttiBaseClassDescriptorNode(Arena, MangledName);
   case SpecialIntrinsicKind::DynamicInitializer:
-    return demangleDynamicStructorFunction(MangledName, false);
+    return demangleInitFiniStub(MangledName, false);
   case SpecialIntrinsicKind::DynamicAtexitDestructor:
-    return demangleDynamicStructorFunction(MangledName, true);
+    return demangleInitFiniStub(MangledName, true);
   default:
     break;
   }
@@ -837,6 +856,8 @@ SymbolNode *Demangler::parse(StringView
   // What follows is a main symbol name. This may include namespaces or class
   // back references.
   QualifiedNameNode *QN = demangleFullyQualifiedSymbolName(MangledName);
+  if (Error)
+    return nullptr;
 
   SymbolNode *Symbol = demangleEncodedSymbol(MangledName, QN);
   if (Symbol) {
@@ -1325,10 +1346,9 @@ Demangler::demangleStringLiteral(StringV
         goto StringLiteralError;
     }
   } else {
-    if (StringByteSize > 32)
-      Result->IsTruncated = true;
-
-    constexpr unsigned MaxStringByteLength = 32;
+    // The max byte length is actually 32, but some compilers mangled strings
+    // incorrectly, so we have to assume it can go higher.
+    constexpr unsigned MaxStringByteLength = 32 * 4;
     uint8_t StringBytes[MaxStringByteLength];
 
     unsigned BytesDecoded = 0;
@@ -1337,6 +1357,9 @@ Demangler::demangleStringLiteral(StringV
       StringBytes[BytesDecoded++] = demangleCharLiteral(MangledName);
     }
 
+    if (StringByteSize > BytesDecoded)
+      Result->IsTruncated = true;
+
     unsigned CharBytes =
         guessCharByteSize(StringBytes, BytesDecoded, StringByteSize);
     assert(StringByteSize % CharBytes == 0);
@@ -1587,6 +1610,10 @@ FuncClass Demangler::demangleFunctionCla
     return FuncClass(FC_Private | FC_Virtual);
   case 'F':
     return FuncClass(FC_Private | FC_Virtual);
+  case 'G':
+    return FuncClass(FC_Private | FC_StaticThisAdjust);
+  case 'H':
+    return FuncClass(FC_Private | FC_StaticThisAdjust | FC_Far);
   case 'I':
     return FuncClass(FC_Protected);
   case 'J':
@@ -1760,7 +1787,6 @@ TypeNode *Demangler::demangleType(String
     Ty = demangleCustomType(MangledName);
   } else {
     Ty = demanglePrimitiveType(MangledName);
-    assert(Ty && !Error);
     if (!Ty || Error)
       return Ty;
   }
@@ -1976,14 +2002,14 @@ PointerTypeNode *Demangler::demangleMemb
   Pointer->Quals = Qualifiers(Pointer->Quals | ExtQuals);
 
   if (MangledName.consumeFront("8")) {
-    Pointer->ClassParent = demangleFullyQualifiedSymbolName(MangledName);
+    Pointer->ClassParent = demangleFullyQualifiedTypeName(MangledName);
     Pointer->Pointee = demangleFunctionType(MangledName, true);
   } else {
     Qualifiers PointeeQuals = Q_None;
     bool IsMember = false;
     std::tie(PointeeQuals, IsMember) = demangleQualifiers(MangledName);
     assert(IsMember);
-    Pointer->ClassParent = demangleFullyQualifiedSymbolName(MangledName);
+    Pointer->ClassParent = demangleFullyQualifiedTypeName(MangledName);
 
     Pointer->Pointee = demangleType(MangledName, QualifierMangleMode::Drop);
     Pointer->Pointee->Quals = PointeeQuals;
@@ -2121,18 +2147,21 @@ Demangler::demangleTemplateParameterList
   size_t Count = 0;
 
   while (!Error && !MangledName.startsWith('@')) {
+    if (MangledName.consumeFront("$S") || MangledName.consumeFront("$$V") ||
+        MangledName.consumeFront("$$$V")) {
+      // Empty parameter pack.
+      continue;
+    }
+
     ++Count;
+
     // Template parameter lists don't participate in back-referencing.
     *Current = Arena.alloc<NodeList>();
 
     NodeList &TP = **Current;
 
     TemplateParameterReferenceNode *TPRN = nullptr;
-    if (MangledName.consumeFront("$S") || MangledName.consumeFront("$$V") ||
-        MangledName.consumeFront("$$$V")) {
-      // Empty parameter pack.
-      TP.N = nullptr;
-    } else if (MangledName.consumeFront("$$Y")) {
+    if (MangledName.consumeFront("$$Y")) {
       // Template alias
       TP.N = demangleFullyQualifiedTypeName(MangledName);
     } else if (MangledName.consumeFront("$$B")) {

Modified: llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.cpp?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.cpp (original)
+++ llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.cpp Wed Aug 29 16:56:09 2018
@@ -223,9 +223,15 @@ void DynamicStructorIdentifierNode::outp
   else
     OS << "`dynamic initializer for ";
 
-  OS << "'";
-  Name->output(OS, Flags);
-  OS << "''";
+  if (Variable) {
+    OS << "`";
+    Variable->output(OS, Flags);
+    OS << "''";
+  } else {
+    OS << "'";
+    Name->output(OS, Flags);
+    OS << "''";
+  }
 }
 
 void NamedIdentifierNode::output(OutputStream &OS, OutputFlags Flags) const {

Modified: llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.h
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.h?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.h (original)
+++ llvm/trunk/lib/Demangle/MicrosoftDemangleNodes.h Wed Aug 29 16:56:09 2018
@@ -322,7 +322,7 @@ enum class NodeKind {
   LocalStaticGuardVariable,
   FunctionSymbol,
   VariableSymbol,
-  SpecialTableSymbol,
+  SpecialTableSymbol
 };
 
 struct Node {
@@ -443,6 +443,7 @@ struct DynamicStructorIdentifierNode : p
 
   void output(OutputStream &OS, OutputFlags Flags) const override;
 
+  VariableSymbolNode *Variable = nullptr;
   QualifiedNameNode *Name = nullptr;
   bool IsDestructor = false;
 };

Modified: llvm/trunk/test/Demangle/ms-back-references.test
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Demangle/ms-back-references.test?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/test/Demangle/ms-back-references.test (original)
+++ llvm/trunk/test/Demangle/ms-back-references.test Wed Aug 29 16:56:09 2018
@@ -169,3 +169,6 @@
 
 ?AddEmitPasses@EmitAssemblyHelper@?A0x43583946@@AEAA_NAEAVPassManager@legacy@llvm@@W4BackendAction@clang@@AEAVraw_pwrite_stream@5@PEAV85@@Z
 ; CHECK: bool __cdecl `anonymous namespace'::EmitAssemblyHelper::AddEmitPasses(class llvm::legacy::PassManager &, enum clang::BackendAction, class llvm::raw_pwrite_stream &, class llvm::raw_pwrite_stream *)
+
+??$forward@P8?$DecoderStream@$01@media@@AEXXZ@std@@YA$$QAP8?$DecoderStream@$01@media@@AEXXZAAP812@AEXXZ@Z
+; CHECK: void (__thiscall media::DecoderStream<2>::*&& __cdecl std::forward<void (__thiscall media::DecoderStream<2>::*)(void)>(void (__thiscall media::DecoderStream<2>::*&)(void)))(void)
\ No newline at end of file

Modified: llvm/trunk/test/Demangle/ms-operators.test
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Demangle/ms-operators.test?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/test/Demangle/ms-operators.test (original)
+++ llvm/trunk/test/Demangle/ms-operators.test Wed Aug 29 16:56:09 2018
@@ -161,6 +161,9 @@
 ??_EBase@@UEAAPEAXI@Z
 ; CHECK: virtual void * __cdecl Base::`vector deleting dtor'(unsigned int)
 
+??_EBase@@G3AEPAXI@Z
+; CHECK: [thunk]: void * __thiscall Base::`vector deleting dtor'`adjustor{4}'(unsigned int)
+
 ??_F?$SomeTemplate@H@@QAEXXZ
 ; CHECK: void __thiscall SomeTemplate<int>::`default ctor closure'(void)
 
@@ -224,6 +227,9 @@
 ??__FFoo@@YAXXZ
 ; CHECK: void __cdecl `dynamic atexit destructor for 'Foo''(void)
 
+??__F_decisionToDFA@XPathLexer@@0V?$vector@VDFA@dfa@antlr4@@V?$allocator@VDFA@dfa@antlr4@@@std@@@std@@A@YAXXZ
+; CHECK: void __cdecl `dynamic atexit destructor for `static class std::vector<class antlr4::dfa::DFA, class std::allocator<class antlr4::dfa::DFA>> XPathLexer::_decisionToDFA''(void)
+
 ??__K_deg@@YAHO@Z
 ; CHECK: int __cdecl operator ""_deg(long double)
 

Modified: llvm/trunk/test/Demangle/ms-string-literals.test
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Demangle/ms-string-literals.test?rev=341000&r1=340999&r2=341000&view=diff
==============================================================================
--- llvm/trunk/test/Demangle/ms-string-literals.test (original)
+++ llvm/trunk/test/Demangle/ms-string-literals.test Wed Aug 29 16:56:09 2018
@@ -761,4 +761,13 @@
 ; CHECK: const char16_t * {u"012345678901234"}
 
 ??_C@_0CA@KFPHPCC@0?$AA?$AA?$AA1?$AA?$AA?$AA2?$AA?$AA?$AA3?$AA?$AA?$AA4?$AA?$AA?$AA5?$AA?$AA?$AA6?$AA?$AA?$AA?$AA?$AA?$AA?$AA@
-; CHECK: const char32_t * {U"0123456"}
\ No newline at end of file
+; CHECK: const char32_t * {U"0123456"}
+
+; There are too many bytes encoded in this string literal (it should encode a max of 32 bytes)
+; but some buggy compilers will incorrectly generate this, so we need to be able to demangle
+; both the correct and incorrect versions.
+??_C@_0CG@HJGBPLNO@l?$AAo?$AAo?$AAk?$AAA?$AAh?$AAe?$AAa?$AAd?$AAH?$AAa?$AAr?$AAd?$AAB?$AAr?$AAe?$AAa?$AAk?$AA?$AA?$AA@
+; CHECK: const char16_t * {u"lookAheadHardBreak"}
+
+??_C@_0CG@HJGBPLNO@l?$AAo?$AAo?$AAk?$AAA?$AAh?$AAe?$AAa?$AAd?$AAH?$AAa?$AAr?$AAd?$AAB?$AAr?$AAe?$AA@
+; CHECK: const char16_t * {u"lookAheadHardBre"...}
\ No newline at end of file



