[llvm] r358857 - llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t strings

Nico Weber via llvm-commits llvm-commits at lists.llvm.org
Sun Apr 21 10:19:27 PDT 2019


Author: nico
Date: Sun Apr 21 10:19:27 2019
New Revision: 358857

URL: http://llvm.org/viewvc/llvm-project?rev=358857&view=rev
Log:
llvm-undname: Fix hex escapes in wchar_t, char16_t, char32_t strings

llvm-undname used to put '\x' in front of every pair of nibbles, but
u"\xD7\xFF" produces a string with 6 bytes: \xD7 \0 \xFF \0 (and \0\0 for
the terminator). The correct output for a single character (plus
terminating \0) is u"\xD7FF" instead.
Now, wchar_t, char16_t, and char32_t strings round-trip from source through
clang-cl (or cl.exe) to llvm-undname.
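
To make the byte counts above concrete, here is a small compile-time check.
It is only a sketch: elementCount, OldCount and NewCount are names made up
for this illustration and are not part of llvm-undname.

```cpp
#include <cstddef>

// Number of elements in a char16_t literal, including the terminating NUL.
template <std::size_t N>
constexpr std::size_t elementCount(const char16_t (&)[N]) {
  return N;
}

// Old demangler output: two escapes, so two characters plus the terminator.
constexpr std::size_t OldCount = elementCount(u"\xD7\xFF");
// New demangler output: one escape, so one character plus the terminator.
constexpr std::size_t NewCount = elementCount(u"\xD7FF");

static_assert(OldCount == 3, "u\"\\xD7\\xFF\" holds 0x00D7, 0x00FF, NUL");
static_assert(NewCount == 2, "u\"\\xD7FF\" holds 0xD7FF, NUL");
static_assert(u"\xD7FF"[0] == 0xD7FF, "the escape yields one code unit");
```

Where char16_t is two bytes wide, three elements correspond to the 6 bytes
mentioned above and two elements to 4 bytes.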

(...at least as long as it's not a string like L"\xD7FF" L"foo" which
gets demangled as L"\xD7FFfoo", where the compiler then considers the
"f" as part of the hex escape. That seems ok.)
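
The caveat about adjacent literals can be checked the same way. This sketch
uses U"" (char32_t) rather than L"" strings as an assumption of convenience:
with a 16-bit wchar_t the value 0xD7FFF would not fit in one character, while
char32_t can hold it on every platform. elementCount is again a hypothetical
helper.

```cpp
#include <cstddef>

// Number of elements in a char32_t literal, including the terminating NUL.
template <std::size_t N>
constexpr std::size_t elementCount(const char32_t (&)[N]) {
  return N;
}

// Adjacent literals: escapes are resolved per literal, then concatenated.
static_assert(elementCount(U"\xD7FF" U"foo") == 5, "0xD7FF, f, o, o, NUL");
static_assert((U"\xD7FF" U"foo")[0] == 0xD7FF, "escape ends at the quote");

// Single literal: 'f' is a hex digit and is absorbed into the escape.
static_assert(elementCount(U"\xD7FFfoo") == 4, "0xD7FFF, o, o, NUL");
static_assert(U"\xD7FFfoo"[0] == 0xD7FFF, "escape swallowed the 'f'");
```

So the demangled text is readable, but pasting it back into source as a
single literal does not reproduce the original string.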

Also add a comment saying that the "almost-valid" char32_t string I
added in my last commit is actually produced by compilers.
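
For readers who do not want to read the diff, the behavior after this change
can be sketched roughly as follows. This is a standalone approximation, not
the code in MicrosoftDemangle.cpp: hexEscape and checkHexEscape are
hypothetical names, and std::string stands in for the fixed-size buffer that
outputHex() uses.

```cpp
#include <cassert>
#include <string>

// Render one character code as a single hex escape. Digits are produced in
// pairs (whole bytes), and exactly one "\x" prefix is emitted per character.
std::string hexEscape(unsigned C) {
  if (C == 0)
    return "\\x00";
  std::string Digits;
  while (C != 0) {
    for (int I = 0; I < 2; ++I) {
      // Prepend the lowest nibble so the digits end up most significant first.
      Digits.insert(Digits.begin(), "0123456789ABCDEF"[C % 16]);
      C /= 16;
    }
  }
  return "\\x" + Digits;
}

// Usage example mirroring the updated test expectations.
void checkHexEscape() {
  assert(hexEscape(0xD7FF) == "\\xD7FF");
  assert(hexEscape(0x11223344) == "\\x11223344");
  assert(hexEscape(0x41) == "\\x41");
}
```

The old behavior corresponds to emitting the prefix inside the loop, once per
byte, which is what produced "\xD7\xFF".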

Modified:
    llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp
    llvm/trunk/test/Demangle/ms-string-literals.test

Modified: llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp?rev=358857&r1=358856&r2=358857&view=diff
==============================================================================
--- llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp (original)
+++ llvm/trunk/lib/Demangle/MicrosoftDemangle.cpp Sun Apr 21 10:19:27 2019
@@ -1079,10 +1079,10 @@ static void outputHex(OutputStream &OS,
       writeHexDigit(&TempBuffer[Pos--], C % 16);
       C /= 16;
     }
-    TempBuffer[Pos--] = 'x';
-    assert(Pos >= 0);
-    TempBuffer[Pos--] = '\\';
   }
+  TempBuffer[Pos--] = 'x';
+  assert(Pos >= 0);
+  TempBuffer[Pos--] = '\\';
   OS << StringView(&TempBuffer[Pos + 1]);
 }
 

Modified: llvm/trunk/test/Demangle/ms-string-literals.test
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Demangle/ms-string-literals.test?rev=358857&r1=358856&r2=358857&view=diff
==============================================================================
--- llvm/trunk/test/Demangle/ms-string-literals.test (original)
+++ llvm/trunk/test/Demangle/ms-string-literals.test Sun Apr 21 10:19:27 2019
@@ -730,7 +730,10 @@
 ; CHECK: L"012345678901234567890123456789AB"...
 
 ??_C@_13IIHIAFKH@?W?$PP?$AA?$AA@
-; CHECK: L"\xD7\xFF"
+; CHECK: L"\xD7FF"
+
+??_C@_03IIHIAFKH@?$PP?W?$AA?$AA@
+; CHECK: u"\xD7FF"
 
 ??_C@_02PCEFGMJL@hi?$AA@
 ; CHECK: "hi"
@@ -785,9 +788,7 @@
 ; This is technically not a valid u32 string since the character in it is not
 ; <= 0x10FFFF like unicode demands. (Also, the crc doesn't match the contents.)
 ; It's here because this input used to cause a stack overflow in outputHex().
-
-; FIXME: The demangler currently writes for \x codes for a single U string
-; character. That's incorrect since that would mangle two four characters.
+; Both cl.exe and clang-cl produce it for `const char32_t* s = U"\x11223344";`
 
 ??_C@_07LJGFEJEB@D3?$CC?$BB?$AA?$AA?$AA?$AA@)
-; CHECK: U"\x11\x22\x33\x44"
+; CHECK: U"\x11223344"



