[llvm-bugs] [Bug 41536] New: [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8

via llvm-bugs llvm-bugs at lists.llvm.org
Thu Apr 18 19:35:27 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=41536

            Bug ID: 41536
           Summary: [clang-cl] incorrectly encodes ordinary string
                    literals containing universal-character-names in UTF-8
           Product: clang
           Version: 8.0
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: C++
          Assignee: unassignedclangbugs at nondot.org
          Reporter: Casey at Carter.net
                CC: blitzrakete at gmail.com, dgregor at apple.com,
                    erik.pilkington at gmail.com, llvm-bugs at lists.llvm.org,
                    richard-llvm at metafoo.co.uk

Compiling this program:

  extern const char str[] = "\u0020\u00f4\u00e2";

  int main() {
    static_assert(sizeof(str) == 4, "BOOM");
    static_assert(sizeof(str) != 6, "BOOM");
  }

with "cl /FA /c" (presumably any version; notably the 19.20 release on Compiler
Explorer) succeeds, and produces assembly output containing the line:

  ?str@@3QBDB DB        ' ', 0f4H, 0e2H, 00H                    ; str

Note that the three universal-character-names (UCNs) have been replaced with
the corresponding WINDOWS-1252 encodings, which happen to use the same code unit
values as Unicode, as required by [lex.ccon]/9.
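
As a rough illustration of the mapping described above, here is a minimal
sketch of a code point to WINDOWS-1252 lookup. The names kCp1252High and
to_cp1252 are hypothetical and are not part of cl or clang; the table follows
the published WINDOWS-1252 layout, and Windows APIs may treat the five
unassigned bytes differently.

```cpp
// Code points assigned to bytes 0x80..0x9F in the published WINDOWS-1252
// table; 0 marks the five bytes that have no assignment.
constexpr char16_t kCp1252High[32] = {
    0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
    0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178};

// Hypothetical helper: returns the single WINDOWS-1252 byte for cp,
// or -1 if the table above has no byte for that code point.
inline int to_cp1252(char32_t cp) {
  // 0x00..0x7F and 0xA0..0xFF use the same value as the Unicode code point.
  if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF))
    return static_cast<int>(cp);
  // Everything else must be found among the 0x80..0x9F assignments.
  for (int i = 0; i < 32; ++i)
    if (kCp1252High[i] != 0 && kCp1252High[i] == cp)
      return 0x80 + i;
  return -1;
}
```

Under this sketch U+0020, U+00F4 and U+00E2 each map to one byte with the same
value, which matches the four-byte array (three characters plus the
terminating NUL) in the cl output, while U+0080 has no mapping.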

Compiling the same program with "clang-cl /FA /c" does not succeed: both
static_asserts fire. Commenting them out, the produced assembly contains the
line:

  .asciz        " \303\264\303\242"

which, despite using superior AT&T syntax, is substantially different. The UCNs
are now encoded in UTF-8, which will produce mojibake when output to a console
expecting WINDOWS-1252. \u0080 is another good example; cl rejects it since
U+0080 has no representation in WINDOWS-1252 whereas clang-cl encodes U+0080 in
UTF-8 without complaint.
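
To make the size difference concrete, here is a minimal sketch of UTF-8
encoding for the same three code points. The helpers utf8_length and
encode_utf8 are hypothetical names introduced for illustration; this is not
how clang implements string literal encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 code units for one code point.
constexpr std::size_t utf8_length(char32_t cp) {
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Hypothetical helper: UTF-8 encode a sequence of code points.
// Surrogates and values above U+10FFFF are not validated in this sketch.
inline std::string encode_utf8(const std::u32string& text) {
  std::string out;
  for (char32_t cp : text) {
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}

// U+0020 takes one code unit, U+00F4 and U+00E2 take two each, and the
// terminating NUL adds one: 1 + 2 + 2 + 1 == 6, the size clang-cl reports.
static_assert(utf8_length(0x20) + utf8_length(0xF4) + utf8_length(0xE2) + 1 == 6,
              "UTF-8 size of the example literal");
```

Encoding the three code points with encode_utf8 yields the bytes
" \303\264\303\242" seen in the .asciz line, which is why sizeof(str) is 6
rather than 4 under clang-cl.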
