[llvm-bugs] [Bug 41536] [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8

via llvm-bugs llvm-bugs at lists.llvm.org
Fri Apr 19 11:31:06 PDT 2019


https://bugs.llvm.org/show_bug.cgi?id=41536

Reid Kleckner <rnk at google.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rnk at google.com
         Resolution|---                         |WONTFIX
             Status|NEW                         |RESOLVED

--- Comment #1 from Reid Kleckner <rnk at google.com> ---
I think clang is working as intended here. I looked at [lex.charset] in the C++
standard, and it specifically says that the characters designated by these \u
escapes are characters of ISO/IEC 10646, the Universal Character Set (UCS):

"""
The character designated by the universal-character-name \UNNNNNNNN is that
character whose character short name in ISO/IEC 10646 is NNNNNNNN; the
character designated by the universal-character-name \uNNNN is that character
whose character short name in ISO/IEC 10646 is 0000NNNN.
"""

It's arguable that we should strive for bug-for-bug compatibility with MSVC in
this case, but I personally don't think we should.
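To make the disagreement concrete: a universal-character-name in an ordinary
string literal designates an ISO/IEC 10646 character, and clang stores that
character in the literal as UTF-8. The sketch below assumes a compiler that does
the same (clang, or GCC with its default execution character set); `kEAcute`
and `hex_bytes` are illustrative names only. MSVC instead converts the character
to its execution character set, so with a Windows-1252 code page the literal
would be expected to hold the single byte 0xE9 and the `static_assert` would
fail.

```cpp
#include <cstddef>
#include <string>

// U+00E9 LATIN SMALL LETTER E WITH ACUTE, spelled as a universal-character-name
// in an ordinary (narrow) string literal.
// Assumption: the compiler encodes ordinary literals as UTF-8, as clang does and
// as GCC does by default. Under MSVC with a Windows-1252 execution character
// set the same literal would instead contain the single byte 0xE9.
constexpr char kEAcute[] = "\u00e9";

// Two UTF-8 code units (0xC3 0xA9) plus the terminating NUL.
static_assert(sizeof(kEAcute) == 3, "expected the UTF-8 encoding of U+00E9");

// Renders the bytes of a NUL-terminated narrow string as uppercase hex,
// separated by spaces, e.g. "C3 A9".
std::string hex_bytes(const char* s) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (; *s != '\0'; ++s) {
        unsigned char c = static_cast<unsigned char>(*s);
        if (!out.empty()) out += ' ';
        out += kDigits[c >> 4];
        out += kDigits[c & 0x0F];
    }
    return out;
}
```

Calling `hex_bytes(kEAcute)` under clang should give "C3 A9"; that UTF-8 byte
sequence is what a narrow-character console API then has to interpret.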

Regarding the very real concern of emitting Unicode in a Windows command
prompt, my advice, unfortunately, is to always stick to the wide APIs. LLVM
itself goes to the trouble of calling WriteConsoleW directly:
https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/lib/Support/raw_ostream.cpp#L652
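Using the wide APIs means converting UTF-8 text to UTF-16 before handing it to
a function such as WriteConsoleW. The sketch below shows that conversion step;
`utf8_to_utf16` and `write_console_utf8` are hypothetical helpers, not LLVM's
code, and the decoder assumes well-formed UTF-8 input. On Windows,
`MultiByteToWideChar` with `CP_UTF8` does the same conversion with validation.

```cpp
#include <cstddef>
#include <string>

#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical minimal UTF-8 -> UTF-16 decoder. Assumes well-formed input and
// performs no validation of continuation bytes or overlong forms.
std::u16string utf8_to_utf16(const std::string& in) {
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        char32_t cp;
        int extra;  // number of continuation bytes after the lead byte
        if (b < 0x80)      { cp = b;        extra = 0; }
        else if (b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else               { cp = b & 0x07; extra = 3; }
        for (int k = 1; k <= extra && i + k < in.size(); ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        i += 1 + extra;
        if (cp >= 0x10000) {
            // Code points outside the BMP become a surrogate pair.
            cp -= 0x10000;
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<char16_t>(cp);
        }
    }
    return out;
}

#ifdef _WIN32
// Hypothetical helper: writes UTF-8 text to the console via the wide API.
// wchar_t is 16 bits on Windows, so the UTF-16 buffer matches what
// WriteConsoleW expects; the character count is in UTF-16 code units.
bool write_console_utf8(const std::string& text) {
    std::u16string wide = utf8_to_utf16(text);
    DWORD written = 0;
    return ::WriteConsoleW(::GetStdHandle(STD_OUTPUT_HANDLE), wide.data(),
                           static_cast<DWORD>(wide.size()), &written,
                           nullptr) != 0;
}
#endif
```

Passing the UTF-8 bytes "\xC3\xA9" through `utf8_to_utf16` yields the single
UTF-16 unit U+00E9, which the console can display regardless of the active
code page.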
