<html>
<head>
<base href="https://bugs.llvm.org/">
</head>
<body><span class="vcard"><a class="email" href="mailto:rnk@google.com" title="Reid Kleckner <rnk@google.com>"> <span class="fn">Reid Kleckner</span></a>
</span> changed
<a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"
href="https://bugs.llvm.org/show_bug.cgi?id=41536">bug 41536</a>
<br>
<table border="1" cellspacing="0" cellpadding="8">
<tr>
<th>What</th>
<th>Removed</th>
<th>Added</th>
</tr>
<tr>
<td style="text-align:right;">CC</td>
<td>
</td>
<td>rnk@google.com
</td>
</tr>
<tr>
<td style="text-align:right;">Resolution</td>
<td>---
</td>
<td>WONTFIX
</td>
</tr>
<tr>
<td style="text-align:right;">Status</td>
<td>NEW
</td>
<td>RESOLVED
</td>
</tr></table>
<p>
<div>
<b><a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"
href="https://bugs.llvm.org/show_bug.cgi?id=41536#c1">Comment # 1</a>
on <a class="bz_bug_link
bz_status_RESOLVED bz_closed"
title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"
href="https://bugs.llvm.org/show_bug.cgi?id=41536">bug 41536</a>
from <span class="vcard"><a class="email" href="mailto:rnk@google.com" title="Reid Kleckner <rnk@google.com>"> <span class="fn">Reid Kleckner</span></a>
</span></b>
<pre>I think clang is working as intended here. I looked at [lex.charset] in the C++
standard, and it specifically says that these \u characters are characters in
the UCS ISO standard:
"""
The character designated by the universal-character-name \UNNNNNNNN is that
character whose character short name in ISO/IEC 10646 is NNNNNNNN; the
character designated by the universal-character-name \uNNNN is that character
whose character short name in ISO/IEC 10646 is 0000NNNN.
"""
It's arguable that we should strive for bug-for-bug compatibility with MSVC in
this case, but I personally don't think we should.
Regarding the very real concern of emitting Unicode in a Windows command
prompt, my advice, unfortunately, is to always stick to the wide APIs. LLVM
itself goes to the trouble of calling WriteConsoleW directly:
<a href="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/lib/Support/raw_ostream.cpp#L652">https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/lib/Support/raw_ostream.cpp#L652</a></pre>
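<pre>As a rough illustration of the encoding discussed in this comment, here is a
minimal compile-time sketch. It assumes a UTF-8 execution character set (which
is what clang uses; MSVC would need /utf-8 or /execution-charset:utf-8 to
match). The names kNarrow and kUtf16 are hypothetical and do not come from
clang or LLVM.

```cpp
// Assumption: the execution character set is UTF-8. With a different
// execution character set the first three assertions would not hold.

// U+00E9 written as a universal-character-name in an ordinary string literal.
// An unsigned char array is used so the bytes compare cleanly with hex values.
constexpr unsigned char kNarrow[] = "\u00e9";

static_assert(sizeof(kNarrow) == 3, "two UTF-8 code units plus the terminator");
static_assert(kNarrow[0] == 0xC3, "first UTF-8 code unit of U+00E9");
static_assert(kNarrow[1] == 0xA9, "second UTF-8 code unit of U+00E9");

// The same character in a char16_t literal is a single UTF-16 code unit,
// independent of the narrow execution character set.
constexpr char16_t kUtf16[] = u"\u00e9";

static_assert(sizeof(kUtf16) / sizeof(kUtf16[0]) == 2, "one code unit plus the terminator");
static_assert(kUtf16[0] == 0x00E9, "U+00E9 as one UTF-16 code unit");
```

If the translation unit compiles, the narrow literal holds the UTF-8 bytes
C3 A9. A console that interprets narrow output in a legacy code page would
not show those bytes as the intended character, which is one reason to
prefer the wide APIs for console output on Windows.</pre>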
</div>
</p>
</body>
</html>