<html>

    <head>

      <base href="https://bugs.llvm.org/">

    </head>

    <body><span class="vcard"><a class="email" href="mailto:rnk@google.com" title="Reid Kleckner <rnk@google.com>"> <span class="fn">Reid Kleckner</span></a>

</span> changed

          <a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"

   href="https://bugs.llvm.org/show_bug.cgi?id=41536">bug 41536</a>

          <br>

             <table border="1" cellspacing="0" cellpadding="8">

          <tr>

            <th>What</th>

            <th>Removed</th>

            <th>Added</th>

          </tr>

         <tr>

           <td style="text-align:right;">CC</td>

           <td>

           </td>

           <td>rnk@google.com

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Resolution</td>

           <td>---

           </td>

           <td>WONTFIX

           </td>

         </tr>

         <tr>

           <td style="text-align:right;">Status</td>

           <td>NEW

           </td>

           <td>RESOLVED

           </td>

         </tr></table>

      <p>

        <div>

            <b><a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"

   href="https://bugs.llvm.org/show_bug.cgi?id=41536#c1">Comment # 1</a>

              on <a class="bz_bug_link 

          bz_status_RESOLVED  bz_closed"

   title="RESOLVED WONTFIX - [clang-cl] incorrectly encodes ordinary string literals containing universal-character-names in UTF-8"

   href="https://bugs.llvm.org/show_bug.cgi?id=41536">bug 41536</a>

              from <span class="vcard"><a class="email" href="mailto:rnk@google.com" title="Reid Kleckner <rnk@google.com>"> <span class="fn">Reid Kleckner</span></a>

</span></b>

        <pre>I think clang is working as intended here. I looked at [lex.charset] in the C++

standard, and it specifically says that \u and \U escapes designate characters

in ISO/IEC 10646 (UCS):

"""

The character designated by the universal-character-name \UNNNNNNNN is that

character whose character short name in ISO/IEC 10646 is NNNNNNNN; the

character designated by the universal-character-name \uNNNN is that character

whose character short name in ISO/IEC 10646 is 0000NNNN.

"""

It's arguable that we should strive for bug-for-bug compatibility with MSVC in

this case, but I personally don't think we should.
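
As a rough illustration of the behavior described above (a sketch, not taken
from clang's test suite): assuming a UTF-8 execution character set, which is
what clang uses and what MSVC uses when given /utf-8, the ordinary literal
"\u00e9" holds the UTF-8 encoding of U+00E9. The name kAccentedE is made up
for this example.

```cpp
// Ordinary (narrow) string literal containing a universal-character-name.
// U+00E9 is LATIN SMALL LETTER E WITH ACUTE in ISO/IEC 10646.
constexpr char kAccentedE[] = "\u00e9";

// Assumption: the execution character set is UTF-8. Under that assumption
// U+00E9 is stored as the two bytes 0xC3 0xA9, followed by the NUL terminator.
// A compiler that encodes narrow literals in a legacy code page instead
// would fail these checks at compile time.
static_assert(sizeof(kAccentedE) == 3, "two UTF-8 bytes plus the terminator");
static_assert(kAccentedE[0] == '\xC3', "lead byte of U+00E9 in UTF-8");
static_assert(kAccentedE[1] == '\xA9', "continuation byte of U+00E9 in UTF-8");
```

If the checks compile, the literal was encoded as UTF-8. That is the clang
behavior this bug report is about.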

Regarding the very real concern of emitting Unicode in a Windows command

prompt, my advice is to always stick to the wide APIs, unfortunately. LLVM

itself goes to the trouble of calling WriteConsoleW directly:

<a href="https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/lib/Support/raw_ostream.cpp#L652">https://github.com/llvm/llvm-project/blob/2946cd701067404b99c39fb29dc9c74bd7193eb3/llvm/lib/Support/raw_ostream.cpp#L652</a></pre>

        </div>

      </p>


    </body>

</html>