[PATCH] D149098: [Clang] Add tests and mark as implemented WG14 N2728 (char16_t & char32_t string literals shall be UTF-16 & UTF-32)

Tom Honermann via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Apr 24 13:43:00 PDT 2023


tahonermann added inline comments.


================
Comment at: clang/test/Lexer/char-literal.cpp:2-5
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++17 -Wfour-char-constants -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c++20 -Wfour-char-constants -fsyntax-only -verify %s
 // RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c11 -x c -Wfour-char-constants -fsyntax-only -verify %s
+// RUN: %clang_cc1 -triple x86_64-apple-darwin -std=c2x -x c -Wfour-char-constants -fsyntax-only -verify %s
----------------
C++17 and C2x are added so that UTF-8 character literals (`u8'x'`), which were introduced in those language versions, are exercised. C++20 is added to exercise the change in type of UTF-8 literals from `char` to `char8_t`.


================
Comment at: clang/test/Lexer/char-literal.cpp:48-50
+#ifndef __cplusplus
+// expected-error@-2 {{universal character name refers to a control character}}
+#endif
----------------
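C apparently prefers that programmers use actual control characters rather than naming them via UCNs, even in character and string literals. I know not why, but that is what N3096 6.4.3 (Universal character names) says: a UCN may not name a code point below U+00A0 other than U+0024 ($), U+0040 (@), or U+0060 (grave accent). C++, by contrast, only rejects UCNs that name control characters when they appear outside of character and string literals, hence the `#ifndef __cplusplus` guard.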
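A small C++ model of the part of the C constraint that this test exercises follows. It is a sketch, not Clang's implementation. It assumes the rule is that a UCN may not name a code point below U+00A0 other than U+0024 ($), U+0040 (@), or U+0060 (grave accent), and it deliberately ignores the other UCN constraints, such as those on surrogate code points. The function name `c_allows_ucn_below_a0` is hypothetical.

```cpp
#include <cassert>

// Hypothetical helper modeling only the "below U+00A0" part of the C UCN
// constraint. Code points at or above U+00A0 are outside this rule, so
// they are reported as allowed here; other constraints are not modeled.
constexpr bool c_allows_ucn_below_a0(char32_t cp) {
  if (cp >= 0xA0) {
    return true;  // not covered by this rule
  }
  // The only exceptions below U+00A0: '$', '@' and '`'.
  return cp == 0x24 || cp == 0x40 || cp == 0x60;
}

// U+0080 (a C1 control character, as used in the test) is rejected in C.
static_assert(!c_allows_ucn_below_a0(0x80), "U+0080 is not allowed in C");
static_assert(c_allows_ucn_below_a0(0x24), "U+0024 ($) is allowed");
```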


================
Comment at: clang/test/Lexer/char-literal.cpp:73-99
+_Static_assert((unsigned char)u8"\U00000080"[0] == (unsigned char)0xC2, "");
+#ifndef __cplusplus
+// expected-error@-2 {{universal character name refers to a control character}}
+#endif
+_Static_assert((unsigned char)u8"\U00000080"[1] == (unsigned char)0x80, "");
+#ifndef __cplusplus
+// expected-error@-2 {{universal character name refers to a control character}}
----------------
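The `unsigned char` casts work around two conversion issues. First, plain `char` is signed on this target, so in C++17 `u8"\U00000080"[0]` has the value -62 and would not compare equal to 0xC2. Second, in C++20 the element type changes to `char8_t`, so C++17 and C++20 would otherwise need different comparisons. Casting both operands to `unsigned char` gives the same result in every mode.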
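The following C++ sketch shows the same technique. It is not taken from the patch. A hypothetical helper, `first_two_units_are`, converts each code unit to `unsigned char` before comparing. As a result, one expression works whether the element type is a signed `char`, an unsigned `char`, or `char8_t`. The sketch uses U+00E9 (encoded as C3 A9) rather than the test's U+0080, so it does not depend on how a compiler treats a UCN that names a control character.

```cpp
#include <cassert>

// Compare the first two code units of a UTF-8 string with the expected
// bytes. Converting each code unit to unsigned char first makes the result
// independent of whether plain char is signed and of whether the u8
// literal's element type is char (C++17) or char8_t (C++20).
template <typename CharT>
constexpr bool first_two_units_are(const CharT* s, unsigned char b0,
                                   unsigned char b1) {
  return static_cast<unsigned char>(s[0]) == b0 &&
         static_cast<unsigned char>(s[1]) == b1;
}

// U+00E9 (LATIN SMALL LETTER E WITH ACUTE) is encoded in UTF-8 as C3 A9.
static_assert(first_two_units_are(u8"\u00E9", 0xC3, 0xA9),
              "U+00E9 should encode as C3 A9 in UTF-8");
```

Without the casts, `u8"\u00E9"[0] == 0xC3` would be false in C++17 on a signed-`char` target (the left side is -61), but true in C++20.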


================
Comment at: clang/www/c_status.html:932
       <td><a href="https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2728.htm">N2728</a></td>
-      <td class="unknown" align="center">Unknown</td>
+      <td class="full" align="center">Yes</td>
     </tr>
----------------
As far as I can tell, no changes are needed for Clang to implement N2728. Clang has used UTF-16 and UTF-32 for `char16_t` and `char32_t` literals since those literal forms were introduced in C11 and C++11, so there is no specific Clang version to mark as a conformance point.
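Here is a sketch, in C++ and not part of the patch, of what N2728 pins down. A `u` literal holds UTF-16 code units, so a code point outside the Basic Multilingual Plane becomes a surrogate pair. A `U` literal holds one UTF-32 code unit per code point. Before N2728, C permitted other encodings, and it defined `__STDC_UTF_16__` and `__STDC_UTF_32__` only when UTF-16 and UTF-32 were used. The helper `encode_utf16` is hypothetical and assumes its input is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: encode one Unicode scalar value as UTF-16, writing
// one or two code units into out[] and returning how many were written.
// Input validation (surrogates, values above U+10FFFF) is omitted.
constexpr std::size_t encode_utf16(char32_t cp, char16_t out[2]) {
  if (cp < 0x10000) {
    out[0] = static_cast<char16_t>(cp);  // BMP: a single code unit
    return 1;
  }
  cp -= 0x10000;  // 20 bits remain, split across a surrogate pair
  out[0] = static_cast<char16_t>(0xD800 + (cp >> 10));    // high surrogate
  out[1] = static_cast<char16_t>(0xDC00 + (cp & 0x3FF));  // low surrogate
  return 2;
}

// U+1F600 is a surrogate pair in a u"" literal...
static_assert(sizeof(u"\U0001F600") / sizeof(char16_t) == 3,
              "two UTF-16 code units plus the terminator");
static_assert(u"\U0001F600"[0] == 0xD83D && u"\U0001F600"[1] == 0xDE00,
              "UTF-16 surrogate pair");
// ...and a single code unit in a U"" literal.
static_assert(sizeof(U"\U0001F600") / sizeof(char32_t) == 2,
              "one UTF-32 code unit plus the terminator");
static_assert(U"\U0001F600"[0] == 0x1F600, "UTF-32 code unit");
```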


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D149098
