[PATCH] D106577: [clang] Define __STDC_ISO_10646__

James Y Knight via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 23 11:48:42 PDT 2021


jyknight added a comment.

Even after the more recent discussion, I still think my initial message was incorrect, and that the compiler should define this macro itself, as proposed in this patch. Note that I was never confused into thinking that whether the macro is defined depends on libc behavior; my confusion was only about the precise value it should be defined to.

Responding to a couple points:

> I think the point was more about "who is generally responsible for defining this macro, the compiler or the library" as opposed to it being a glibc thing specifically. I notice that musl also defines the macro (https://git.musl-libc.org/cgit/musl/tree/include/stdc-predef.h#n4).

Exactly so. *IF* this macro relates to library behavior, then libraries should define it -- and not just glibc. Other systems could/should provide a stdc-predef.h file as well. (But per the above, I don't think that is the case here.)

> This patch is certainly wrong for NetBSD as the wchar_t encoding is up to the specific locale charset and *not* UCS-2 or UCS-4 for certain legacy encodings like the various shift encodings in East Asia.

Yet the compiler currently always puts UTF-16/UTF-32 in wchar_t string literals. If that is inconsistent with the runtime, then the system as a whole currently has a serious bug. There is currently no platform for which Clang uses a non-UTF encoding for wchar_t. If there were such a platform, it would then be correct not to define this macro for it. There's no getting away from the compiler needing to be aware of the encoding of wchar_t, independently of this patch, so there's no point in punting the definition of the macro to the libc.

Now, maybe FreeBSD should be such a platform that uses a different wchar_t encoding...which leads to the question: what *is* the encoding Clang should be using here? What *should* `L"\U00100000"` emit? It sounds like wchar_t doesn't even have a consistent encoding at runtime, which implies that there's no way the compiler can create a correct wchar_t string literal. So maybe it should simply emit a compilation error if you try to use `L""` or `L''`?

Per https://www.gnu.org/software/libunistring/manual/html_node/The-wchar_005ft-mess.html, this same bug also exists on Solaris.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577


