[PATCH] D106577: [clang] Define __STDC_ISO_10646__

James Y Knight via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 26 14:56:25 PDT 2021


jyknight added a comment.

In D106577#2904960 <https://reviews.llvm.org/D106577#2904960>, @rsmith wrote:

> One benefit we don't get with this approach is providing the right value for the macro (without paying the cost of always including `stdc-predefs.h`).

What do you mean by "right value", though? As Aaron pointed out, the value seems to depend only on which characters can fit into a wchar_t, which is independent of which Unicode version the libc supports. If ISO10646 defines a new character, you can store it in a wchar_t and, say, decode/encode it to UTF-8 without a libc update. So the exact value doesn't much matter for a 32-bit wchar_t, so long as ISO10646 doesn't expand the size of a character beyond 32 bits. (Which they won't -- it's stuck at 21 bits effectively permanently.)
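To illustrate the "no libc update needed" point, here is a minimal sketch of a UTF-8 encoder that uses only arithmetic on the code point value. The name `encode_utf8` and its signature are hypothetical, not anything from the patch or from libc; it takes an `unsigned long` rather than a `wchar_t` so that it also compiles where wchar_t is 16 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: encode one code point as UTF-8 using arithmetic only.
 * Returns the number of bytes written to out (1 to 4), or 0 if cp is a
 * surrogate or lies above 0x10FFFF. No libc character tables are consulted,
 * so a newly assigned code point encodes the same way as an old one. */
size_t encode_utf8(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp >= 0xD800UL && cp <= 0xDFFFUL)
        return 0;                       /* surrogates are not characters */

    if (cp < 0x80UL) {                  /* 7 bits: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                 /* 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0UL | (cp >> 6));
        out[1] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 2;
    }
    if (cp < 0x10000UL) {               /* 16 bits: three bytes */
        out[0] = (unsigned char)(0xE0UL | (cp >> 12));
        out[1] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {             /* 21 bits: four bytes */
        out[0] = (unsigned char)(0xF0UL | (cp >> 18));
        out[1] = (unsigned char)(0x80UL | ((cp >> 12) & 0x3FUL));
        out[2] = (unsigned char)(0x80UL | ((cp >> 6) & 0x3FUL));
        out[3] = (unsigned char)(0x80UL | (cp & 0x3FUL));
        return 4;
    }
    return 0;                           /* beyond the 0x10FFFF limit */
}
```

Nothing in this depends on whether the code point had been assigned when the code was compiled, which is why the libc's Unicode version doesn't enter into it.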

> AFAICS, the only possible use for the value of the macro is to detect libc support, so having Clang pick a specific value seems wrong to me. In some ways I'd be more comfortable with this patch if we defined the macro to `1` and documented that we think WG14 was wrong to ask for a version number.

At this point, there are 3 versions of ISO10646 that changed properties relevant to this:

- the initial ISO/IEC 10646-1:1993 allowed characters potentially up through 0x7FFFFFFF, but only defined characters up through 0xFFFF,
- ISO/IEC 10646-2:2001 first actually defined characters beyond 0xFFFF,
- and then ISO/IEC 10646:2012 and later versions cut the maximum character value down to 0x10FFFF.

So it's not true that the version number is without meaning -- only that it doesn't matter much anymore, because things have settled down. Quite possibly when they first defined this, they expected that toolchains with a 16-bit wchar_t might set the define, since the standard -- at that point -- didn't have characters beyond 0xFFFF. But if such characters were then defined in a yet-to-be-released standard, you'd have a problem. (As is the case today.)

And also, I think it'd be valid to `#define __STDC_ISO_10646__ 200009L` for 16-bit wchar_t platforms. (Not sure if we //should//, but that would appear to be valid.)
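For reference, a rough sketch of what user code can actually learn from these pieces. The function names `iso10646_version` and `wchar_covers_all_unicode` are made up for illustration; only `__STDC_ISO_10646__` and `WCHAR_MAX` are real.

```c
#include <stdint.h>   /* WCHAR_MAX */
#include <wchar.h>

/* Hypothetical helper: report the yyyymmL value of __STDC_ISO_10646__,
 * or 0 when neither the compiler nor the libc headers define it. */
long iso10646_version(void)
{
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Hypothetical helper: nonzero when a single wchar_t can hold every value
 * up to 0x10FFFF. This follows from the width of wchar_t alone and does
 * not depend on which version value the macro carries. */
int wchar_covers_all_unicode(void)
{
    return (unsigned long)WCHAR_MAX >= 0x10FFFFUL;
}
```

With a 16-bit wchar_t the second function returns 0 regardless of what the macro says, which is the case where the version value would carry real information.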

In D106577#2905027 <https://reviews.llvm.org/D106577#2905027>, @aaron.ballman wrote:

> Yeah, I'm hoping to hear what WG14 has to say on this. My original thinking was that this macro is used to tell users and libc what version of Unicode wchar_t literal values are encoded in (if any), but seeing that both glibc and musl (https://git.musl-libc.org/cgit/musl/tree/include/stdc-predef.h#n4) define this macro themselves, I am less certain.

Musl has the define simply because GCC does not. That's not an independent confirmation of anything; it simply follows the status quo set by GCC's initial choice in 2000 not to define the macro itself.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577


