[PATCH] D106577: [clang] Define __STDC_ISO_10646__

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Jul 27 00:10:28 PDT 2021


rsmith added a comment.

In D106577#2905027 <https://reviews.llvm.org/D106577#2905027>, @aaron.ballman wrote:

> In D106577#2904960 <https://reviews.llvm.org/D106577#2904960>, @rsmith wrote:
>
>> I'd expect we will break at least things like libc++ tests that build with `-Wsystem-headers` in the case where libc defines the macro. But if the problem is limited to old libc and a rare use case, and can easily be worked around with a `-Wno` flag, that's probably fine.
>
> Doesn't look to be in an old libc: https://sourceware.org/git/?p=glibc.git;a=blob;f=include/stdc-predef.h;h=e130c462a7b71f87956a6d4b2bd69a38c9d9cb32;hb=refs/heads/master#l58

The "old libc" case is for old versions of glibc that put the macro in `features.h` instead of in `stdc-predef.h`. The macros in `stdc-predef.h` aren't a problem until / unless we start auto-including that header.

In D106577#2905755 <https://reviews.llvm.org/D106577#2905755>, @jyknight wrote:

> In D106577#2904960 <https://reviews.llvm.org/D106577#2904960>, @rsmith wrote:
>
>> One benefit we don't get with this approach is providing the right value for the macro (without paying the cost of always including `stdc-predefs.h`).
>
> What do you mean by "right value", though? As Aaron pointed out, the value seems only dependent upon what characters can fit into a wchar_t, which is independent of what unicode version the libc supports.

I don't see how that follows from the definition in the C standard; it says "every character in the Unicode required set, when stored in an object of type `wchar_t`, has the same value as the short identifier of that character". This doesn't say anything about character or string literals, and for example `mbstowcs` stores characters in objects of type `wchar_t` too (it "stores not more than `n` wide characters into the array pointed to by `pwcs`"), so unless `mbstowcs` does the right thing I don't see how we can claim support for a new Unicode standard version. As far as I can tell, this macro is documenting a property of the complete implementation (compiler plus standard library), and should be set to the minimum of the version supported by the compiler and the version supported by the stdlib. I think it's OK for the compiler to say it supports *any* version, though, because we don't expect future Unicode versions to require any changes on our part. But they may require standard library changes.

If Aaron's checked with WG14 and the intent is for this to only constrain how literals are represented, and not the complete implementation, then I'm entirely fine with us defining the macro ourselves. But that's not the interpretation that several other vendors have taken. If we're confident that the intent is just that this macro lists (effectively) the latest version of the Unicode standard that we've heard of, we should let the various libc vendors that currently define the macro know that they're doing it wrong and the definition belongs in the compiler.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577



More information about the cfe-commits mailing list