[PATCH] D106577: [clang] Define __STDC_ISO_10646__

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 23 03:06:24 PDT 2021


cor3ntin added a comment.

In D106577#2898967 <https://reviews.llvm.org/D106577#2898967>, @hubert.reinterpretcast wrote:

> In D106577#2897588 <https://reviews.llvm.org/D106577#2897588>, @aaron.ballman wrote:
>
>> In D106577#2897522 <https://reviews.llvm.org/D106577#2897522>, @jyknight wrote:
>>
>>> I'm not sure we should be populating this.
>>>
>>> The _value_ is determined by what libc supports, so it probably needs to be left up to libc to define it.
>>
>> Why is the value determined by what libc supports? The definition from the standard is:
>>
>>   If this symbol is defined, then every character in the Unicode required set, when stored in an
>>   object of type wchar_t, has the same value as the short identifier of that character.
>>
>> That doesn't seem to imply anything about the library, just the size of `wchar_t`.
>
> Every character in the Unicode required set encoded in what way? To say that such a character is stored in an object of type `wchar_t` means that interpreting the `wchar_t` yields that stored character. Methods to determine the interpretation of the stored `wchar_t` value include locale-sensitive functions such as `wcstombs` (and thus is tied to libc).

"has the same value as the short identifier of that character." implies UTF-32.
There is no mention of interpretation here; the *value* is the same. That is, casting to an integer type yields the code point value.
*Storing* that value might involve either assigning from a wide-character literal or calling `mbrtowc`.
Both methods imply some transcoding, and the latter could be affected by the locale such that it would store a different character. But again, is that related to this wording?
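
To illustrate the "same value" reading, here is a minimal C sketch. The function names are hypothetical and only serve the example. It assumes that, whenever `__STDC_ISO_10646__` is defined, a wide character literal converted to an integer yields the code point of the character.

```c
#include <stdbool.h>
#include <wchar.h>

/* Hypothetical helper: true when this translation unit sees the macro. */
static bool iso_10646_promised(void) {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}

/* Hypothetical helper: true when each wide literal below, read as an
   integer, equals the Unicode code point of the character it names. */
static bool wide_literals_match_code_points(void) {
    return L'A' == 0x41          /* U+0041 LATIN CAPITAL LETTER A */
        && L'\u00E9' == 0xE9     /* U+00E9 LATIN SMALL LETTER E WITH ACUTE */
        && L'\u20AC' == 0x20AC;  /* U+20AC EURO SIGN */
}
```

If the macro is defined and `wide_literals_match_code_points()` returns false, the promise made by the macro does not hold for that compiler configuration. No locale-sensitive function is involved in this check.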

Note that by virtue of being a macro this cannot possibly be affected by locale.

A few scenarios:

- The encoding of wide literals as determined by clang is not UTF-32: the macro should be defined by neither the compiler nor the library.
- The encoding of wide literals as determined by the compiler is UTF-32, libc agrees... this works as intended.
- The encoding of wide literals as determined by the compiler is UTF-32, libc disagrees... nothing good can come of that.

The compiler and the libc have to agree here.
The library cannot (should not) define this macro without knowing the wide literal encoding.
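
As a rough sketch of what "knowing the wide literal encoding" could look like from user code, the following C snippet compares the macro against the encoding name reported by the compiler. It assumes that `__clang_wide_literal_encoding__` (Clang) and `__GNUC_WIDE_EXECUTION_CHARSET_NAME` (GCC) are available as string macros in recent compiler versions; the `#if defined` guards let it build when they are absent. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Name of the wide literal encoding as reported by the compiler,
   or NULL when the compiler does not expose it (assumed macros). */
static const char *wide_literal_encoding_name(void) {
#if defined(__clang_wide_literal_encoding__)
    return __clang_wide_literal_encoding__;
#elif defined(__GNUC_WIDE_EXECUTION_CHARSET_NAME)
    return __GNUC_WIDE_EXECUTION_CHARSET_NAME;
#else
    return NULL;
#endif
}

/* Returns 1 when the presence of __STDC_ISO_10646__ does not contradict
   the reported wide literal encoding, 0 when it does. An unknown
   encoding cannot be checked and is treated as consistent. */
static int iso_10646_is_consistent(void) {
    const char *name = wide_literal_encoding_name();
#ifdef __STDC_ISO_10646__
    /* Accept "UTF-32" as well as variants such as "UTF-32LE". */
    return name == NULL || strncmp(name, "UTF-32", 6) == 0;
#else
    (void)name;
    return 1;
#endif
}
```

A result of 0 corresponds to the third scenario: the macro is visible, yet the compiler encodes wide literals in something other than UTF-32.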

Note that both standards imply that these macros should be defined when relevant, independently of the environment, which includes hosted and non-Linux+glibc platforms. So relying on a specific glibc implementation
seems dubious, especially as glibc will *always* define that macro.

Now, I agree that the compiler and the library should ideally expose the same *value* for this macro (although I struggle to find code that actually relies on the value).

When D34158 <https://reviews.llvm.org/D34158>, as mentioned by @jyknight, lands, the value will be set to that of the library version, thereby overriding the compiler default.
On other systems, the value will be set to the library version whenever the library is included.
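
For reference, the value in question is an integer constant of the form `yyyymmL`. A small C sketch of how code could inspect whichever value ends up visible in a translation unit follows; the helper names are hypothetical, and which definition wins (compiler or library) depends on the platform as described above.

```c
/* Returns the value of __STDC_ISO_10646__ seen by this translation
   unit, or 0 when the macro is not defined. */
static long iso_10646_version(void) {
#ifdef __STDC_ISO_10646__
    return __STDC_ISO_10646__;
#else
    return 0L;
#endif
}

/* Splits a yyyymmL value into its year and month components. */
static void iso_10646_year_month(long version, int *year, int *month) {
    *year = (int)(version / 100);
    *month = (int)(version % 100);
}
```

Printing the result of `iso_10646_version()` with and without including a libc header is one way to observe whether the library definition replaces the compiler default on a given system.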

When we add support for non-UTF wide execution encodings, we can use that information to avoid defining this macro.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577


