[PATCH] D106577: [clang] Define __STDC_ISO_10646__

Joerg Sonnenberger via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Fri Jul 23 12:37:27 PDT 2021


joerg added a comment.

In D106577#2899715 <https://reviews.llvm.org/D106577#2899715>, @aaron.ballman wrote:

> In D106577#2899711 <https://reviews.llvm.org/D106577#2899711>, @joerg wrote:
>
>> This patch is certainly wrong for NetBSD as the wchar_t encoding is up to the specific locale charset and *not* UCS-2 or UCS-4 for certain legacy encodings like the various shift encodings in East Asia.
>
> How does the value of a macro get impacted by a runtime locale?

NetBSD doesn't set the macro. And yes, this is one of the fundamental design issues of wide character literals. Section 2 of the following, now 20-year-old, Itojun paper goes into some of the problems with the assumption of a single universal character set:
https://www.usenix.org/legacy/publications/library/proceedings/usenix01/freenix01/full_papers/hagino/hagino.pdf
Even an encoding that embeds ISO 10646 fully and uses a flag bit to denote other values (entirely valid, as Unicode is restricted to 21 bits) should not get this macro set.
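
To make the last point concrete, here is a minimal sketch in C. The flag-bit layout (bit 31 marking a value from a legacy charset) and the helper names are assumptions made up for illustration; this is not NetBSD's actual wchar_t encoding. It only shows why such an encoding cannot promise that every wchar_t value is an ISO 10646 code point, which is what defining __STDC_ISO_10646__ asserts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical flag bit: set when the value comes from a legacy charset
 * rather than ISO 10646. Unicode needs only 21 bits, so bit 31 is free. */
#define LEGACY_FLAG UINT32_C(0x80000000)

/* True if v lies inside the ISO 10646 / Unicode codespace (0..0x10FFFF). */
static bool in_unicode_codespace(uint32_t v) {
    return v <= UINT32_C(0x10FFFF);
}

/* Tag a raw legacy-charset value (for example a two-byte shift-encoding
 * character) so that it cannot collide with a Unicode code point. */
static uint32_t tag_legacy(uint32_t raw) {
    assert((raw & LEGACY_FLAG) == 0);
    return raw | LEGACY_FLAG;
}

static bool is_legacy(uint32_t v) {
    return (v & LEGACY_FLAG) != 0;
}

static uint32_t untag_legacy(uint32_t v) {
    return v & ~LEGACY_FLAG;
}

/* What portable code typically does with the macro: rely on wchar_t being
 * ISO 10646 only when the implementation promises it at compile time. */
static bool wchar_promised_iso10646(void) {
#ifdef __STDC_ISO_10646__
    return true;
#else
    return false;
#endif
}
```

Under this assumed layout every Unicode code point is representable unchanged, yet a tagged value such as tag_legacy(0x8140) falls outside the codespace. Code that trusts the macro would misinterpret it, so the macro has to stay undefined.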


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577


