[PATCH] D106577: [clang] Define __STDC_ISO_10646__

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Jul 26 12:58:43 PDT 2021


cor3ntin added a comment.

In D106577#2905027 <https://reviews.llvm.org/D106577#2905027>, @aaron.ballman wrote:

>> That doesn't help the fact that wide char literals are effectively broken on those OSes, but oh well. Maybe someday they'll decide to switch to a consistent/documented wchar encoding, at which point clang can emit that (whatever it is). Or maybe someone will teach clang to emit an error or warning when using wide char literals on such targets. But I wouldn't hold my breath for either of those outcomes, and it seems fine to move forward here simply by exempting the known-to-be-problematic OSes.
>
> I still don't fully understand the original comment from Joerg. The encoding of `L'a'` cannot change at runtime; it's a literal whose encoding is decided entirely at compile time. @joerg -- did you mean that Clang produces a different literal encoding depending on the environment the host compiler is running in?

Exactly. Unfortunately, this is a problem people have a tendency to ignore.

Any string literals (narrow and wide) that cannot be interpreted the same way at compile time and runtime will lead to mojibake or bugs.
If we admit that the encoding can change between executions, or even during the same execution (gasp!), then clang should probably reject string literals that contain characters that are not represented identically in all the possible encodings that can be set at runtime.
(In the best case, that would end up being a subset of the basic latin 1 block, or, if we admit JIS... probably the empty set!)
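
To make the mojibake point concrete, here is a minimal sketch, assuming only two candidate runtime encodings (UTF-8 and ISO-8859-1). The function names are hypothetical and have nothing to do with Clang's implementation; the decoder is deliberately simplified and does not reject overlong forms or surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decode bytes as ISO-8859-1: each byte is the code point of the same value.
std::u32string decode_latin1(const std::string& bytes) {
    std::u32string out;
    for (unsigned char b : bytes) out.push_back(static_cast<char32_t>(b));
    return out;
}

// Simplified UTF-8 decoder (assumption: illustration only).
// Malformed or truncated sequences yield U+FFFD.
std::u32string decode_utf8(const std::string& bytes) {
    const char32_t kReplacement = static_cast<char32_t>(0xFFFD);
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char b = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else { out.push_back(kReplacement); ++i; continue; }
        ++i;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k) {
            if (i >= bytes.size()) { ok = false; break; }
            unsigned char c = static_cast<unsigned char>(bytes[i]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }
            cp = (cp << 6) | (c & 0x3F);
            ++i;
        }
        out.push_back(ok ? cp : kReplacement);
    }
    assert(i <= bytes.size());  // the loop never reads past the end
    return out;
}

// True if the bytes mean the same thing under both candidate encodings,
// i.e. a literal with these bytes would be "safe" whichever one is active.
bool stable_across_encodings(const std::string& bytes) {
    return decode_utf8(bytes) == decode_latin1(bytes);
}
```

With these two encodings, the bytes `C3 A9` decode to the single code point U+00E9 as UTF-8 but to U+00C3 U+00A9 as ISO-8859-1, so `stable_across_encodings("\xC3\xA9")` is false, while a pure ASCII string such as `"abc"` passes. Adding more candidate encodings can only shrink the set of strings that pass.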


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106577/new/

https://reviews.llvm.org/D106577


