[clang] [libcxx] [llvm] [Clang] Add warnings when mixing different charN_t types (PR #138708)

James Y Knight via cfe-commits cfe-commits at lists.llvm.org
Fri May 30 07:35:58 PDT 2025


jyknight wrote:

I like the idea of this warning, but I'm afraid the diagnostic wording isn't sufficient to result in correct fixes to code. Instead, it seems to result in simply adding explicit casts to make the compiler shut up, even from people who know what they're doing w.r.t. Unicode.

The first response I got in a discussion about an instance of `implicit conversion from char16_t to char32_t may change the meaning of the represented code unit` was (approximately) "What an obnoxious warning, _of course it's fine_ to zero-extend a char16_t codepoint to a char32_t codepoint!" This, from a subject-matter expert and maintainer of a Unicode library.

And, of course, it _is_ fine if you happen to know that the char16_t was representing a valid codepoint that happens to be limited to under 64K. Which..._could_ be the case...it's just not common. And, worse, if it is true in a given case, then the API in question is dangerous and invites misuse by its callers, because it has decided upon an unusual/unexpected use of types (char16_t as a codepoint, instead of the expected use of char16_t as a UTF-16 code-unit).

So, I think that we need to somehow explain in these diagnostics -- in very few words! -- that char16_t should represent UTF-16 code-units, while char32_t represents Unicode codepoints, and that you _probably_ need to refactor your code to decode a sequence of UTF-16 char16_t into char32_t codepoints, rather than simply insert an explicit cast of an individual char16_t.
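
To make the distinction concrete, here is a rough sketch of what "decode a sequence" means in practice. The helper name `decode_utf16` and the choice to replace unpaired surrogates with U+FFFD are my own assumptions for illustration, not something from the PR or from any particular library:

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of the PR): decode UTF-16 code units into
// Unicode codepoints. Unpaired surrogates become U+FFFD, which is one common
// policy; a real API might report an error instead.
std::u32string decode_utf16(std::u16string_view in) {
    std::u32string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char16_t u = in[i];
        const bool is_high = u >= 0xD800 && u <= 0xDBFF;
        const bool is_low = u >= 0xDC00 && u <= 0xDFFF;
        if (is_high && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // Surrogate pair: combine two code units into one codepoint.
            const char32_t hi = static_cast<char32_t>(u - 0xD800);
            const char32_t lo = static_cast<char32_t>(in[i + 1] - 0xDC00);
            out.push_back(char32_t{0x10000} + (hi << 10) + lo);
            ++i;  // consume the low surrogate as well
        } else if (is_high || is_low) {
            // Unpaired surrogate: not a valid codepoint on its own.
            out.push_back(char32_t{0xFFFD});
        } else {
            // BMP code unit: here the value really is the codepoint, so
            // widening a single char16_t is correct.
            out.push_back(static_cast<char32_t>(u));
        }
    }
    return out;
}
```

The single-unit cast is only valid in the last branch. A bare `static_cast<char32_t>(c)` on an arbitrary char16_t silently takes that branch for surrogates too, which is the bug the warning is trying to point at.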

https://github.com/llvm/llvm-project/pull/138708

