[cfe-dev] How do I properly address Clang's UTF-8 string literal warning?

Richard Smith via cfe-dev cfe-dev at lists.llvm.org
Sun Nov 7 22:07:38 PST 2021


On Sun, 7 Nov 2021 at 07:29, Aleksandr Medvedev via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Hello, folks! (And sorry in advance if I'm writing to the wrong address
> with my question.)
>
> Recently I came across a Clang warning on this simple line of code:
>
> auto message = u8"Текст";
>
> which says as follows:
>
>> type of UTF-8 string literal will change from array of const char to
>> array of const char8_t in C++20
>
>
> That warning is controlled by the -Wc++20-compat flag
> <https://clang.llvm.org/docs/DiagnosticsReference.html#wc-20-compat>, and
> what I'm looking for is how to properly make this warning
> disappear (without suppressing or disabling it).
>

Why do you want this warning enabled? Depending on why you want this
warning to appear, the answer for how to handle it will be different.


> I tried a few approaches from the P1423R2
> <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html#remediation> document,
> including:
>
>    - explicit conversion function approach
>    <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html#conversion_fns>
>    - emulation with macros
>    <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html#emulate>
>    - reinterpret_cast u8 literals to char
>    <http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r2.html#reinterpret_cast>
>
> Unfortunately, none of them seems to work. The warning persists, probably
> simply because a u8 literal is present, regardless of whether I use
> overloaded functions, wrap it in macros, or cast it to "standard" char.
> Am I missing something? Or is the only way to make this warning go away
> to stop using u8 literals in my code?
>

The diagnostic also says:

<source>:1:16: note: remove 'u8' prefix to avoid a change of behavior;
Clang encodes unprefixed narrow string literals as UTF-8

... so that's one potential option to ensure the code doesn't change
meaning in C++20 mode, if you don't need portability to compilers that
don't assume UTF-8.
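A minimal sketch of that first option, assuming Clang (or another compiler whose narrow execution encoding is UTF-8) and a source file saved as UTF-8. The function names message and message_bytes are illustrative only. The static_asserts turn the UTF-8 assumption into a compile-time check, so moving to a compiler that does not assume UTF-8 fails the build instead of silently changing the bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Option 1: drop the u8 prefix. The literal is then const char[N] in both
// C++17 and C++20, so there is no type change for -Wc++20-compat to report.
// Assumption: the compiler encodes unprefixed narrow literals as UTF-8
// (Clang does) and this file is saved as UTF-8.
inline const char* message() { return "Текст"; }

// Turn the UTF-8 assumption into a build error on compilers that use a
// different narrow execution encoding. "Текст" is five Cyrillic letters,
// two UTF-8 bytes each, plus the terminating NUL.
static_assert(sizeof("Текст") == 11, "narrow literals are not UTF-8");
static_assert(static_cast<unsigned char>("Текст"[0]) == 0xD0,
              "unexpected first byte for U+0422 CYRILLIC CAPITAL LETTER TE");

// Byte length of the message, excluding the NUL terminator.
inline std::size_t message_bytes() { return std::strlen(message()); }
```

The checks cost nothing at run time and document exactly which portability assumption the code relies on.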

Another option is to pre-adopt this C++20 feature with -fchar8_t. That
again is non-portable, but is available in at least Clang and -- if memory
serves -- GCC.
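A sketch of code that builds with or without -fchar8_t (and in C++20 mode), picking the character type through the standard __cpp_char8_t feature-test macro. The alias utf8_char and the helpers message, as_char and message_bytes are hypothetical names. That Clang and GCC define __cpp_char8_t under -fchar8_t is an assumption worth verifying for the compiler versions in use. With char8_t enabled, the u8 literal already has its C++20 type, so, as far as I know, Clang has no pending type change to warn about in that mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Option 2: pre-adopt char8_t. __cpp_char8_t is the standard feature-test
// macro for char8_t; it is defined in C++20 mode and (assumption, check
// your compiler) when -fchar8_t is passed in an earlier language mode.
#if defined(__cpp_char8_t)
using utf8_char = char8_t;  // u8"..." has type const char8_t[N]
#else
using utf8_char = char;     // u8"..." has type const char[N]
#endif

// Valid in both modes, because the pointer type follows the literal's type.
inline const utf8_char* message() { return u8"Текст"; }

// Hypothetical helper: view UTF-8 code units as char for char-based APIs
// such as std::strlen. Reading an object's bytes through char is allowed
// by the aliasing rules; when utf8_char is already char the cast is a no-op.
inline const char* as_char(const utf8_char* s) {
    return reinterpret_cast<const char*>(s);
}

// Byte length of the message, excluding the NUL terminator.
inline std::size_t message_bytes() { return std::strlen(as_char(message())); }
```

Keeping the reinterpret_cast in a single helper confines the char8_t/char boundary to one place instead of scattering casts across call sites.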

