[PATCH] D154290: [Clang] Implement P2741R3 - user-generated static_assert messages

Sat Jul 8 15:57:03 PDT 2023

barannikov88 added a comment.

In D154290#4483055 <https://reviews.llvm.org/D154290#4483055>, @cor3ntin wrote:

> In D154290#4482975 <https://reviews.llvm.org/D154290#4482975>, @barannikov88 wrote:
>
>> According to the current wording, the static_assert-message is either an unevaluated string or an expression evaluated at compile time.
>> Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequences, including '\x'.
>> Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.
>
> Not quite.
> Unevaluated strings are always UTF-8 (regardless of source file encoding). Evaluated strings are in the literal encoding which is always UTF-8 for clang.
> This will change whenever we allow for different kinds of literal encodings per this RFC https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1
>
> If and when that is the case we will have to convert back to UTF-8 before displaying - and then maybe convert back to the system locale depending on host.
> Numeric escape sequences can then occur in evaluated strings and produce mojibake if the evaluated string is not valid in the string literal encoding.
> I don't believe that we would want to output static messages without conversion on any system as the diagnostics framework is very much geared towards UTF-8 and we want to keep supporting cross compilation.
>
> So the process will be
> source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

Thanks for your reply, I think I see the idea.
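
To check my understanding of the "literal encoding -> utf8" step, here is a minimal sketch. It assumes a hypothetical literal encoding of ISO-8859-1, which clang does not support today, and `latin1ToUtf8` is a made-up helper for illustration, not an existing clang or LLVM utility.

```cpp
#include <string>

// Hypothetical helper (not a clang API): convert a message whose bytes are
// in ISO-8859-1 (standing in for a non-UTF-8 literal encoding) to UTF-8,
// as would have to happen before handing it to the diagnostics framework.
std::string latin1ToUtf8(const std::string &In) {
  std::string Out;
  for (unsigned char C : In) {
    if (C < 0x80) {
      // ASCII bytes are encoded identically in both encodings.
      Out.push_back(static_cast<char>(C));
    } else {
      // Code points U+0080..U+00FF become two-byte UTF-8 sequences.
      Out.push_back(static_cast<char>(0xC0 | (C >> 6)));
      Out.push_back(static_cast<char>(0x80 | (C & 0x3F)));
    }
  }
  return Out;
}
```

Under that assumption, a message containing the single byte 0xE9 (é in ISO-8859-1) is not valid UTF-8 on its own, so printing it without conversion would produce mojibake, whereas the converted form is the two bytes 0xC3 0xA9.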

> By the same account, casting 0-extended utf-8 to char is fine until such time clang supports more than UTF-8. (which is one of the reasons we need to make sure clang's conversion utilities can convert from and to utf-8)
>
> Unevaluated strings were introduced in part to help identify what gets converted and what does not.

It is a bit strange that the string in `static_assert(false, "й")` is not converted, while it is converted in `static_assert(false, std::string_view("й"))`.
It might be possible to achieve identical diagnostic output even with -fexec-charset supported (which would only affect the second form),
but right now I'm confused by the distinction… Why not always evaluate the message?
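
To make the two forms concrete, here is a minimal sketch. The names `Wrapped` and `Escaped` are mine, the conditions are `true` so that nothing is actually diagnosed, and the second form is guarded on `__cpp_static_assert >= 202306L`, which I am assuming is how a compiler advertises P2741R3 support.

```cpp
#include <string_view>

// Form 1: the message is an unevaluated string. It is never converted to
// the literal encoding, and numeric escapes such as '\x' are not allowed.
static_assert(true, "й");

// Form 2: the same text as an ordinary string literal wrapped in a
// string_view. The literal is in the literal encoding (UTF-8 for clang).
constexpr std::string_view Wrapped = "й";

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// Only compiled where user-generated static_assert messages are available.
static_assert(true, Wrapped);
#endif

// An ordinary string literal accepts numeric escapes, so a wrapped message
// can carry a byte that is not valid UTF-8 on its own.
constexpr std::string_view Escaped = "\xFF";
static_assert(Escaped.size() == 1);
```

With a UTF-8 literal encoding both forms carry the same bytes for "й", so today the difference only shows up through escapes like the one in `Escaped`.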

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154290/new/

https://reviews.llvm.org/D154290