[PATCH] D154290: [Clang] Implement P2741R3 - user-generated static_assert messages

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Sat Jul 8 13:34:35 PDT 2023


cor3ntin added a comment.

In D154290#4482975 <https://reviews.llvm.org/D154290#4482975>, @barannikov88 wrote:

> According to the current wording, the static_assert-message is either an unevaluated string or an expression evaluated at compile time.
> Unevaluated strings don't allow certain escape sequences, but if I wrap the string in a string_view-like class, I'm allowed to use any escape sequences, including '\x'.
> Moreover, wrapping a string in a class would change its encoding. Unevaluated strings are displayed as written in the source (that is, UTF-8), while wrapped strings undergo conversion to execution encoding (e.g. EBCDIC) and then printed in system locale, leading to mojibake.

Not quite.
Unevaluated strings are always UTF-8 (regardless of source file encoding). Evaluated strings are in the literal encoding, which is always UTF-8 for clang.
This will change whenever we allow for different kinds of literal encodings per this RFC: https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512/1

If and when that is the case, we will have to convert back to UTF-8 before displaying, and then maybe convert again to the system locale depending on the host.
Numeric escape sequences can then occur in evaluated strings and produce mojibake if the evaluated string is not valid in the string literal encoding.
I don't believe that we would want to output static_assert messages without conversion on any system, as the diagnostics framework is very much geared towards UTF-8 and we want to keep supporting cross compilation.

So the process will be
source -> utf8 -> literal encoding -> utf8 -> terminal encoding.

By the same account, casting zero-extended UTF-8 to char is fine until such time as clang supports more than UTF-8 (which is one of the reasons we need to make sure clang's conversion utilities can convert from and to UTF-8).

Unevaluated strings were introduced in part to help identify what gets converted and what does not.
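
To make the distinction concrete, here is a minimal sketch of the two message forms. The `Message` wrapper, `evaluated_msg`, and `value` are hypothetical names invented for illustration; the only assumption taken from P2741R3 is that the message object exposes constant-evaluable `size()` and `data()` members. The user-generated form is guarded by the feature-test macro so the snippet still compiles on compilers without P2741R3 support.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical string_view-like wrapper (not part of clang or the patch).
// P2741R3 asks for size() and data() usable in constant expressions.
struct Message {
    std::string_view text;
    constexpr std::size_t size() const { return text.size(); }
    constexpr const char* data() const { return text.data(); }
};

// Evaluated string: an ordinary string literal in the literal encoding,
// so a numeric escape such as \x41 is accepted here.
constexpr Message evaluated_msg{"value must be positive \x41"};

constexpr int value = 1;

// Unevaluated string: never converted to the literal encoding.
// Numeric escapes such as \x41 are not permitted in this position.
static_assert(value > 0, "value must be positive");

#if defined(__cpp_static_assert) && __cpp_static_assert >= 202306L
// User-generated message: the compiler evaluates size() and data() at
// compile time and would have to convert the bytes back for display.
static_assert(value > 0, evaluated_msg);
#endif
```

With a compiler that implements P2741R3, something like `clang++ -std=c++2c -fsyntax-only` should exercise the guarded assertion; changing `value` to 0 would show how each message form is rendered in the diagnostic.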


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D154290/new/

https://reviews.llvm.org/D154290


