[libcxx-commits] [PATCH] D144346: [libc++][format] Improves Unicode decoders.
Mark de Wever via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Fri Feb 24 12:56:10 PST 2023
Mordante marked 3 inline comments as done.
Mordante added inline comments.
================
Comment at: libcxx/include/__format/unicode.h:139-147
+ // U+0000..U+007F    00..7F                                U+0000..U+007F 1 code unit range
+ // U+0080..U+07FF    *C2*..DF  80..BF                      U+0080..U+07FF 2 code unit range *
+ // U+0800..U+0FFF    E0        *A0*..BF  80..BF            U+0800..U+FFFF 3 code unit range
+ // U+1000..U+CFFF    E1..EC    80..BF    80..BF
+ // U+D000..U+D7FF    ED        80..*9F*  80..BF            U+D800..U+DFFF surrogate range
+ // U+E000..U+FFFF    EE..EF    80..BF    80..BF
+ // U+10000..U+3FFFF  F0        *90*..BF  80..BF    80..BF  U+10000..U+10FFFF 4 code unit range
----------------
tahonermann wrote:
> Here is another presentation option that avoids the need for those footnotes. If you like this better, great. If not, no problem. The current presentation has the benefit of matching the bold highlighting in the table from the Unicode Standard, but I think the suggested presentation better explains the reason those invalid ranges exist.
Actually I like this a lot, thanks! This matches the code more closely; the code does
not validate all ranges, but it does reject the "invalid overlong encoding" cases.
I made a few more changes to the surrounding comments, since they looked odd with the new table.
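
For readers following along, here is a minimal sketch of how the ranges in that table translate into a check on the first two code units. It is not the code in this patch: `valid_lead_pair` is a hypothetical name, and the sketch applies every second-byte restriction from Table 3-7 of the Unicode Standard (overlong encodings, surrogates, and values above U+10FFFF), which is more than the decoder discussed here claims to validate.

```cpp
#include <cstdint>

// Hypothetical helper, not the libc++ implementation: returns true when
// `first` and `second` can start a well-formed multi code unit UTF-8
// sequence according to Table 3-7 of the Unicode Standard.
constexpr bool valid_lead_pair(std::uint8_t first, std::uint8_t second) {
  // 00..7F is a complete 1 code unit sequence, 80..BF is a continuation
  // unit, C0..C1 can only start an overlong 2 code unit encoding, and
  // F5..FF would encode values above U+10FFFF.
  if (first < 0xC2 || first > 0xF4)
    return false;

  // The second code unit is 80..BF unless the lead byte narrows the range.
  std::uint8_t low = 0x80;
  std::uint8_t high = 0xBF;
  switch (first) {
  case 0xE0: low = 0xA0; break;  // 80..9F: overlong 3 code unit encoding
  case 0xED: high = 0x9F; break; // A0..BF: surrogates U+D800..U+DFFF
  case 0xF0: low = 0x90; break;  // 80..8F: overlong 4 code unit encoding
  case 0xF4: high = 0x8F; break; // 90..BF: above U+10FFFF
  default: break;
  }
  return second >= low && second <= high;
}

// Compile-time spot checks taken from the starred cells in the table.
static_assert(valid_lead_pair(0xC2, 0x80), "U+0080 is the first 2 code unit value");
static_assert(!valid_lead_pair(0xE0, 0x9F), "E0 9F would be overlong");
static_assert(!valid_lead_pair(0xED, 0xA0), "ED A0 would encode a surrogate");
```

The switch only narrows the default 80..BF range for the four special lead bytes, which mirrors how the table marks the exceptional cells.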
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D144346/new/
https://reviews.llvm.org/D144346