[libcxx-commits] [PATCH] D144346: [libc++][format] Improves Unicode decoders.
Mark de Wever via Phabricator via libcxx-commits
libcxx-commits at lists.llvm.org
Wed Feb 22 08:40:36 PST 2023
Mordante added a comment.
Thanks for the review!
================
Comment at: libcxx/include/__format/unicode.h:150
+ // *Marked* entries are not the full range 80..BF.
+ // *) This entry is not marked in the Unicode standard, but this entry is also not the full range.
+ //
----------------
tahonermann wrote:
> I don't understand this footnote. The full range of code points that are encodeable in a single code unit is U+0000..U+007F.
It seems the `*` is on the wrong line; it should have been on `*C2*..DF 80..BF`.
The encoding scheme requires the first code unit of a two-code-unit sequence to match `110xxxxx`, so it admits lead values starting at `0xC0`. However, `0xC0` and `0xC1` can only produce overlong encodings of U+0000..U+007F, so the well-formed range starts at `0xC2`. The Unicode Standard does not mark this entry, but I think it's worth pointing out, especially since this decoder doesn't use nested if statements. Instead, it decodes the value and then tests whether it's in the valid range. This reduces the number of comparisons and, IMO, makes the code easier to read.
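A minimal sketch of the decode-then-range-check approach, assuming C++17. The function name `decode_utf8_2_or_3` is hypothetical. This is not the libc++ code in `unicode.h`, and it only handles two- and three-code-unit sequences. It checks only the `110xxxxx`/`1110xxxx`/`10xxxxxx` bit patterns, decodes, and then rejects out-of-range results. It does not compare each lead byte against the rows of the Unicode well-formed UTF-8 table.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper (not the libc++ implementation): decodes a two- or
// three-code-unit UTF-8 sequence at the start of `s`.
// Returns std::nullopt when the sequence is ill-formed.
std::optional<char32_t> decode_utf8_2_or_3(std::string_view s) {
  auto unit = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
  auto is_continuation = [](unsigned char c) { return (c & 0xC0) == 0x80; };

  // Two code units: 110xxxxx 10xxxxxx.
  if (s.size() >= 2 && (unit(0) & 0xE0) == 0xC0 && is_continuation(unit(1))) {
    char32_t value = (char32_t(unit(0) & 0x1F) << 6) | char32_t(unit(1) & 0x3F);
    // Lead bytes 0xC0 and 0xC1 decode to values below 0x80 (overlong),
    // so this single test enforces the C2..DF lead-byte range.
    if (value < 0x80)
      return std::nullopt;
    return value;
  }

  // Three code units: 1110xxxx 10xxxxxx 10xxxxxx.
  if (s.size() >= 3 && (unit(0) & 0xF0) == 0xE0 && is_continuation(unit(1)) &&
      is_continuation(unit(2))) {
    char32_t value = (char32_t(unit(0) & 0x0F) << 12) |
                     (char32_t(unit(1) & 0x3F) << 6) | char32_t(unit(2) & 0x3F);
    // E0 followed by 80..9F is overlong (< U+0800), and ED followed by
    // A0..BF encodes a surrogate (U+D800..U+DFFF). Those are the restricted
    // second-byte ranges, handled here by checking the decoded value.
    if (value < 0x800 || (value >= 0xD800 && value <= 0xDFFF))
      return std::nullopt;
    return value;
  }

  return std::nullopt;
}
```

For example, `decode_utf8_2_or_3("\xC2\x80")` yields U+0080. `"\xC0\x80"` is rejected because it decodes to U+0000, which is an overlong encoding.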
Repository: rG LLVM Github Monorepo
https://reviews.llvm.org/D144346