[libcxx-commits] [PATCH] D144346: [libc++][format] Improves Unicode decoders.

Tom Honermann via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Tue Feb 21 15:28:52 PST 2023


tahonermann added inline comments.


================
Comment at: libcxx/include/__format/unicode.h:139-147
+  // U+0000..U+007F     00..7F
+  // U+0080..U+07FF     *C2*..DF   80..BF                              U+0000..U0+007F 1 code unit range*
+  // U+0800..U+0FFF     E0         *A0*..BF    80..BF                  U+0000..U+07FFF 1 and 2 code unit range
+  // U+1000..U+CFFF     E1..EC     80..BF      80..BF
+  // U+D000..U+D7FF     ED         80..*9F*    80..BF                  U+D800..D+DFFFF surrogate range
+  // U+E000..U+FFFF     EE..EF     80..BF      80..BF
+  // U+10000..U+3FFFF   F0         *90*..BF    80..BF     80..BF       U+0000..U+FFFF 1, 2, and 3 code unit range
----------------
I corrected several of the `U+XXXX` identifiers in the suggested edit. I also aligned the remarks with the rows that I think they better correspond to.
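
For reference, here is a minimal sketch of how the table's rows map to code. It is not the patch's implementation; the names `utf8_sequence_length`, `utf8_second_byte_range`, and `is_well_formed_utf8_sequence` are hypothetical. The rows for lead bytes F1..F4 are not in the excerpt above and are taken from the Unicode Standard's table of well-formed UTF-8 byte sequences. Only the second code unit ever has a restricted range; all later trailing code units use the full 80..BF.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper types and functions for illustration only;
// none of these names exist in libc++.
struct SecondByteRange {
  std::uint8_t lo;
  std::uint8_t hi;
};

// Number of code units implied by the lead byte, or 0 when the byte can
// never start a well-formed sequence (80..BF, C0, C1, F5..FF).
inline int utf8_sequence_length(std::uint8_t lead) {
  if (lead <= 0x7F) return 1;                 // U+0000..U+007F
  if (lead >= 0xC2 && lead <= 0xDF) return 2; // U+0080..U+07FF
  if (lead >= 0xE0 && lead <= 0xEF) return 3; // U+0800..U+FFFF
  if (lead >= 0xF0 && lead <= 0xF4) return 4; // U+10000..U+10FFFF
  return 0;
}

// Valid range of the second code unit. These are the *marked* entries:
// every other lead byte accepts the full range 80..BF.
inline SecondByteRange utf8_second_byte_range(std::uint8_t lead) {
  switch (lead) {
    case 0xE0: return {0xA0, 0xBF}; // excludes overlong 3-unit forms
    case 0xED: return {0x80, 0x9F}; // excludes surrogates U+D800..U+DFFF
    case 0xF0: return {0x90, 0xBF}; // excludes overlong 4-unit forms
    case 0xF4: return {0x80, 0x8F}; // excludes values above U+10FFFF
    default:   return {0x80, 0xBF};
  }
}

// True when [p, p + n) is exactly one well-formed UTF-8 sequence.
inline bool is_well_formed_utf8_sequence(const std::uint8_t* p, std::size_t n) {
  if (n == 0) return false;
  const int len = utf8_sequence_length(p[0]);
  if (len == 0 || static_cast<std::size_t>(len) != n) return false;
  if (len == 1) return true;
  const SecondByteRange r = utf8_second_byte_range(p[0]);
  if (p[1] < r.lo || p[1] > r.hi) return false;
  for (int i = 2; i < len; ++i) {
    if (p[i] < 0x80 || p[i] > 0xBF) return false;
  }
  return true;
}
```

With this sketch, `E0 9F BF` (an overlong form) and `ED A0 80` (a surrogate) are rejected, while `E0 A0 80` and `ED 9F BF` are accepted.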


================
Comment at: libcxx/include/__format/unicode.h:150
+  // *Marked* entries are not the full range 80..BF.
+  // *) This entry is not marked in the Unicode standard, but this entry is also not the full range.
+  //
----------------
I don't understand this footnote. The full range of code points that are encodable in a single code unit is U+0000..U+007F.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144346/new/

https://reviews.llvm.org/D144346


