[libcxx-commits] [PATCH] D144346: [libc++][format] Improves Unicode decoders.

Mark de Wever via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Feb 24 12:56:10 PST 2023


Mordante marked 3 inline comments as done.
Mordante added inline comments.


================
Comment at: libcxx/include/__format/unicode.h:139-147
+  // U+0000..U+007F     00..7F                                         U+0000..U+007F 1 code unit range
+  // U+0080..U+07FF     *C2*..DF   80..BF                              U+0080..U+07FF 2 code unit range *
+  // U+0800..U+0FFF     E0         *A0*..BF    80..BF                  U+0800..U+FFFF 3 code unit range
+  // U+1000..U+CFFF     E1..EC     80..BF      80..BF
+  // U+D000..U+D7FF     ED         80..*9F*    80..BF                  U+D800..U+DFFF surrogate range
+  // U+E000..U+FFFF     EE..EF     80..BF      80..BF
+  // U+10000..U+3FFFF   F0         *90*..BF    80..BF     80..BF       U+10000..U+10FFFF 4 code unit range
----------------
tahonermann wrote:
> Here is another presentation option that avoids the need for those footnotes. If you like this better, great. If not, no problem. The current presentation has the benefit of matching the bold highlighting in the table from the Unicode Standard, but I think the suggested presentation better explains the reason those invalid ranges exist.
Actually I like this a lot, thanks! This matches the code more closely; the code does
not validate every range, but it does reject the ranges labeled "invalid overlong encoding".

I made a few more changes to the surrounding comments, since they looked odd next to the new table.
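To illustrate what the table encodes, here is a minimal standalone C++17 sketch of a decoder that validates a single code point. It is not the libc++ decoder from this patch, which (per the reply above) does not validate every range. The names `decoded` and `decode_one` are hypothetical. The sketch follows Table 3-7 ("Well-Formed UTF-8 Byte Sequences") of the Unicode Standard, including the F1..F3 and F4 lead-byte rows that fall outside the quoted excerpt. It narrows the allowed lead-byte or second-byte range wherever the table stars a value.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical result type (not part of libc++): one decoded code point.
struct decoded {
  char32_t code_point;
  std::size_t length; // number of code units consumed
};

// True when c is a continuation byte (10xxxxxx, i.e. 80..BF).
constexpr bool is_continuation(unsigned char c) { return (c & 0xC0) == 0x80; }

// Hypothetical helper: decodes the first code point of `s` using the byte
// ranges of Unicode Table 3-7. Returns std::nullopt for ill-formed input,
// including overlong encodings, surrogates, and values above U+10FFFF.
std::optional<decoded> decode_one(std::string_view s) {
  if (s.empty())
    return std::nullopt;
  auto b = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
  unsigned char lead = b(0);

  if (lead <= 0x7F) // 1 code unit: U+0000..U+007F
    return decoded{lead, 1};

  std::size_t length = 0;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (lead >= 0xC2 && lead <= 0xDF) {
    length = 2; // C0 and C1 could only start overlong encodings
  } else if (lead >= 0xE0 && lead <= 0xEF) {
    length = 3;
    if (lead == 0xE0)
      lo = 0xA0; // rejects overlong 3 code unit encodings
    else if (lead == 0xED)
      hi = 0x9F; // rejects the surrogates U+D800..U+DFFF
  } else if (lead >= 0xF0 && lead <= 0xF4) {
    length = 4;
    if (lead == 0xF0)
      lo = 0x90; // rejects overlong 4 code unit encodings
    else if (lead == 0xF4)
      hi = 0x8F; // rejects values above U+10FFFF
  } else {
    return std::nullopt; // 80..C1 and F5..FF are never valid lead bytes
  }

  if (s.size() < length)
    return std::nullopt; // truncated sequence
  if (b(1) < lo || b(1) > hi)
    return std::nullopt;
  for (std::size_t i = 2; i < length; ++i)
    if (!is_continuation(b(i)))
      return std::nullopt;

  // The lead byte carries (7 - length) payload bits, each continuation 6 bits.
  char32_t cp = lead & (0x7F >> length);
  for (std::size_t i = 1; i < length; ++i)
    cp = (cp << 6) | (b(i) & 0x3F);
  return decoded{cp, length};
}
```

Rejecting C0 and C1 as lead bytes, and narrowing the second byte after E0, ED, F0 and F4, is enough to exclude overlong encodings, surrogates and values above U+10FFFF. Every other continuation byte only needs the generic 80..BF check.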


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144346/new/

https://reviews.llvm.org/D144346


