[libcxx-commits] [PATCH] D149672: [libc++][format] Fixes UTF-8 continuation.

Tom Honermann via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Jun 2 14:40:56 PDT 2023


tahonermann accepted this revision.
tahonermann added a comment.

The change looks right to me and the added tests look good. I left a couple of suggested edits with comments for you to do with as you please.



================
Comment at: libcxx/test/std/utilities/format/format.functions/escaped_output.unicode.pass.cpp:507-512
+  // http://unicode.org/review/pr-121.html
+  test_format(R"("a\x{f1}\x{80}\x{80}\x{e1}\x{80}\x{c2}b")"sv,
+              "{:?}",
+              "a"
+              "\xf1\x80\x80\xe1\x80\xc2"
+              "b");
----------------
I'm not sure what purpose is served by the reference to PR-121 here. The test appears to demonstrate that code unit values are preserved as escaped values which, I think, is the right behavior according to the C++ standard, but is not a match for any of the PR-121 policies.


================
Comment at: libcxx/test/std/utilities/format/format.functions/unicode.pass.cpp:276-278
+    check(SV("\xc2\x00"), SV("{}"), SV("\xc2\x00")); // 0b0000'0000
+    check(SV("\xc2\x40"), SV("{}"), SV("\xc2\x40")); // 0b0100'0000
+    check(SV("\xc2\xc0"), SV("{}"), SV("\xc2\xc0")); // 0b1100'0000
----------------
I eventually figured out how the mask in the comment correlated with the format string and why each was relevant, but I wouldn't expect many people to understand the intended relevance without a more descriptive comment. I suggest just removing these comments.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D149672/new/

https://reviews.llvm.org/D149672



More information about the libcxx-commits mailing list