[PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8
Hubert Tong via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Tue Mar 17 12:57:13 PDT 2020
hubert.reinterpretcast added inline comments.
================
Comment at: llvm/include/llvm/Support/FormattedStream.h:44
+ /// PartialUTF8Char - Either empty or a prefix of a UTF-8 character which
+ /// should be prepended to the buffer for the next call to ComputePosition.
----------------
s/UTF-8 character/UTF-8 code unit sequence for a Unicode scalar value/;
================
Comment at: llvm/include/llvm/Support/FormattedStream.h:47
+ /// This is needed when the buffer is flushed when it ends part-way through a
+ /// UTF-8 character, so that we can compute the display width of the character
+ /// once we have the rest of it.
----------------
s/a UTF-8 character/the UTF-8 encoding of a Unicode scalar value/;
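For illustration only, here is a minimal self-contained sketch of the carry-over that the quoted comment describes: an incomplete UTF-8 sequence at the end of one write is held back and completed by the next. This is not the patch's implementation. `utf8SequenceLength` and `ScalarCounter` are hypothetical names, the input is assumed to be well-formed, and the sketch only counts completed scalar values rather than computing display width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte B.
// Assumes well-formed input; anything unexpected is treated as one byte.
inline std::size_t utf8SequenceLength(unsigned char B) {
  if (B < 0x80) return 1;            // 0xxxxxxx: ASCII
  if ((B & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((B & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((B & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;
}

// Hypothetical stand-in for the stream's position tracking.
struct ScalarCounter {
  std::string Partial;       // plays the role of PartialUTF8Char
  std::size_t Complete = 0;  // scalar values seen in full so far

  void write(const char *Ptr, std::size_t Size) {
    assert(Ptr != nullptr || Size == 0);
    std::size_t I = 0;
    if (!Partial.empty()) {
      // Finish the sequence left over from the previous write, if possible.
      std::size_t Need =
          utf8SequenceLength(static_cast<unsigned char>(Partial[0])) -
          Partial.size();
      std::size_t Take = Need < Size ? Need : Size;
      Partial.append(Ptr, Take);
      I = Take;
      if (Take < Need)
        return;  // still incomplete; wait for more bytes
      ++Complete;
      Partial.clear();
    }
    while (I < Size) {
      std::size_t Len = utf8SequenceLength(static_cast<unsigned char>(Ptr[I]));
      if (Size - I < Len) {
        // The buffer ends part-way through a sequence: keep the prefix.
        Partial.assign(Ptr + I, Size - I);
        return;
      }
      ++Complete;
      I += Len;
    }
  }
};
```

Feeding the three bytes of U+2468 (E2 91 A8) one at a time leaves `Complete` at 0 after the first two writes and moves it to 1 only when the last byte arrives.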
================
Comment at: llvm/lib/Support/FormattedStream.cpp:25
+/// This assumes that the input string is well-formed UTF-8, and takes into
+/// account unicode characters which render as multiple columns wide.
+void formatted_raw_ostream::UpdatePosition(const char *Ptr, size_t Size) {
----------------
s/unicode/Unicode/;
================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:88
+
+TEST(formatted_raw_ostreamTest, Test_UTF8) {
+ SmallString<128> A;
----------------
Should there be a test for combining characters?
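For concreteness, one shape such a check could take, written as a self-contained sketch that does not use the LLVM APIs. `decodeOne`, `toyColumnWidth`, and `displayColumns` are hypothetical helpers. The width table is deliberately tiny and rests on two assumptions: combining diacritical marks (U+0300 to U+036F) occupy zero columns, and CJK unified ideographs (U+4E00 to U+9FFF) occupy two.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Decode one scalar value from well-formed UTF-8 starting at S[I],
// advancing I past the sequence. No validation is attempted.
inline std::uint32_t decodeOne(const std::string &S, std::size_t &I) {
  assert(I < S.size());
  unsigned char B = static_cast<unsigned char>(S[I++]);
  if (B < 0x80)
    return B;
  int Extra = (B & 0xE0) == 0xC0 ? 1 : (B & 0xF0) == 0xE0 ? 2 : 3;
  std::uint32_t CP = B & (0x3F >> Extra);  // payload bits of the lead byte
  while (Extra-- > 0)
    CP = (CP << 6) | (static_cast<unsigned char>(S[I++]) & 0x3F);
  return CP;
}

// Toy width table (an assumption for this sketch, not a real Unicode table).
inline int toyColumnWidth(std::uint32_t CP) {
  if (CP >= 0x0300 && CP <= 0x036F) return 0;  // combining diacritical marks
  if (CP >= 0x4E00 && CP <= 0x9FFF) return 2;  // CJK unified ideographs
  return 1;
}

// Sum of the per-scalar-value widths of a well-formed UTF-8 string.
inline int displayColumns(const std::string &S) {
  int Columns = 0;
  for (std::size_t I = 0; I < S.size();)
    Columns += toyColumnWidth(decodeOne(S, I));
  return Columns;
}
```

Under these assumptions, "e" followed by U+0301 (bytes 65 CC 81) is three bytes and two scalar values but a single column, which is the case a combining-character test would pin down.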
================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:114
+
+ // U+55B5, chinese character, encodes as three bytes, takes up two columns.
+ C << "\u55b5";
----------------
s/chinese/Chinese/; or say "CJK character" instead.
================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:147
+
+ // Same as above, but with a chinese character which displays as two columns.
+ C << "123\u55b5";
----------------
Same comment as above: s/chinese/Chinese/; or say "CJK character" instead.
================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:163
+ // The stream has a one-byte buffer, so it gets flushed multiple times while
+ // printing a single unicode character.
+ C << "\u2468";
----------------
Same comment as above: s/unicode/Unicode/;
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D76291/new/
https://reviews.llvm.org/D76291