[PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8

Hubert Tong via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Mar 17 12:57:13 PDT 2020


hubert.reinterpretcast added inline comments.


================
Comment at: llvm/include/llvm/Support/FormattedStream.h:44
 
+  /// PartialUTF8Char - Either empty or a prefix of a UTF-8 character which
+  /// should be prepended to the buffer for the next call to ComputePosition.
----------------
s/UTF-8 character/UTF-8 code unit sequence for a Unicode scalar value/;


================
Comment at: llvm/include/llvm/Support/FormattedStream.h:47
+  /// This is needed when the buffer is flushed when it ends part-way through a
+  /// UTF-8 character, so that we can compute the display width of the character
+  /// once we have the rest of it.
----------------
s/a UTF-8 character/the UTF-8 encoding of a Unicode scalar value/;
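
For reference, the carry-over that this comment describes can be sketched roughly as below. This is a standalone illustration and not the code in the patch: `ScalarCounter`, `utf8SequenceLength`, and `update` are hypothetical names, and the sketch only counts completed scalar values instead of computing display columns. Like the patch, it assumes well-formed UTF-8.

```cpp
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence introduced by lead byte B.
// Assumes well-formed input, so B is never a continuation byte.
static std::size_t utf8SequenceLength(unsigned char B) {
  if (B < 0x80) return 1;            // 0xxxxxxx
  if ((B & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((B & 0xF0) == 0xE0) return 3;  // 1110xxxx
  return 4;                          // 11110xxx
}

// Hypothetical stand-in for the stream's position tracking.
struct ScalarCounter {
  std::string Partial;      // empty, or a prefix of an incomplete sequence
  std::size_t Scalars = 0;  // complete scalar values seen so far

  // Called once per flushed chunk; a chunk may end part-way through a sequence.
  void update(const char *Ptr, std::size_t Size) {
    // Prepend whatever was left over from the previous chunk.
    std::string Buf = Partial + std::string(Ptr, Size);
    Partial.clear();
    std::size_t I = 0;
    while (I < Buf.size()) {
      std::size_t Len = utf8SequenceLength(static_cast<unsigned char>(Buf[I]));
      if (I + Len > Buf.size()) {
        // Incomplete tail: keep it until the rest of the sequence arrives.
        Partial = Buf.substr(I);
        return;
      }
      ++Scalars;
      I += Len;
    }
  }
};
```

Feeding the three bytes of U+2468 one at a time leaves `Scalars` at 0 until the third byte arrives, which is the situation the one-byte-buffer test exercises.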


================
Comment at: llvm/lib/Support/FormattedStream.cpp:25
+/// This assumes that the input string is well-formed UTF-8, and takes into
+/// account unicode characters which render as multiple columns wide.
+void formatted_raw_ostream::UpdatePosition(const char *Ptr, size_t Size) {
----------------
s/unicode/Unicode/;


================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:88
+
+TEST(formatted_raw_ostreamTest, Test_UTF8) {
+  SmallString<128> A;
----------------
Should there be a test for combining characters?
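
To make the question concrete: a combining mark such as U+0301 should add no columns to the base character it follows. The sketch below is only an illustration of the expected property, not a proposed test for this patch. `toyColumnWidth` and `toyDisplayWidth` are hypothetical helpers, and the width table is a deliberately tiny assumption (one combining block and one CJK block), not the real Unicode data.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Toy width table, for illustration only; real width data is far larger.
static int toyColumnWidth(std::uint32_t CP) {
  if (CP >= 0x0300 && CP <= 0x036F) return 0;  // Combining Diacritical Marks
  if (CP >= 0x4E00 && CP <= 0x9FFF) return 2;  // CJK Unified Ideographs
  return 1;
}

// Decodes well-formed UTF-8 and sums the toy widths of its scalar values.
static int toyDisplayWidth(const std::string &S) {
  int Width = 0;
  std::size_t I = 0;
  while (I < S.size()) {
    unsigned char B = static_cast<unsigned char>(S[I]);
    std::size_t Len;
    std::uint32_t CP;
    if (B < 0x80) { Len = 1; CP = B; }
    else if ((B & 0xE0) == 0xC0) { Len = 2; CP = B & 0x1F; }
    else if ((B & 0xF0) == 0xE0) { Len = 3; CP = B & 0x0F; }
    else { Len = 4; CP = B & 0x07; }
    // Fold in the low six bits of each continuation byte.
    for (std::size_t J = 1; J < Len && I + J < S.size(); ++J)
      CP = (CP << 6) | (static_cast<unsigned char>(S[I + J]) & 0x3F);
    Width += toyColumnWidth(CP);
    I += Len;
  }
  return Width;
}
```

Under these assumptions, "e" followed by U+0301 occupies one column, the same as a plain "e", while U+55B5 occupies two.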


================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:114
+
+  // U+55B5, chinese character, encodes as three bytes, takes up two columns.
+  C << "\u55b5";
----------------
s/chinese/Chinese/; or use "CJK" instead.


================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:147
+
+  // Same as above, but with a chinese character which displays as two columns.
+  C << "123\u55b5";
----------------
Same comment re: "Chinese" or "CJK".


================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:163
+  // The stream has a one-byte buffer, so it gets flushed multiple times while
+  // printing a single unicode character.
+  C << "\u2468";
----------------
Same comment re: "Unicode".


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76291/new/

https://reviews.llvm.org/D76291




