[PATCH] D76291: [Support] Fix formatted_raw_ostream for UTF-8

Hubert Tong via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Wed Mar 18 08:42:09 PDT 2020


hubert.reinterpretcast added inline comments.


================
Comment at: llvm/lib/Support/FormattedStream.cpp:59
+    size_t NumBytes = getNumBytesForUTF8(PartialUTF8Char[0]);
+    if (NumBytes > (PartialUTF8Char.size() + Size)) {
+      // If we still don't have enough bytes for a complete code point, just
----------------
The overflow-free version should be preferred: `Size < NumBytes - PartialUTF8Char.size()`

Considering that `NumBytes - PartialUTF8Char.size()` is already used in the `else`, this might as well be named `NumBytesNeededFromBuffer` first (at which point it would be the only use of `NumBytes`, so we can get rid of `NumBytes`):

```
      size_t NumBytesNeededFromBuffer =
          getNumBytesForUTF8(PartialUTF8Char[0]) - PartialUTF8Char.size();
      if (Size < NumBytesNeededFromBuffer) {
```


================
Comment at: llvm/lib/Support/FormattedStream.cpp:87
+    // even if the buffer is being flushed.
+    if ((Ptr + NumBytes) > End) {
+      PartialUTF8Char = StringRef(Ptr, End - Ptr);
----------------
Prefer the overflow-free version:  `End - Ptr < NumBytes`. If inclined to name `End - Ptr` (it would occur twice), `BytesAvailable` makes sense.


================
Comment at: llvm/unittests/Support/formatted_raw_ostream_test.cpp:88
+
+TEST(formatted_raw_ostreamTest, Test_UTF8) {
+  SmallString<128> A;
----------------
ostannard wrote:
> hubert.reinterpretcast wrote:
> > Should there be a test for combining characters?
> This doesn't support combining characters, I don't think there's much point in adding a test for something which doesn't work.
Got it; thanks.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76291/new/

https://reviews.llvm.org/D76291





More information about the llvm-commits mailing list