[PATCH] D107202: ConvertUTF: convertUTF32ToUTF8String

Marcus Johnson via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Aug 3 20:53:21 PDT 2021


MarcusJohnson91 added a comment.

> In D106753#inline-1020607 <https://reviews.llvm.org/D106753#inline-1020607>, @efriedma wrote:
>
>> I'm not sure the math is right even for UTF-16, but anyway, UTF-32 is a little different from UTF-16.  A 2-byte character in UTF-16 can translate to 3 bytes in UTF-8.  That sort of thing is impossible in UTF-32: a UTF-32 string is never shorter than its translation to UTF-8.  A codepoint in UTF-8 is at most 4 bytes.

I've written my own Unicode encoder/decoder before, so I'm familiar with how it works.

You can store regular ASCII in a UTF-32 string: "Example" as UTF-32 would be 7 * 4 = 28 bytes (not counting the null terminator), whereas it would be just 7 bytes in UTF-8.
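
To make the size arithmetic concrete, here is a rough sketch of counting UTF-8 bytes for a UTF-32 string. The helper names `utf8BytesForCodePoint` and `utf8BytesForString` are made up for illustration and are not part of ConvertUTF; the sketch only counts bytes and does not reject surrogates.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not an LLVM API): number of UTF-8 bytes needed to
// encode a single code point. Surrogate values are not rejected here.
inline std::size_t utf8BytesForCodePoint(char32_t CodePoint) {
  assert(CodePoint <= 0x10FFFF && "outside the Unicode code space");
  if (CodePoint < 0x80)
    return 1; // ASCII
  if (CodePoint < 0x800)
    return 2;
  if (CodePoint < 0x10000)
    return 3; // rest of the Basic Multilingual Plane
  return 4;   // supplementary planes
}

// Hypothetical helper: total UTF-8 bytes for a whole UTF-32 string.
inline std::size_t utf8BytesForString(const std::u32string &Src) {
  std::size_t Total = 0;
  for (char32_t C : Src)
    Total += utf8BytesForCodePoint(C);
  return Total;
}
```

For `U"Example"` the source takes 7 * sizeof(char32_t) = 28 bytes while the count above comes to 7, so a buffer sized from the UTF-32 byte length is four times larger than needed in the all-ASCII case.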

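It looks like the std::string is already being trimmed to the written length afterwards with `Out.resize(reinterpret_cast<char *>(Dst) - &Out[0]);`

That only changes the size, though, and generally leaves the capacity alone, so maybe a call to `Out.shrink_to_fit()` at the end is warranted?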
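
A minimal sketch of what I mean, assuming the conversion over-allocates for the worst case and then trims. `trimAfterWrite` is a hypothetical stand-in for the real conversion loop, not code from the patch, and it writes filler bytes instead of real UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion: allocate WorstCase bytes,
// write only Written of them, then trim the string to what was written.
inline std::string trimAfterWrite(std::size_t WorstCase, std::size_t Written,
                                  bool Shrink) {
  assert(Written <= WorstCase && "cannot write past the allocation");
  std::string Out(WorstCase, '\0');
  char *Dst = &Out[0];
  for (std::size_t I = 0; I < Written; ++I)
    *Dst++ = 'x'; // placeholder for the encoded bytes

  // Same shape as the resize in the patch: size() becomes the number of
  // bytes written. The allocation is typically kept as-is.
  Out.resize(Dst - &Out[0]);

  // Non-binding request to release the unused capacity; the implementation
  // is allowed to ignore it.
  if (Shrink)
    Out.shrink_to_fit();
  return Out;
}
```

Since `shrink_to_fit()` is only a request and may reallocate and copy, whether it is worth it probably depends on how long the result lives and how much slack there is.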


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D107202/new/

https://reviews.llvm.org/D107202


