[PATCH] D107202: ConvertUTF: convertUTF32ToUTF8String
Marcus Johnson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 3 20:53:21 PDT 2021
MarcusJohnson91 added a comment.
> In D106753#inline-1020607 <https://reviews.llvm.org/D106753#inline-1020607>, @efriedma wrote:
>
>> I'm not sure the math is right even for UTF-16, but anyway, UTF-32 is a little different from UTF-16. A 2-byte character in UTF-16 can translate to 3 bytes in UTF-8. That sort of thing is impossible in UTF-32: a UTF-32 string is never shorter than its translation to UTF-8. A codepoint in UTF-8 is at most 4 bytes.
I've written my own Unicode encoder/decoder before, so I'm familiar with how it works.
You can store regular ASCII in a UTF-32 string: "Example" as UTF-32 is 7 * 4 = 28 bytes (not counting the null terminator), whereas it is just 7 bytes in UTF-8.
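To make that arithmetic concrete, here is a minimal sketch in plain standard C++. It does not use the ConvertUTF API; `utf8BytesFor` and `utf8Size` are hypothetical helper names chosen for illustration, and the input is assumed to contain only valid code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of ConvertUTF): number of UTF-8 bytes
// needed to encode a single code point.
inline std::size_t utf8BytesFor(char32_t CP) {
  assert(CP <= 0x10FFFF && "not a Unicode code point");
  return CP < 0x80 ? 1 : CP < 0x800 ? 2 : CP < 0x10000 ? 3 : 4;
}

// Hypothetical helper: total UTF-8 byte count for a UTF-32 string,
// not counting any null terminator.
inline std::size_t utf8Size(const std::u32string &Src) {
  std::size_t N = 0;
  for (char32_t CP : Src)
    N += utf8BytesFor(CP);
  return N;
}
```

With these helpers, `utf8Size(U"Example")` is 7, while the UTF-32 storage is `7 * sizeof(char32_t)`, i.e. 28 bytes. The two sizes only meet when every code point needs the full 4 bytes in UTF-8.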
It also looks like the std::string is compacted afterwards with `Out.resize(reinterpret_cast<char *>(Dst) - &Out[0]);`,
but maybe a call to `Out.shrink_to_fit()` at the end is warranted?
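For illustration, the over-allocate, `resize`, then `shrink_to_fit` pattern could be sketched as follows. This is a standalone stand-in, not the code in the patch: `toUTF8Sketch` is a hypothetical name, the worst-case reservation of 4 bytes per code point is an assumption, and the input is assumed to hold only valid code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the conversion under review; it is not the
// ConvertUTF implementation and does no error handling beyond an assert.
inline std::string toUTF8Sketch(const std::u32string &Src) {
  // Assumed worst case: every code point needs 4 bytes in UTF-8.
  std::string Out(Src.size() * 4, '\0');
  char *Dst = &Out[0];
  for (char32_t CP : Src) {
    assert(CP <= 0x10FFFF && "not a Unicode code point");
    if (CP < 0x80) {
      *Dst++ = static_cast<char>(CP);
    } else if (CP < 0x800) {
      *Dst++ = static_cast<char>(0xC0 | (CP >> 6));
      *Dst++ = static_cast<char>(0x80 | (CP & 0x3F));
    } else if (CP < 0x10000) {
      *Dst++ = static_cast<char>(0xE0 | (CP >> 12));
      *Dst++ = static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
      *Dst++ = static_cast<char>(0x80 | (CP & 0x3F));
    } else {
      *Dst++ = static_cast<char>(0xF0 | (CP >> 18));
      *Dst++ = static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
      *Dst++ = static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
      *Dst++ = static_cast<char>(0x80 | (CP & 0x3F));
    }
  }
  // Trims size() to the bytes actually written; capacity() is typically
  // left as it was.
  Out.resize(static_cast<std::size_t>(Dst - &Out[0]));
  // Non-binding request to release the unused capacity; the implementation
  // is free to ignore it.
  Out.shrink_to_fit();
  return Out;
}
```

Because `shrink_to_fit()` is only a request and may cost a reallocation plus a copy, whether it is worth calling here is a trade-off between peak memory and an extra allocation.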
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107202/new/
https://reviews.llvm.org/D107202