[PATCH] D107202: ConvertUTF: convertUTF32ToUTF8String
Eli Friedman via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Aug 3 11:01:28 PDT 2021
efriedma added a comment.
In D107202#2921107 <https://reviews.llvm.org/D107202#2921107>, @MarcusJohnson91 wrote:
> What BOM handling? There is no BOM function; bytes are swapped in the converter if the byte order isn't correct. Is that what you mean?
I mean the behavior when handling strings that begin with UNI_UTF32_BYTE_ORDER_MARK_SWAPPED.
I suspect a lot of places don't want the BOM handling to trigger. This includes trying to print diagnostics for wprintf, since the underlying function doesn't have any BOM handling. But I guess it's unlikely to matter in practice.
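To make the concern concrete, here is a minimal standalone sketch of the kind of BOM handling in question. It is not the code from the patch: `normalizeUTF32`, the `HonorBOM` flag, and the `kBom*` constants are hypothetical names for illustration, and the constant values are simply U+FEFF and its byte-swapped form. A caller that mirrors wprintf would pass `HonorBOM = false` and get the input back untouched.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// U+FEFF as a UTF-32 code unit, and the same unit read with the wrong byte
// order. These names are local to this sketch.
constexpr uint32_t kBomNative = 0x0000FEFF;
constexpr uint32_t kBomSwapped = 0xFFFE0000;

// Reverse the four bytes of a 32-bit code unit.
constexpr uint32_t byteSwap32(uint32_t V) {
  return ((V & 0x000000FFu) << 24) | ((V & 0x0000FF00u) << 8) |
         ((V & 0x00FF0000u) >> 8) | ((V & 0xFF000000u) >> 24);
}

// Hypothetical helper: when HonorBOM is set, a leading swapped BOM causes the
// whole string to be byte-swapped, and a leading native BOM is then dropped.
// When HonorBOM is clear, the input is returned unchanged.
std::vector<uint32_t> normalizeUTF32(const std::vector<uint32_t> &Src,
                                     bool HonorBOM) {
  std::vector<uint32_t> Out(Src);
  if (!HonorBOM || Out.empty())
    return Out;
  if (Out.front() == kBomSwapped) {
    for (uint32_t &C : Out)
      C = byteSwap32(C);
    assert(Out.front() == kBomNative);
  }
  if (Out.front() == kBomNative)
    Out.erase(Out.begin());
  return Out;
}
```

With the flag set, a string that happens to start with 0xFFFE0000 is silently reinterpreted, which is exactly the behavior a diagnostic printer may not want.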
In D107202#2921107 <https://reviews.llvm.org/D107202#2921107>, @MarcusJohnson91 wrote:
> I copied `SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1` from the UTF-16 version.
>
> Are you asking me to change the UTF-16 version too?
In D106753#inline-1020607 <https://reviews.llvm.org/D106753#inline-1020607>, @efriedma wrote:
> I'm not sure the math is right even for UTF-16, but anyway, UTF-32 is a little different from UTF-16. A 2-byte character in UTF-16 can translate to 3 bytes in UTF-8. That sort of thing is impossible in UTF-32: every UTF-32 code unit is 4 bytes and a code point in UTF-8 is at most 4 bytes, so a UTF-32 string is never shorter than its translation to UTF-8.
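The size argument above can be sketched as follows. This is a standalone illustration rather than the patch's code: `maxUTF8BytesFromUTF32`, `maxUTF8BytesFromUTF16`, and `encodeUTF8` are hypothetical names, the bounds are expressed in source bytes and leave out any terminator, and the encoder assumes every input value is a valid Unicode scalar value.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Upper bounds on UTF-8 output size, given the source size in bytes.
// UTF-32: 4 source bytes per code point, at most 4 UTF-8 bytes out.
constexpr std::size_t maxUTF8BytesFromUTF32(std::size_t SrcBytes) {
  return SrcBytes;
}
// UTF-16: a 2-byte code unit can become 3 UTF-8 bytes; a surrogate pair
// (4 source bytes) becomes 4, so 3 bytes per code unit is the worst case.
constexpr std::size_t maxUTF8BytesFromUTF16(std::size_t SrcBytes) {
  return SrcBytes / 2 * 3;
}

// Minimal UTF-8 encoder for this sketch; assumes valid scalar values
// (no surrogates, nothing above U+10FFFF) and does no error handling.
std::string encodeUTF8(const std::vector<uint32_t> &Src) {
  const std::size_t Bound =
      maxUTF8BytesFromUTF32(Src.size() * sizeof(uint32_t));
  std::string Out;
  Out.reserve(Bound);
  for (uint32_t CP : Src) {
    if (CP < 0x80) {
      Out += static_cast<char>(CP);
    } else if (CP < 0x800) {
      Out += static_cast<char>(0xC0 | (CP >> 6));
      Out += static_cast<char>(0x80 | (CP & 0x3F));
    } else if (CP < 0x10000) {
      Out += static_cast<char>(0xE0 | (CP >> 12));
      Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
      Out += static_cast<char>(0x80 | (CP & 0x3F));
    } else {
      Out += static_cast<char>(0xF0 | (CP >> 18));
      Out += static_cast<char>(0x80 | ((CP >> 12) & 0x3F));
      Out += static_cast<char>(0x80 | ((CP >> 6) & 0x3F));
      Out += static_cast<char>(0x80 | (CP & 0x3F));
    }
  }
  // The output never exceeds the size of the UTF-32 input in bytes.
  assert(Out.size() <= Bound);
  return Out;
}
```

Under these assumptions, reserving `SrcBytes` bytes (plus whatever terminator the caller needs) is enough for UTF-32 input, whereas multiplying the source byte count by the per-code-point maximum overallocates.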
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D107202/new/
https://reviews.llvm.org/D107202