[PATCH] D114342: ConvertUTF, new wrapper API
Marcus Johnson via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Mar 21 15:09:31 PDT 2022
MarcusJohnson91 added inline comments.
================
Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:172
+ // enough that we can fit a null terminator without reallocating.
+ Out.resize(SrcBytes.size() + 1);
+ UTF8 *Dst = reinterpret_cast<UTF8 *>(&Out[0]);
----------------
cor3ntin wrote:
> Bigcheese wrote:
> > This is technically correct, but it's implicit in that the max number of UTF8 code units per code point is the same as `sizeof(UTF32)`. Would be nice to have a comment.
> Nit: The comment still doesn't say that we assume at most 4 UTF-8 code units per code point, which would not be the case if the UTF-8 comes from a non-ISO 10646-conforming Android environment, for example.
I was confused by this: Unicode limits code points to 0x10FFFF, so a single code point never needs more than 4 UTF-8 code units.
I mean, I can still put the comment in, but it seems pointless?
https://www.unicode.org/versions/Unicode14.0.0/ch03.pdf#I1.36559
This limit of 0x10FFFF has been in place since the year 2000:
https://www.unicode.org/L2/L2000/00079-n2175.htm
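
For illustration, here is a minimal sketch of the sizing argument, assuming the source buffer holds UTF-32 and every code point is at most 0x10FFFF. `utf8CodeUnits` and `maxUtf8Bytes` are hypothetical helpers written for this comment, not part of ConvertUTF, and the sketch does not treat the surrogate range specially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of ConvertUTF): number of UTF-8 code units
// needed to encode one code point, assuming CodePoint <= 0x10FFFF.
constexpr unsigned utf8CodeUnits(std::uint32_t CodePoint) {
  return CodePoint < 0x80 ? 1 : CodePoint < 0x800 ? 2 : CodePoint < 0x10000 ? 3 : 4;
}

// Boundaries of each encoded length.
static_assert(utf8CodeUnits(0x7F) == 1, "ASCII fits in one code unit");
static_assert(utf8CodeUnits(0x7FF) == 2, "two code units up to U+07FF");
static_assert(utf8CodeUnits(0xFFFF) == 3, "three code units up to U+FFFF");

// The assumption the review thread is about: the worst case in UTF-8 is the
// same number of bytes as one UTF-32 code unit.
static_assert(utf8CodeUnits(0x10FFFF) == sizeof(std::uint32_t),
              "max UTF-8 code units per code point equals sizeof(UTF32)");

// Hypothetical helper: upper bound on the UTF-8 output size, in bytes, for a
// UTF-32 input of Utf32Bytes bytes (terminator not included).
inline std::size_t maxUtf8Bytes(std::size_t Utf32Bytes) {
  assert(Utf32Bytes % sizeof(std::uint32_t) == 0 && "partial UTF-32 code unit");
  return (Utf32Bytes / sizeof(std::uint32_t)) * utf8CodeUnits(0x10FFFF);
}
```

Under those assumptions the bound equals the input size in bytes, which is why resizing to `SrcBytes.size() + 1` leaves room for the null terminator. It would not hold for input containing values above 0x10FFFF encoded with the older 5- and 6-byte forms.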
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114342/new/
https://reviews.llvm.org/D114342