[libc-commits] [PATCH] D74021: Created uChar implementation for libc

Marcus Johnson via Phabricator via libc-commits libc-commits at lists.llvm.org
Wed Mar 11 22:33:37 PDT 2020


MarcusJohnson91 added a comment.

Bump



================
Comment at: libc/src/uchar/c16rtomb.cpp:16
+
+size_t LLVM_LIBC_ENTRYPOINT(c16rtomb)(char *restrict s, char16_t c16,
+                                      mbstate_t *restrict ps) {
----------------
sivachandra wrote:
> It would be good if you can add comments explaining the different cases that being handled here.
0xD800 - 0xDFFF is a surrogate pair, the c16 parameter can not contain the 2 16 bit code units required to actually decode it to the proper UTF-32 codepoint.

so should I try to make the decoder stateful in order to make that happen (not sure if that's even possible), or should I just replace it with the invalid replacement character?


================
Comment at: libc/src/uchar/c16rtomb.cpp:25
+    StringSize = 1;
+    s = new char(StringSize);
+    s[0] = c16 & 0x7F;
----------------
abrachet wrote:
> For what its worth, this allocates one char and initializes it to StringSize. If you wanted to allocate StringSize chars you need `new char[StringSize]`. But `new` isn't safe to use for us, and I think it wouldn't even link because we have `-nostdlib` specified.
> 
> 
> ```lang=C
> void changeParam(int a) {
>     a = 5;
> }
> 
> int main() {
>    int a = 0;
>    changeParam(a);
>    assert(!a); // passes, a not changed.
> }
> ```
> The same is true with pointers if I pass my `char *` to c16rtomb, if you reassign to it by malloc or new, it changes the local variable but not the callers variable. These functions (unless passed a nullptr) assume that `s` is already pointing to enough valid memory.
Ahh, I appreciate the tip on new, I'm new to C++.

As for if the string is allocated I'm trying to look up some examples of how these APIs are called on github, it'll be a lot harder to verify if a string can hold a codepoint/codeunit if it's pre-allocated, because each index would look like a null terminater to begin with, right?


================
Comment at: libc/src/uchar/mbrtoc32.cpp:17
+        size_t StringSize  = 1ULL;
+        char32_t Decoded   = calloc(1, sizeof(char32_t));
+        
----------------
abrachet wrote:
> What's the point of this?
What's the point of what? 
the char32_t variable being initialized as an array instead of a plain value? 
The standard requires it, I agree it's dumb, but that's the API callers expect.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D74021/new/

https://reviews.llvm.org/D74021





More information about the libc-commits mailing list