[libc-commits] [libc] [libc] Implemented CharacterConverter push/pop for utf32->utf8 conversions (PR #143971)
Michael Jones via libc-commits
libc-commits at lists.llvm.org
Fri Jun 13 09:56:39 PDT 2025
================
@@ -22,13 +23,53 @@ bool CharacterConverter::isComplete() {
return state->bytes_processed == state->total_bytes;
}
-int CharacterConverter::push(char8_t utf8_byte) {}
+int CharacterConverter::push(char32_t utf32) {
+ state->partial = utf32;
+ state->bytes_processed = 0;
+ state->total_bytes = 0;
-int CharacterConverter::push(char32_t utf32) {}
+ // determine number of utf-8 bytes needed to represent this utf32 value
+ char32_t ranges[] = {0x7f, 0x7ff, 0xffff, 0x10ffff};
+ const int num_ranges = 4;
+ for (uint8_t i = 0; i < num_ranges; i++) {
+ if (state->partial <= ranges[i]) {
+ state->total_bytes = i + 1;
+ break;
+ }
+ }
+ if (state->total_bytes == 0)
+ return -1;
-utf_ret<char8_t> CharacterConverter::pop_utf8() {}
+ return 0;
+}
+
+ErrorOr<char8_t> CharacterConverter::pop_utf8() {
+ if (state->bytes_processed >= state->total_bytes)
+ return Error(-1);
+
+ const char8_t first_byte_headers[] = {0, 0xC0, 0xE0, 0xF0};
+ const char32_t utf32 = state->partial;
+ const char32_t tot_bytes = state->total_bytes;
+ const char32_t bytes_proc = state->bytes_processed;
----------------
michaelrj-google wrote:
I'm not sure why these local copies are defined. Using the fields of `state` directly wouldn't make the code any harder to read.
https://github.com/llvm/llvm-project/pull/143971
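
For illustration, the suggestion above might look like the following standalone sketch, which reads the `state` fields directly instead of copying them into locals. This is not the code from the PR: the `State` struct, its field types, the free-function signatures, and the use of `std::optional` in place of `ErrorOr` are all assumptions made so the example compiles on its own. Like the hunk shown, it does not reject surrogate code points.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the converter state; the field names follow the
// diff, but the types are assumptions.
struct State {
  char32_t partial;
  uint8_t bytes_processed;
  uint8_t total_bytes;
};

// Stores the code point and works out how many UTF-8 bytes it needs.
// Returns 0 on success, -1 if the value is above 0x10ffff.
int push(State &state, char32_t utf32) {
  state.partial = utf32;
  state.bytes_processed = 0;
  state.total_bytes = 0;
  constexpr char32_t ranges[] = {0x7f, 0x7ff, 0xffff, 0x10ffff};
  for (uint8_t i = 0; i < 4; i++) {
    if (utf32 <= ranges[i]) {
      state.total_bytes = static_cast<uint8_t>(i + 1);
      return 0;
    }
  }
  return -1;
}

// Emits the next UTF-8 byte, or std::nullopt when nothing is left to pop.
std::optional<uint8_t> pop_utf8(State &state) {
  if (state.bytes_processed >= state.total_bytes)
    return std::nullopt;

  constexpr uint8_t first_byte_headers[] = {0, 0xC0, 0xE0, 0xF0};
  // Number of continuation bytes that still follow the byte emitted now.
  const uint8_t remaining =
      static_cast<uint8_t>(state.total_bytes - state.bytes_processed - 1);
  const uint32_t shifted =
      static_cast<uint32_t>(state.partial) >> (6 * remaining);

  uint8_t out;
  if (state.bytes_processed == 0) {
    // Leading byte: length header plus the topmost payload bits.
    out = static_cast<uint8_t>(first_byte_headers[state.total_bytes - 1] |
                               shifted);
  } else {
    // Continuation byte: 10xxxxxx with the next six payload bits.
    out = static_cast<uint8_t>(0x80 | (shifted & 0x3f));
  }
  state.bytes_processed++;
  return out;
}
```

A caller would `push` one code point and then call `pop_utf8` until it returns an empty value, collecting one byte per call.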