[libc-commits] [libc] [libc] Implemented CharacterConverter push/pop for utf32->utf8 conversions (PR #143971)

Brooks Moses via libc-commits libc-commits at lists.llvm.org
Fri Jun 13 16:27:04 PDT 2025


================
@@ -22,13 +24,61 @@ bool CharacterConverter::isComplete() {
   return state->bytes_processed == state->total_bytes;
 }
 
-int CharacterConverter::push(char8_t utf8_byte) {}
+int CharacterConverter::push(char32_t utf32) {
+  state->partial = utf32;
+  state->bytes_processed = 0;
+  state->total_bytes = 0;
 
-int CharacterConverter::push(char32_t utf32) {}
+  // determine number of utf-8 bytes needed to represent this utf32 value
+  constexpr char32_t ranges[] = {0x7f, 0x7ff, 0xffff, 0x10ffff};
+  constexpr int num_ranges = 4;
+  for (uint8_t i = 0; i < num_ranges; i++) {
+    if (state->partial <= ranges[i]) {
+      state->total_bytes = i + 1;
+      break;
----------------
brooksmoses wrote:

I think you could clarify the logic here by making this "break" a "return 0". Then, if you get to line 41, you know you're in an error state, so you can move the "total_bytes = 0" from line 30 down there and remove the "if" check.

That makes it easier to read by not setting total_bytes twice. (When I was reading this and got to line 30, I was thinking, "Why are we setting total_bytes to 0 when we've pushed in a utf32?")

If you do that, it might also be cleaner in the error case to call the "Reset" function that I suggested to Sriya in her PR, and to set "partial" and "bytes_processed" to their "good" values only here, right before the "return 0".
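
For illustration, here is a rough sketch of the shape I have in mind. It is not the actual libc code: "ConverterState", "reset", and "push_utf32" are hypothetical stand-ins for the real class and the suggested "Reset" function, and the -1 error return is an assumption, since the hunk above does not show what the real code returns on failure.

```cpp
#include <cstdint>

// Hypothetical stand-in for the converter state; field names follow the diff.
struct ConverterState {
  char32_t partial = 0;
  uint8_t bytes_processed = 0;
  uint8_t total_bytes = 0;
};

// Hypothetical helper standing in for the suggested "Reset" function.
inline void reset(ConverterState &state) {
  state.partial = 0;
  state.bytes_processed = 0;
  state.total_bytes = 0;
}

// Returns 0 on success. The -1 error value is an assumption for this sketch.
inline int push_utf32(ConverterState &state, char32_t utf32) {
  // Largest code point representable in 1, 2, 3, and 4 UTF-8 bytes.
  constexpr char32_t ranges[] = {0x7f, 0x7ff, 0xffff, 0x10ffff};
  constexpr uint8_t num_ranges = 4;
  for (uint8_t i = 0; i < num_ranges; i++) {
    if (utf32 <= ranges[i]) {
      // Set the "good" values only once, right before the successful return.
      state.partial = utf32;
      state.bytes_processed = 0;
      state.total_bytes = static_cast<uint8_t>(i + 1);
      return 0;
    }
  }
  // Falling out of the loop means no range matched, so this is the error path.
  reset(state);
  return -1;
}
```

With this shape, total_bytes is written exactly once on each path, and the check after the loop goes away because reaching that point already implies an error.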

https://github.com/llvm/llvm-project/pull/143971

