[flang-commits] [flang] [flang] Fix UTF-8 minimality checks (PR #159142)

Eugene Epshteyn via flang-commits flang-commits at lists.llvm.org
Tue Sep 16 11:14:58 PDT 2025


================
@@ -158,21 +158,24 @@ DecodedCharacter DecodeRawCharacter<Encoding::UTF_8>(
     const char *cp, std::size_t bytes) {
   auto p{reinterpret_cast<const std::uint8_t *>(cp)};
   char32_t ch{*p};
-  if (ch <= 0x7f) {
+  // Valid UTF-8 encodings must be minimal.
+  if (ch <= 0x7f) { // 1 byte: 7 bits of payload
     return {ch, 1};
-  } else if ((ch & 0xf8) == 0xf0 && bytes >= 4 && ch > 0xf0 &&
-      ((p[1] | p[2] | p[3]) & 0xc0) == 0x80) {
+  } else if ((ch & 0xf8) == 0xf0 && bytes >= 4 &&
+      ((p[1] | p[2] | p[3]) & 0xc0) == 0x80 && (ch > 0xf0 || p[1] > 0x8f)) {
----------------
eugeneepshteyn wrote:

Where do 0x8f and 0x9f come from?
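
One way to see these bounds, as a sketch rather than the patch's own code: they are the largest second bytes that still decode to a code point small enough for a shorter encoding. With lead byte 0xf0, a second byte of 0x8f or less yields at most U+FFFF, which fits in 3 bytes. With lead byte 0xe0, a second byte of 0x9f or less yields at most U+07FF, which fits in 2 bytes. The 0x9f check is not in the hunk quoted above, so tying it to the 3-byte branch is an assumption. `Decode3` and `Decode4` are hypothetical helpers written for this illustration and are not part of flang.

```cpp
#include <cstdint>

// Hypothetical helpers (not flang code): assemble the payload bits of a
// 3- or 4-byte UTF-8 sequence without any validity or minimality checks.
constexpr char32_t Decode3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
  return (static_cast<char32_t>(b0 & 0x0f) << 12) |
      (static_cast<char32_t>(b1 & 0x3f) << 6) |
      static_cast<char32_t>(b2 & 0x3f);
}

constexpr char32_t Decode4(
    std::uint8_t b0, std::uint8_t b1, std::uint8_t b2, std::uint8_t b3) {
  return (static_cast<char32_t>(b0 & 0x07) << 18) |
      (static_cast<char32_t>(b1 & 0x3f) << 12) |
      (static_cast<char32_t>(b2 & 0x3f) << 6) |
      static_cast<char32_t>(b3 & 0x3f);
}

// 4 bytes, lead 0xf0: p[1] == 0x8f tops out at U+FFFF (overlong),
// while p[1] == 0x90 starts at U+10000, the first code point needing 4 bytes.
static_assert(Decode4(0xf0, 0x8f, 0xbf, 0xbf) == 0xffff, "overlong 4-byte max");
static_assert(Decode4(0xf0, 0x90, 0x80, 0x80) == 0x10000, "minimal 4-byte min");

// 3 bytes, lead 0xe0: p[1] == 0x9f tops out at U+07FF (overlong),
// while p[1] == 0xa0 starts at U+0800, the first code point needing 3 bytes.
static_assert(Decode3(0xe0, 0x9f, 0xbf) == 0x7ff, "overlong 3-byte max");
static_assert(Decode3(0xe0, 0xa0, 0x80) == 0x800, "minimal 3-byte min");
```

Read this way, `(ch > 0xf0 || p[1] > 0x8f)` in the hunk accepts a 4-byte sequence whose lead byte is above 0xf0, or whose lead byte is exactly 0xf0 and whose second byte is at least 0x90.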

https://github.com/llvm/llvm-project/pull/159142