[flang-commits] [flang] [flang] Fix UTF-8 minimality checks (PR #159142)
Eugene Epshteyn via flang-commits
flang-commits at lists.llvm.org
Tue Sep 16 11:14:58 PDT 2025
================
@@ -158,21 +158,24 @@ DecodedCharacter DecodeRawCharacter<Encoding::UTF_8>(
     const char *cp, std::size_t bytes) {
   auto p{reinterpret_cast<const std::uint8_t *>(cp)};
   char32_t ch{*p};
-  if (ch <= 0x7f) {
+  // Valid UTF-8 encodings must be minimal.
+  if (ch <= 0x7f) { // 1 byte: 7 bits of payload
     return {ch, 1};
-  } else if ((ch & 0xf8) == 0xf0 && bytes >= 4 && ch > 0xf0 &&
-      ((p[1] | p[2] | p[3]) & 0xc0) == 0x80) {
+  } else if ((ch & 0xf8) == 0xf0 && bytes >= 4 &&
+      ((p[1] | p[2] | p[3]) & 0xc0) == 0x80 && (ch > 0xf0 || p[1] > 0x8f)) {
----------------
eugeneepshteyn wrote:
Where do 0x8f and 0x9f come from?
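
For context, here is a minimal sketch of the arithmetic behind those two constants, assuming 0x9f belongs to the analogous 3-byte check (lead byte 0xe0), which is not visible in this hunk. `Payload3` and `Payload4` are hypothetical helpers written for illustration; they are not part of flang. A 3-byte sequence is minimal only if it encodes at least 0x800, and a 4-byte sequence only if it encodes at least 0x10000. With the smallest lead byte, the second byte decides which side of that boundary the value falls on:

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not flang code.

// Payload of a 3-byte sequence: 4 bits from the lead byte,
// 6 bits from each continuation byte.
constexpr char32_t Payload3(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2) {
  return (static_cast<char32_t>(b0) & 0x0f) << 12 |
      (static_cast<char32_t>(b1) & 0x3f) << 6 |
      (static_cast<char32_t>(b2) & 0x3f);
}

// Payload of a 4-byte sequence: 3 bits from the lead byte,
// 6 bits from each continuation byte.
constexpr char32_t Payload4(
    std::uint8_t b0, std::uint8_t b1, std::uint8_t b2, std::uint8_t b3) {
  return (static_cast<char32_t>(b0) & 0x07) << 18 |
      (static_cast<char32_t>(b1) & 0x3f) << 12 |
      (static_cast<char32_t>(b2) & 0x3f) << 6 |
      (static_cast<char32_t>(b3) & 0x3f);
}

// Lead 0xe0 with second byte <= 0x9f tops out at 0x7ff, which fits in
// 2 bytes, so the 3-byte form is overlong.
static_assert(Payload3(0xe0, 0x9f, 0xbf) == 0x7ff);
// Second byte 0xa0 is the first that reaches the 3-byte range.
static_assert(Payload3(0xe0, 0xa0, 0x80) == 0x800);

// Lead 0xf0 with second byte <= 0x8f tops out at 0xffff, which fits in
// 3 bytes, so the 4-byte form is overlong.
static_assert(Payload4(0xf0, 0x8f, 0xbf, 0xbf) == 0xffff);
// Second byte 0x90 is the first that reaches the 4-byte range.
static_assert(Payload4(0xf0, 0x90, 0x80, 0x80) == 0x10000);
```

Under that reading, `p[1] > 0x8f` rejects overlong 4-byte encodings when the lead byte is 0xf0, and a `p[1] > 0x9f` test would do the same for 3-byte encodings when the lead byte is 0xe0.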
https://github.com/llvm/llvm-project/pull/159142