[llvm] bf45e27 - [Clang] Fix invalid utf-8 detection

Corentin Jabot via llvm-commits llvm-commits at lists.llvm.org
Wed Jul 6 13:20:09 PDT 2022


Author: Corentin Jabot
Date: 2022-07-06T22:20:04+02:00
New Revision: bf45e27a676d87944f1f13d5f0d0f39935fc4010

URL: https://github.com/llvm/llvm-project/commit/bf45e27a676d87944f1f13d5f0d0f39935fc4010
DIFF: https://github.com/llvm/llvm-project/commit/bf45e27a676d87944f1f13d5f0d0f39935fc4010.diff

LOG: [Clang] Fix invalid utf-8 detection

The length of valid code points was calculated
incorrectly. This was not caught earlier because
there were no tests for the valid code points scenario.

Differential Revision: https://reviews.llvm.org/D129223
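
To illustrate the effect of the flipped comparison, here is a minimal standalone sketch. It is not the ConvertUTF.cpp code: sequenceLength and looksLegal are hypothetical, simplified stand-ins for trailingBytesForUTF8 and isLegalUTF8, and sizeBefore / sizeAfter are names invented here to mirror the removed and added lines of the patch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for trailingBytesForUTF8[lead] + 1.
// Simplified: only distinguishes 1- to 4-byte lead bytes.
int sequenceLength(unsigned char lead) {
  if (lead >= 0xF0) return 4;
  if (lead >= 0xE0) return 3;
  if (lead >= 0xC0) return 2;
  return 1;
}

// Hypothetical, much simplified stand-in for isLegalUTF8: accepts ASCII,
// or a lead byte in 0xC2..0xF4 followed by 10xxxxxx continuation bytes.
// It reads s[0..length-1], so callers must bounds-check first.
bool looksLegal(const unsigned char *s, int length) {
  if (length == 1) return s[0] < 0x80;
  if (s[0] < 0xC2 || s[0] > 0xF4) return false;
  for (int i = 1; i < length; ++i)
    if ((s[i] & 0xC0) != 0x80) return false;
  return true;
}

// Mirrors the removed line: the bounds comparison is inverted, so it only
// passes when the sequence would NOT fit in the remaining input.
unsigned sizeBefore(const unsigned char *s, const unsigned char *end) {
  int length = sequenceLength(*s);
  return (length > end - s && looksLegal(s, length)) ? length : 0;
}

// Mirrors the added line: the sequence must fit before it is validated.
unsigned sizeAfter(const unsigned char *s, const unsigned char *end) {
  int length = sequenceLength(*s);
  return (length < end - s && looksLegal(s, length)) ? length : 0;
}
```

With the four bytes C2 A7 61 62 (U+00A7 followed by "ab"), sizeBefore returns 0 for the valid two-byte sequence, which matches the spurious "invalid UTF-8" behaviour the new tests cover, while sizeAfter returns 2. Note that with the strict < as posted, a sequence that ends exactly at the end pointer still yields 0 in this sketch.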

Added: 
    

Modified: 
    clang/test/Lexer/comment-invalid-utf8.c
    llvm/lib/Support/ConvertUTF.cpp

Removed: 
    


################################################################################
diff  --git a/clang/test/Lexer/comment-invalid-utf8.c b/clang/test/Lexer/comment-invalid-utf8.c
index b8bf551dd8564..ed7405a3c079e 100644
--- a/clang/test/Lexer/comment-invalid-utf8.c
+++ b/clang/test/Lexer/comment-invalid-utf8.c
@@ -25,3 +25,14 @@
 // abcd
 // €abcd
 // expected-warning@-1 {{invalid UTF-8 in comment}}
+
+
+//§ § § 😀 你好 ©
+
+/*§ § § 😀 你好 ©*/
+
+/*
+§ § § 😀 你好 ©
+*/
+
+/* § § § 😀 你好 © */

diff  --git a/llvm/lib/Support/ConvertUTF.cpp b/llvm/lib/Support/ConvertUTF.cpp
index c494110cdcee1..25875d4c3184b 100644
--- a/llvm/lib/Support/ConvertUTF.cpp
+++ b/llvm/lib/Support/ConvertUTF.cpp
@@ -423,7 +423,7 @@ Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
  */
 unsigned getUTF8SequenceSize(const UTF8 *source, const UTF8 *sourceEnd) {
   int length = trailingBytesForUTF8[*source] + 1;
-  return (length > sourceEnd - source && isLegalUTF8(source, length)) ? length
+  return (length < sourceEnd - source && isLegalUTF8(source, length)) ? length
                                                                       : 0;
 }
 


        

