[llvm] bf45e27 - [Clang] Fix invalid utf-8 detection
Corentin Jabot via llvm-commits
llvm-commits at lists.llvm.org
Wed Jul 6 13:20:09 PDT 2022
Author: Corentin Jabot
Date: 2022-07-06T22:20:04+02:00
New Revision: bf45e27a676d87944f1f13d5f0d0f39935fc4010
URL: https://github.com/llvm/llvm-project/commit/bf45e27a676d87944f1f13d5f0d0f39935fc4010
DIFF: https://github.com/llvm/llvm-project/commit/bf45e27a676d87944f1f13d5f0d0f39935fc4010.diff
LOG: [Clang] Fix invalid utf-8 detection
The length of valid codepoints was calculated
incorrectly (the bounds check in getUTF8SequenceSize
was inverted), which was not caught before because of
the absence of tests for the valid codepoints scenario.
Differential Revision: https://reviews.llvm.org/D129223
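To see why valid code points were rejected, compare the two forms of the bounds check in `getUTF8SequenceSize`. The sketch below is a hypothetical model, not LLVM code. `oldBoundsCheck` and `newBoundsCheck` are illustrative names, `length` is the sequence length implied by the lead byte, and `avail` stands for `sourceEnd - source`. The `isLegalUTF8` half of the condition is left out.

```cpp
// Hypothetical helpers: they model only the bounds-check half of the
// condition in getUTF8SequenceSize; the isLegalUTF8 call is left out.
// length: sequence length implied by the lead byte.
// avail:  bytes left in the buffer, i.e. sourceEnd - source.
inline bool oldBoundsCheck(long length, long avail) {
  return length > avail; // before the fix: true only when the sequence overruns
}

inline bool newBoundsCheck(long length, long avail) {
  return length < avail; // after the fix: true when the sequence fits with room to spare
}
```

Consider a 3-byte character such as 你 followed by more comment text (say `avail` = 10). `oldBoundsCheck(3, 10)` is false, so the original code returned 0 and the well-formed sequence was treated as invalid, while `newBoundsCheck(3, 10)` is true. Conversely, the old check passed only when the sequence overran the buffer (`oldBoundsCheck(3, 2)`), which is exactly the case the new check rejects.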
Added:
Modified:
clang/test/Lexer/comment-invalid-utf8.c
llvm/lib/Support/ConvertUTF.cpp
Removed:
################################################################################
diff --git a/clang/test/Lexer/comment-invalid-utf8.c b/clang/test/Lexer/comment-invalid-utf8.c
index b8bf551dd8564..ed7405a3c079e 100644
--- a/clang/test/Lexer/comment-invalid-utf8.c
+++ b/clang/test/Lexer/comment-invalid-utf8.c
@@ -25,3 +25,14 @@
// abcd
// €abcd
// expected-warning@-1 {{invalid UTF-8 in comment}}
+
+
+//§ § § 😀 你好 ©
+
+/*§ § § 😀 你好 ©*/
+
+/*
+§ § § 😀 你好 ©
+*/
+
+/* § § § 😀 你好 © */
diff --git a/llvm/lib/Support/ConvertUTF.cpp b/llvm/lib/Support/ConvertUTF.cpp
index c494110cdcee1..25875d4c3184b 100644
--- a/llvm/lib/Support/ConvertUTF.cpp
+++ b/llvm/lib/Support/ConvertUTF.cpp
@@ -423,7 +423,7 @@ Boolean isLegalUTF8Sequence(const UTF8 *source, const UTF8 *sourceEnd) {
*/
unsigned getUTF8SequenceSize(const UTF8 *source, const UTF8 *sourceEnd) {
int length = trailingBytesForUTF8[*source] + 1;
- return (length > sourceEnd - source && isLegalUTF8(source, length)) ? length
+ return (length < sourceEnd - source && isLegalUTF8(source, length)) ? length
: 0;
}
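Valid characters like those in the new test lines come in every sequence length: § and © take 2 bytes, a CJK character such as 你 takes 3, and 😀 takes 4. The following is a simplified, hypothetical re-implementation of the idea behind `getUTF8SequenceSize`. It does not use LLVM's `trailingBytesForUTF8` table or `isLegalUTF8`. Instead, it checks lead and continuation bytes against the Unicode well-formed byte ranges. It also includes a hypothetical `firstInvalidUTF8` scanner. The scanner assumes, as the Lexer test suggests, that a return value of 0 marks invalid UTF-8 and that any other value is the number of bytes to skip.

```cpp
#include <cstddef>
#include <string>

// Hypothetical, simplified re-implementation (not LLVM's code): return the
// length of the well-formed UTF-8 sequence starting at p, or 0 if it is
// ill-formed or would run past end. Byte ranges follow the Unicode
// well-formed UTF-8 table.
unsigned utf8SequenceSize(const unsigned char *p, const unsigned char *end) {
  if (p >= end)
    return 0;
  unsigned char lead = p[0];
  unsigned length = 0;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (lead < 0x80) {
    return 1; // ASCII
  } else if (lead >= 0xC2 && lead <= 0xDF) {
    length = 2;
  } else if (lead >= 0xE0 && lead <= 0xEF) {
    length = 3;
    if (lead == 0xE0) lo = 0xA0; // reject overlong encodings
    if (lead == 0xED) hi = 0x9F; // reject surrogates U+D800..U+DFFF
  } else if (lead >= 0xF0 && lead <= 0xF4) {
    length = 4;
    if (lead == 0xF0) lo = 0x90; // reject overlong encodings
    if (lead == 0xF4) hi = 0x8F; // reject code points above U+10FFFF
  } else {
    return 0; // stray continuation byte, C0/C1, or F5..FF
  }
  // Check bounds before reading any continuation byte. Reject only when the
  // sequence would overrun end, so a sequence ending exactly at end is accepted.
  if (static_cast<std::ptrdiff_t>(length) > end - p)
    return 0;
  if (p[1] < lo || p[1] > hi)
    return 0;
  for (unsigned i = 2; i < length; ++i)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return length;
}

// Hypothetical scanner in the spirit of the comment check: returns the offset
// of the first byte that does not start a well-formed sequence, or
// std::string::npos if the whole text is valid UTF-8.
std::size_t firstInvalidUTF8(const std::string &text) {
  const unsigned char *begin =
      reinterpret_cast<const unsigned char *>(text.data());
  const unsigned char *end = begin + text.size();
  for (const unsigned char *p = begin; p < end;) {
    unsigned n = utf8SequenceSize(p, end);
    if (n == 0)
      return static_cast<std::size_t>(p - begin);
    p += n; // skip the whole well-formed sequence
  }
  return std::string::npos;
}
```

For example, `firstInvalidUTF8("//\xC2\xA7 \xF0\x9F\x98\x80 \xE4\xBD\xA0\xE5\xA5\xBD \xC2\xA9")` (the bytes of `//§ 😀 你好 ©`) returns `std::string::npos`. By contrast, `firstInvalidUTF8("// \x80" "abcd")` returns 3, the offset of the stray continuation byte. The bounds check here accepts a sequence ending exactly at `end`. The patched LLVM condition uses a strict `<`, so it returns 0 in that case.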