[PATCH] D128059: [Clang] Add a warning on invalid UTF-8 in comments.

Corentin Jabot via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jul 6 15:26:31 PDT 2022


cor3ntin updated this revision to Diff 442702.
cor3ntin added a comment.

Deploying this turned out to reveal a few critical issues:

- `getUTF8SequenceSize` never reported a non-zero length for valid
  UTF-8 sequences.
- In *some* circumstances (depending on the size of the comment),
  Unicode codepoints were parsed from one past their start,
  because `CurPtr` was sometimes, but not always, moved back.

I also added a test file with *valid* UTF-8 in comments
(which would have caught these issues).
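
To illustrate the two failure modes described above, here is a minimal, self-contained sketch. It is not the code from the patch: `utf8SequenceLength` and `countInvalidUtf8Bytes` are hypothetical names, and the byte ranges follow the well-formed UTF-8 table from the Unicode Standard rather than the actual logic in `ConvertUTF.cpp`. The assumed contract is that the helper returns the length (1 to 4) of a well-formed sequence and 0 otherwise, and that the scanner always calls it with the pointer at the *first* byte of a sequence.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not LLVM's implementation): returns the length in
// bytes (1-4) of the well-formed UTF-8 sequence starting at `p`, or 0 if
// [p, end) does not begin with one.
inline unsigned utf8SequenceLength(const unsigned char *p,
                                   const unsigned char *end) {
  assert(p <= end && "range must not be reversed");
  if (p == end)
    return 0;
  const unsigned char b0 = p[0];
  if (b0 < 0x80)
    return 1; // ASCII

  unsigned len;
  unsigned char lo = 0x80, hi = 0xBF; // allowed range for the second byte
  if (b0 >= 0xC2 && b0 <= 0xDF) {
    len = 2;
  } else if (b0 >= 0xE0 && b0 <= 0xEF) {
    len = 3;
    if (b0 == 0xE0)
      lo = 0xA0; // rejects overlong 3-byte encodings
    else if (b0 == 0xED)
      hi = 0x9F; // rejects UTF-16 surrogates
  } else if (b0 >= 0xF0 && b0 <= 0xF4) {
    len = 4;
    if (b0 == 0xF0)
      lo = 0x90; // rejects overlong 4-byte encodings
    else if (b0 == 0xF4)
      hi = 0x8F; // rejects code points above U+10FFFF
  } else {
    return 0; // stray continuation byte, 0xC0/0xC1, or 0xF5..0xFF
  }

  if (static_cast<std::size_t>(end - p) < len)
    return 0; // truncated sequence
  if (p[1] < lo || p[1] > hi)
    return 0;
  for (unsigned i = 2; i < len; ++i)
    if (p[i] < 0x80 || p[i] > 0xBF)
      return 0;
  return len;
}

// Hypothetical scanner: counts the bytes in `text` that are not part of a
// well-formed UTF-8 sequence. The pointer only ever advances by a whole
// sequence or by one invalid byte, so it never lands mid-sequence.
inline std::size_t countInvalidUtf8Bytes(const std::string &text) {
  const auto *p = reinterpret_cast<const unsigned char *>(text.data());
  const auto *end = p + text.size();
  std::size_t invalid = 0;
  while (p < end) {
    const unsigned len = utf8SequenceLength(p, end);
    if (len == 0) {
      ++invalid;
      ++p;
    } else {
      p += len;
    }
  }
  return invalid;
}
```

In this sketch, a helper that always returned 0 would make every non-ASCII byte of a valid comment count as invalid, and calling it one byte past the start of a sequence hands it a continuation byte, which it rejects. A test containing only valid UTF-8 exercises both paths, which is presumably why such a test would have caught the issues.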


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D128059/new/

https://reviews.llvm.org/D128059

Files:
  clang/docs/ReleaseNotes.rst
  clang/include/clang/Basic/DiagnosticLexKinds.td
  clang/lib/Lex/Lexer.cpp
  clang/test/Lexer/comment-invalid-utf8.c
  clang/test/Lexer/comment-utf8.c
  clang/test/SemaCXX/static-assert.cpp
  llvm/include/llvm/Support/ConvertUTF.h
  llvm/lib/Support/ConvertUTF.cpp

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D128059.442702.patch
Type: text/x-patch
Size: 9706 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20220706/7734b64e/attachment.bin>