[PATCH] D150843: [clang][Diagnostics] Refactor printableTextForNextCharacter
Tom Honermann via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu May 18 14:30:33 PDT 2023
tahonermann requested changes to this revision.
tahonermann added a comment.
This revision now requires changes to proceed.
Splitting this into two patches (one to do the renames, another to perform the other changes) would simplify review.
================
Comment at: clang/lib/Frontend/TextDiagnostic.cpp:118-119
- unsigned char const *begin, *end;
- begin = reinterpret_cast<unsigned char const *>(&*(SourceLine.begin() + *i));
- end = begin + (SourceLine.size() - *i);
-
- if (llvm::isLegalUTF8Sequence(begin, end)) {
- llvm::UTF32 c;
- llvm::UTF32 *cptr = &c;
- unsigned char const *original_begin = begin;
- unsigned char const *cp_end =
- begin + llvm::getNumBytesForUTF8(SourceLine[*i]);
-
- llvm::ConversionResult res = llvm::ConvertUTF8toUTF32(
- &begin, cp_end, &cptr, cptr + 1, llvm::strictConversion);
- (void)res;
- assert(llvm::conversionOK == res);
- assert(0 < begin-original_begin
- && "we must be further along in the string now");
- *i += begin-original_begin;
-
- if (!llvm::sys::locale::isPrint(c)) {
- // If next character is valid UTF-8, but not printable
- SmallString<16> expandedCP("<U+>");
- while (c) {
- expandedCP.insert(expandedCP.begin()+3, llvm::hexdigit(c%16));
- c/=16;
- }
- while (expandedCP.size() < 8)
- expandedCP.insert(expandedCP.begin()+3, llvm::hexdigit(0));
- return std::make_pair(expandedCP, false);
- }
+ const unsigned char *Begin =
+ reinterpret_cast<unsigned char const *>(&*(SourceLine.begin())) + (*I);
+ unsigned CharSize = llvm::getNumBytesForUTF8(*Begin);
----------------
================
Comment at: clang/lib/Frontend/TextDiagnostic.cpp:120
- begin = reinterpret_cast<unsigned char const *>(&*(SourceLine.begin() + *i));
- end = begin + (SourceLine.size() - *i);
-
----------------
tbaeder wrote:
> I don't know what this computation of `end` means, but from the debug output I added, it meant that a call to this function always converted the entire line. I think.
It looks to me like an overly complicated way of asking for `SourceLine.bytes_end()`.
================
Comment at: clang/lib/Frontend/TextDiagnostic.cpp:121
+ unsigned CharSize = llvm::getNumBytesForUTF8(*Begin);
+ const unsigned char *End = Begin + CharSize;
----------------
This could assign `End` to a location past the end of the source line when the source line ends with an ill-formed code unit sequence (e.g., a truncated 4-byte sequence). Constructing such a pointer is likely to get one in trouble.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D150843/new/
https://reviews.llvm.org/D150843
More information about the cfe-commits
mailing list