[clang] Skip escaped newlines before checking for whitespace in Lexer::getRawToken. (PR #117548)

Samira Bazuzi via cfe-commits cfe-commits at lists.llvm.org
Wed Nov 27 07:08:38 PST 2024


https://github.com/bazuzi updated https://github.com/llvm/llvm-project/pull/117548

From 9c8b31dc266b770927785834c841b8ae5a7ebb58 Mon Sep 17 00:00:00 2001
From: Samira Bazuzi <bazuzi at google.com>
Date: Fri, 22 Nov 2024 15:45:55 -0500
Subject: [PATCH 1/4] Treat escaped newlines as whitespace in
 Lexer::getRawToken.

The Lexer used in getRawToken is not told to keep whitespace, so when it skips over escaped newlines it also ignores whitespace, regardless of getRawToken's IgnoreWhiteSpace parameter. My suspicion is that callers who pass IgnoreWhiteSpace=false, and therefore expect getRawToken to return true when a whitespace character is at the location, would also safely accept true for an escaped newline. For callers that do pass IgnoreWhiteSpace=true, there is no behavior change, and the handling of escaped newlines is already correct.
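
As a concrete illustration, here is the before/after shape of the check in getRawToken (taken from the diff below; variable names as in Lexer.cpp):

  // Before: only a literal whitespace character at Loc causes getRawToken to
  // report failure (return true) when IgnoreWhiteSpace is false.
  if (!IgnoreWhiteSpace && isWhitespace(StrData[0]))
    return true;

  // After this patch: an escaped newline at Loc is treated as whitespace too.
  if (!IgnoreWhiteSpace && (isWhitespace(StrData[0]) ||
                            // Treat escaped newlines as whitespace.
                            SkipEscapedNewLines(StrData) != StrData))
    return true;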

If an escaped newline should not be considered whitespace, then instead of this change, getRawToken should be modified to return true when whitespace follows the escaped newline present at `Loc`, perhaps by using isWhitespace(SkipEscapedNewLines(StrData)[0]). However, that is incompatible with functions like clang::tidy::utils::lexer::getPreviousTokenAndStart, which loops backwards through source location offsets, always decrementing by 1 without regard for characters wider than one byte, such as escaped newlines. It seems more likely to me that there are more functions like this that would break than there are callers who rely on escaped newlines not being treated as whitespace by getRawToken, but I'm open to that not being true.
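
For context, here is a rough sketch of the two pieces in tension: the alternative check for getRawToken, and the shape of the backwards scan in getPreviousTokenAndStart (simplified; the real clang-tidy helper also handles comments and invalid locations):

  // Alternative: look past any escaped newlines at Loc before testing for
  // whitespace (this is what the second commit in this PR switches to).
  if (!IgnoreWhiteSpace && isWhitespace(SkipEscapedNewLines(StrData)[0]))
    return true;

  // Simplified shape of clang::tidy::utils::lexer::getPreviousTokenAndStart:
  // it always steps back by a single offset, so a multi-byte unit such as an
  // escaped newline ("\" followed by a newline) is never skipped as a whole.
  while (Location != StartOfFile) {
    Location = Lexer::GetBeginningOfToken(Location, SM, LangOpts);
    if (!Lexer::getRawToken(Location, Token, SM, LangOpts))
      break;                                    // a raw token was lexed here
    Location = Location.getLocWithOffset(-1);   // always decrements by 1
  }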

The modified test was printing `\\nF` for the name of the expanded macro and now does not find a macro name. In my opinion, this is not an indication that the new behavior of getRawToken is incorrect. Rather, both before and after this change, the backslash's source location is incorrectly stored as the spelling location of the expansion location of `F`.
---
 clang/lib/Lex/Lexer.cpp              | 4 +++-
 clang/test/Frontend/highlight-text.c | 3 +--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index e58c8bc72ae5b3..392cce6be0d171 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -527,7 +527,9 @@ bool Lexer::getRawToken(SourceLocation Loc, Token &Result,
 
   const char *StrData = Buffer.data()+LocInfo.second;
 
-  if (!IgnoreWhiteSpace && isWhitespace(StrData[0]))
+  if (!IgnoreWhiteSpace && (isWhitespace(StrData[0]) ||
+                            // Treat escaped newlines as whitespace.
+                            SkipEscapedNewLines(StrData) != StrData))
     return true;
 
   // Create a lexer starting at the beginning of this token.
diff --git a/clang/test/Frontend/highlight-text.c b/clang/test/Frontend/highlight-text.c
index a81d26caa4c24c..eefa4ebeec8ca4 100644
--- a/clang/test/Frontend/highlight-text.c
+++ b/clang/test/Frontend/highlight-text.c
@@ -12,8 +12,7 @@ int a = M;
 // CHECK-NEXT: :5:11: note: expanded from macro 'M'
 // CHECK-NEXT:     5 | #define M \
 // CHECK-NEXT:       |           ^
-// CHECK-NEXT: :3:14: note: expanded from macro '\
-// CHECK-NEXT: F'
+// CHECK-NEXT: :3:14: note: expanded from here
 // CHECK-NEXT:     3 | #define F (1 << 99)
 // CHECK-NEXT:       |              ^  ~~
 // CHECK-NEXT: :8:9: warning: shift count >= width of type [-Wshift-count-overflow]

From 7dec0bb67491a20e8e010713640ce5f69503ec25 Mon Sep 17 00:00:00 2001
From: Samira Bazuzi <bazuzi at google.com>
Date: Tue, 26 Nov 2024 09:53:36 -0500
Subject: [PATCH 2/4] Switch to checking for whitespace after escaped newlines.

---
 clang/lib/Lex/Lexer.cpp              | 714 ++++++++++++++-------------
 clang/test/Frontend/highlight-text.c |   3 +-
 2 files changed, 382 insertions(+), 335 deletions(-)

diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index 392cce6be0d171..ea2c2aeebdcfd0 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -141,8 +141,8 @@ void Lexer::InitLexer(const char *BufStart, const char *BufPtr,
     // Determine the size of the BOM.
     StringRef Buf(BufferStart, BufferEnd - BufferStart);
     size_t BOMLength = llvm::StringSwitch<size_t>(Buf)
-      .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
-      .Default(0);
+                           .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
+                           .Default(0);
 
     // Skip the BOM.
     BufferPtr += BOMLength;
@@ -256,14 +256,14 @@ Lexer *Lexer::Create_PragmaLexer(SourceLocation SpellingLoc,
   const char *StrData = SM.getCharacterData(SpellingLoc);
 
   L->BufferPtr = StrData;
-  L->BufferEnd = StrData+TokLen;
+  L->BufferEnd = StrData + TokLen;
   assert(L->BufferEnd[0] == 0 && "Buffer is not nul terminated!");
 
   // Set the SourceLocation with the remapping information.  This ensures that
   // GetMappedTokenLoc will remap the tokens as they are lexed.
-  L->FileLoc = SM.createExpansionLoc(SM.getLocForStartOfFile(SpellingFID),
-                                     ExpansionLocStart,
-                                     ExpansionLocEnd, TokLen);
+  L->FileLoc =
+      SM.createExpansionLoc(SM.getLocForStartOfFile(SpellingFID),
+                            ExpansionLocStart, ExpansionLocEnd, TokLen);
 
   // Ensure that the lexer thinks it is inside a directive, so that end \n will
   // return an EOD token.
@@ -342,12 +342,14 @@ static size_t getSpellingSlow(const Token &Tok, const char *BufPtr,
     // Raw string literals need special handling; trigraph expansion and line
     // splicing do not occur within their d-char-sequence nor within their
     // r-char-sequence.
-    if (Length >= 2 &&
-        Spelling[Length - 2] == 'R' && Spelling[Length - 1] == '"') {
+    if (Length >= 2 && Spelling[Length - 2] == 'R' &&
+        Spelling[Length - 1] == '"') {
       // Search backwards from the end of the token to find the matching closing
       // quote.
       const char *RawEnd = BufEnd;
-      do --RawEnd; while (*RawEnd != '"');
+      do
+        --RawEnd;
+      while (*RawEnd != '"');
       size_t RawLength = RawEnd - BufPtr + 1;
 
       // Everything between the quotes is included verbatim in the spelling.
@@ -375,11 +377,9 @@ static size_t getSpellingSlow(const Token &Tok, const char *BufPtr,
 /// after trigraph expansion and escaped-newline folding.  In particular, this
 /// wants to get the true, uncanonicalized, spelling of things like digraphs
 /// UCNs, etc.
-StringRef Lexer::getSpelling(SourceLocation loc,
-                             SmallVectorImpl<char> &buffer,
+StringRef Lexer::getSpelling(SourceLocation loc, SmallVectorImpl<char> &buffer,
                              const SourceManager &SM,
-                             const LangOptions &options,
-                             bool *invalid) {
+                             const LangOptions &options, bool *invalid) {
   // Break down the source location.
   std::pair<FileID, unsigned> locInfo = SM.getDecomposedLoc(loc);
 
@@ -387,15 +387,16 @@ StringRef Lexer::getSpelling(SourceLocation loc,
   bool invalidTemp = false;
   StringRef file = SM.getBufferData(locInfo.first, &invalidTemp);
   if (invalidTemp) {
-    if (invalid) *invalid = true;
+    if (invalid)
+      *invalid = true;
     return {};
   }
 
   const char *tokenBegin = file.data() + locInfo.second;
 
   // Lex from the start of the given location.
-  Lexer lexer(SM.getLocForStartOfFile(locInfo.first), options,
-              file.begin(), tokenBegin, file.end());
+  Lexer lexer(SM.getLocForStartOfFile(locInfo.first), options, file.begin(),
+              tokenBegin, file.end());
   Token token;
   lexer.LexFromRawLexer(token);
 
@@ -421,8 +422,8 @@ std::string Lexer::getSpelling(const Token &Tok, const SourceManager &SourceMgr,
   assert((int)Tok.getLength() >= 0 && "Token character range is bogus!");
 
   bool CharDataInvalid = false;
-  const char *TokStart = SourceMgr.getCharacterData(Tok.getLocation(),
-                                                    &CharDataInvalid);
+  const char *TokStart =
+      SourceMgr.getCharacterData(Tok.getLocation(), &CharDataInvalid);
   if (Invalid)
     *Invalid = CharDataInvalid;
   if (CharDataInvalid)
@@ -488,15 +489,14 @@ unsigned Lexer::getSpelling(const Token &Tok, const char *&Buffer,
   }
 
   // Otherwise, hard case, relex the characters into the string.
-  return getSpellingSlow(Tok, TokStart, LangOpts, const_cast<char*>(Buffer));
+  return getSpellingSlow(Tok, TokStart, LangOpts, const_cast<char *>(Buffer));
 }
 
 /// MeasureTokenLength - Relex the token at the specified location and return
 /// its length in bytes in the input file.  If the token needs cleaning (e.g.
 /// includes a trigraph or an escaped newline) then this count includes bytes
 /// that are part of that.
-unsigned Lexer::MeasureTokenLength(SourceLocation Loc,
-                                   const SourceManager &SM,
+unsigned Lexer::MeasureTokenLength(SourceLocation Loc, const SourceManager &SM,
                                    const LangOptions &LangOpts) {
   Token TheTok;
   if (getRawToken(Loc, TheTok, SM, LangOpts))
@@ -507,8 +507,7 @@ unsigned Lexer::MeasureTokenLength(SourceLocation Loc,
 /// Relex the token at the specified location.
 /// \returns true if there was a failure, false on success.
 bool Lexer::getRawToken(SourceLocation Loc, Token &Result,
-                        const SourceManager &SM,
-                        const LangOptions &LangOpts,
+                        const SourceManager &SM, const LangOptions &LangOpts,
                         bool IgnoreWhiteSpace) {
   // TODO: this could be special cased for common tokens like identifiers, ')',
   // etc to make this faster, if it mattered.  Just look at StrData[0] to handle
@@ -525,11 +524,9 @@ bool Lexer::getRawToken(SourceLocation Loc, Token &Result,
   if (Invalid)
     return true;
 
-  const char *StrData = Buffer.data()+LocInfo.second;
+  const char *StrData = Buffer.data() + LocInfo.second;
 
-  if (!IgnoreWhiteSpace && (isWhitespace(StrData[0]) ||
-                            // Treat escaped newlines as whitespace.
-                            SkipEscapedNewLines(StrData) != StrData))
+  if (!IgnoreWhiteSpace && isWhitespace(SkipEscapedNewLines(StrData)[0]))
     return true;
 
   // Create a lexer starting at the beginning of this token.
@@ -628,10 +625,7 @@ SourceLocation Lexer::GetBeginningOfToken(SourceLocation Loc,
 
 namespace {
 
-enum PreambleDirectiveKind {
-  PDK_Skipped,
-  PDK_Unknown
-};
+enum PreambleDirectiveKind { PDK_Skipped, PDK_Unknown };
 
 } // namespace
 
@@ -715,31 +709,31 @@ PreambleBounds Lexer::ComputePreamble(StringRef Buffer,
       TheLexer.LexFromRawLexer(TheTok);
       if (TheTok.getKind() == tok::raw_identifier && !TheTok.needsCleaning()) {
         StringRef Keyword = TheTok.getRawIdentifier();
-        PreambleDirectiveKind PDK
-          = llvm::StringSwitch<PreambleDirectiveKind>(Keyword)
-              .Case("include", PDK_Skipped)
-              .Case("__include_macros", PDK_Skipped)
-              .Case("define", PDK_Skipped)
-              .Case("undef", PDK_Skipped)
-              .Case("line", PDK_Skipped)
-              .Case("error", PDK_Skipped)
-              .Case("pragma", PDK_Skipped)
-              .Case("import", PDK_Skipped)
-              .Case("include_next", PDK_Skipped)
-              .Case("warning", PDK_Skipped)
-              .Case("ident", PDK_Skipped)
-              .Case("sccs", PDK_Skipped)
-              .Case("assert", PDK_Skipped)
-              .Case("unassert", PDK_Skipped)
-              .Case("if", PDK_Skipped)
-              .Case("ifdef", PDK_Skipped)
-              .Case("ifndef", PDK_Skipped)
-              .Case("elif", PDK_Skipped)
-              .Case("elifdef", PDK_Skipped)
-              .Case("elifndef", PDK_Skipped)
-              .Case("else", PDK_Skipped)
-              .Case("endif", PDK_Skipped)
-              .Default(PDK_Unknown);
+        PreambleDirectiveKind PDK =
+            llvm::StringSwitch<PreambleDirectiveKind>(Keyword)
+                .Case("include", PDK_Skipped)
+                .Case("__include_macros", PDK_Skipped)
+                .Case("define", PDK_Skipped)
+                .Case("undef", PDK_Skipped)
+                .Case("line", PDK_Skipped)
+                .Case("error", PDK_Skipped)
+                .Case("pragma", PDK_Skipped)
+                .Case("import", PDK_Skipped)
+                .Case("include_next", PDK_Skipped)
+                .Case("warning", PDK_Skipped)
+                .Case("ident", PDK_Skipped)
+                .Case("sccs", PDK_Skipped)
+                .Case("assert", PDK_Skipped)
+                .Case("unassert", PDK_Skipped)
+                .Case("if", PDK_Skipped)
+                .Case("ifdef", PDK_Skipped)
+                .Case("ifndef", PDK_Skipped)
+                .Case("elif", PDK_Skipped)
+                .Case("elifdef", PDK_Skipped)
+                .Case("elifndef", PDK_Skipped)
+                .Case("else", PDK_Skipped)
+                .Case("endif", PDK_Skipped)
+                .Default(PDK_Unknown);
 
         switch (PDK) {
         case PDK_Skipped:
@@ -828,7 +822,7 @@ unsigned Lexer::getTokenPrefixLength(SourceLocation TokStart, unsigned CharNo,
   // advanced by 3 should return the location of b, not of \\.  One compounding
   // detail of this is that the escape may be made by a trigraph.
   if (!Lexer::isObviouslySimpleCharacter(*TokPtr))
-    PhysOffset += Lexer::SkipEscapedNewLines(TokPtr)-TokPtr;
+    PhysOffset += Lexer::SkipEscapedNewLines(TokPtr) - TokPtr;
 
   return PhysOffset;
 }
@@ -892,8 +886,7 @@ bool Lexer::isAtStartOfMacroExpansion(SourceLocation loc,
 
 /// Returns true if the given MacroID location points at the last
 /// token of the macro expansion.
-bool Lexer::isAtEndOfMacroExpansion(SourceLocation loc,
-                                    const SourceManager &SM,
+bool Lexer::isAtEndOfMacroExpansion(SourceLocation loc, const SourceManager &SM,
                                     const LangOptions &LangOpts,
                                     SourceLocation *MacroEnd) {
   assert(loc.isValid() && loc.isMacroID() && "Expected a valid macro loc");
@@ -925,7 +918,7 @@ static CharSourceRange makeRangeFromFileLocs(CharSourceRange Range,
   SourceLocation End = Range.getEnd();
   assert(Begin.isFileID() && End.isFileID());
   if (Range.isTokenRange()) {
-    End = Lexer::getLocForEndOfToken(End, 0, SM,LangOpts);
+    End = Lexer::getLocForEndOfToken(End, 0, SM, LangOpts);
     if (End.isInvalid())
       return {};
   }
@@ -938,8 +931,7 @@ static CharSourceRange makeRangeFromFileLocs(CharSourceRange Range,
     return {};
 
   unsigned EndOffs;
-  if (!SM.isInFileID(End, FID, &EndOffs) ||
-      BeginOffs > EndOffs)
+  if (!SM.isInFileID(End, FID, &EndOffs) || BeginOffs > EndOffs)
     return {};
 
   return CharSourceRange::getCharRange(Begin, End);
@@ -986,10 +978,10 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   assert(Begin.isMacroID() && End.isMacroID());
   SourceLocation MacroBegin, MacroEnd;
   if (isAtStartOfMacroExpansion(Begin, SM, LangOpts, &MacroBegin) &&
-      ((Range.isTokenRange() && isAtEndOfMacroExpansion(End, SM, LangOpts,
-                                                        &MacroEnd)) ||
-       (Range.isCharRange() && isAtStartOfMacroExpansion(End, SM, LangOpts,
-                                                         &MacroEnd)))) {
+      ((Range.isTokenRange() &&
+        isAtEndOfMacroExpansion(End, SM, LangOpts, &MacroEnd)) ||
+       (Range.isCharRange() &&
+        isAtStartOfMacroExpansion(End, SM, LangOpts, &MacroEnd)))) {
     Range.setBegin(MacroBegin);
     Range.setEnd(MacroEnd);
     // Use the *original* `End`, not the expanded one in `MacroEnd`.
@@ -999,14 +991,14 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   }
 
   bool Invalid = false;
-  const SrcMgr::SLocEntry &BeginEntry = SM.getSLocEntry(SM.getFileID(Begin),
-                                                        &Invalid);
+  const SrcMgr::SLocEntry &BeginEntry =
+      SM.getSLocEntry(SM.getFileID(Begin), &Invalid);
   if (Invalid)
     return {};
 
   if (BeginEntry.getExpansion().isMacroArgExpansion()) {
-    const SrcMgr::SLocEntry &EndEntry = SM.getSLocEntry(SM.getFileID(End),
-                                                        &Invalid);
+    const SrcMgr::SLocEntry &EndEntry =
+        SM.getSLocEntry(SM.getFileID(End), &Invalid);
     if (Invalid)
       return {};
 
@@ -1022,27 +1014,28 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   return {};
 }
 
-StringRef Lexer::getSourceText(CharSourceRange Range,
-                               const SourceManager &SM,
-                               const LangOptions &LangOpts,
-                               bool *Invalid) {
+StringRef Lexer::getSourceText(CharSourceRange Range, const SourceManager &SM,
+                               const LangOptions &LangOpts, bool *Invalid) {
   Range = makeFileCharRange(Range, SM, LangOpts);
   if (Range.isInvalid()) {
-    if (Invalid) *Invalid = true;
+    if (Invalid)
+      *Invalid = true;
     return {};
   }
 
   // Break down the source location.
   std::pair<FileID, unsigned> beginInfo = SM.getDecomposedLoc(Range.getBegin());
   if (beginInfo.first.isInvalid()) {
-    if (Invalid) *Invalid = true;
+    if (Invalid)
+      *Invalid = true;
     return {};
   }
 
   unsigned EndOffs;
   if (!SM.isInFileID(Range.getEnd(), beginInfo.first, &EndOffs) ||
       beginInfo.second > EndOffs) {
-    if (Invalid) *Invalid = true;
+    if (Invalid)
+      *Invalid = true;
     return {};
   }
 
@@ -1050,11 +1043,13 @@ StringRef Lexer::getSourceText(CharSourceRange Range,
   bool invalidTemp = false;
   StringRef file = SM.getBufferData(beginInfo.first, &invalidTemp);
   if (invalidTemp) {
-    if (Invalid) *Invalid = true;
+    if (Invalid)
+      *Invalid = true;
     return {};
   }
 
-  if (Invalid) *Invalid = false;
+  if (Invalid)
+    *Invalid = false;
   return file.substr(beginInfo.second, EndOffs - beginInfo.second);
 }
 
@@ -1188,8 +1183,8 @@ StringRef Lexer::getIndentationForLine(SourceLocation Loc,
 static LLVM_ATTRIBUTE_NOINLINE SourceLocation GetMappedTokenLoc(
     Preprocessor &PP, SourceLocation FileLoc, unsigned CharNo, unsigned TokLen);
 static SourceLocation GetMappedTokenLoc(Preprocessor &PP,
-                                        SourceLocation FileLoc,
-                                        unsigned CharNo, unsigned TokLen) {
+                                        SourceLocation FileLoc, unsigned CharNo,
+                                        unsigned TokLen) {
   assert(FileLoc.isMacroID() && "Must be a macro expansion");
 
   // Otherwise, we're lexing "mapped tokens".  This is used for things like
@@ -1218,7 +1213,7 @@ SourceLocation Lexer::getSourceLocation(const char *Loc,
 
   // In the normal case, we're just lexing from a simple file buffer, return
   // the file id from FileLoc with the offset specified.
-  unsigned CharNo = Loc-BufferStart;
+  unsigned CharNo = Loc - BufferStart;
   if (FileLoc.isFileID())
     return FileLoc.getLocWithOffset(CharNo);
 
@@ -1242,16 +1237,26 @@ DiagnosticBuilder Lexer::Diag(const char *Loc, unsigned DiagID) const {
 /// return the decoded trigraph letter it corresponds to, or '\0' if nothing.
 static char GetTrigraphCharForLetter(char Letter) {
   switch (Letter) {
-  default:   return 0;
-  case '=':  return '#';
-  case ')':  return ']';
-  case '(':  return '[';
-  case '!':  return '|';
-  case '\'': return '^';
-  case '>':  return '}';
-  case '/':  return '\\';
-  case '<':  return '{';
-  case '-':  return '~';
+  default:
+    return 0;
+  case '=':
+    return '#';
+  case ')':
+    return ']';
+  case '(':
+    return '[';
+  case '!':
+    return '|';
+  case '\'':
+    return '^';
+  case '>':
+    return '}';
+  case '/':
+    return '\\';
+  case '<':
+    return '{';
+  case '-':
+    return '~';
   }
 }
 
@@ -1266,12 +1271,12 @@ static char DecodeTrigraphChar(const char *CP, Lexer *L, bool Trigraphs) {
 
   if (!Trigraphs) {
     if (L && !L->isLexingRawMode())
-      L->Diag(CP-2, diag::trigraph_ignored);
+      L->Diag(CP - 2, diag::trigraph_ignored);
     return 0;
   }
 
   if (L && !L->isLexingRawMode())
-    L->Diag(CP-2, diag::trigraph_converted) << StringRef(&Res, 1);
+    L->Diag(CP - 2, diag::trigraph_converted) << StringRef(&Res, 1);
   return Res;
 }
 
@@ -1283,12 +1288,11 @@ unsigned Lexer::getEscapedNewLineSize(const char *Ptr) {
   while (isWhitespace(Ptr[Size])) {
     ++Size;
 
-    if (Ptr[Size-1] != '\n' && Ptr[Size-1] != '\r')
+    if (Ptr[Size - 1] != '\n' && Ptr[Size - 1] != '\r')
       continue;
 
     // If this is a \r\n or \n\r, skip the other half.
-    if ((Ptr[Size] == '\r' || Ptr[Size] == '\n') &&
-        Ptr[Size-1] != Ptr[Size])
+    if ((Ptr[Size] == '\r' || Ptr[Size] == '\n') && Ptr[Size - 1] != Ptr[Size])
       ++Size;
 
     return Size;
@@ -1305,21 +1309,22 @@ const char *Lexer::SkipEscapedNewLines(const char *P) {
   while (true) {
     const char *AfterEscape;
     if (*P == '\\') {
-      AfterEscape = P+1;
+      AfterEscape = P + 1;
     } else if (*P == '?') {
       // If not a trigraph for escape, bail out.
       if (P[1] != '?' || P[2] != '/')
         return P;
       // FIXME: Take LangOpts into account; the language might not
       // support trigraphs.
-      AfterEscape = P+3;
+      AfterEscape = P + 3;
     } else {
       return P;
     }
 
     unsigned NewLineSize = Lexer::getEscapedNewLineSize(AfterEscape);
-    if (NewLineSize == 0) return P;
-    P = AfterEscape+NewLineSize;
+    if (NewLineSize == 0)
+      return P;
+    P = AfterEscape + NewLineSize;
   }
 }
 
@@ -1345,7 +1350,7 @@ std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
 
   // Lex from the start of the given location.
   Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-                                      TokenBegin, File.end());
+              TokenBegin, File.end());
   // Find the token.
   Token Tok;
   lexer.LexFromRawLexer(Tok);
@@ -1408,7 +1413,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
   if (Ptr[0] == '\\') {
     ++Size;
     ++Ptr;
-Slash:
+  Slash:
     // Common case, backslash-char where the char is not whitespace.
     if (!isWhitespace(Ptr[0]))
       return {'\\', Size};
@@ -1417,7 +1422,8 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
     // newline.
     if (unsigned EscapedNewLineSize = getEscapedNewLineSize(Ptr)) {
       // Remember that this token needs to be cleaned.
-      if (Tok) Tok->setFlag(Token::NeedsCleaning);
+      if (Tok)
+        Tok->setFlag(Token::NeedsCleaning);
 
       // Warn if there was whitespace between the backslash and newline.
       if (Ptr[0] != '\n' && Ptr[0] != '\r' && Tok && !isLexingRawMode())
@@ -1425,7 +1431,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
 
       // Found backslash<whitespace><newline>.  Parse the char after it.
       Size += EscapedNewLineSize;
-      Ptr  += EscapedNewLineSize;
+      Ptr += EscapedNewLineSize;
 
       // Use slow version to accumulate a correct size field.
       auto CharAndSize = getCharAndSizeSlow(Ptr, Tok);
@@ -1444,11 +1450,13 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
     if (char C = DecodeTrigraphChar(Ptr + 2, Tok ? this : nullptr,
                                     LangOpts.Trigraphs)) {
       // Remember that this token needs to be cleaned.
-      if (Tok) Tok->setFlag(Token::NeedsCleaning);
+      if (Tok)
+        Tok->setFlag(Token::NeedsCleaning);
 
       Ptr += 3;
       Size += 3;
-      if (C == '\\') goto Slash;
+      if (C == '\\')
+        goto Slash;
       return {C, Size};
     }
   }
@@ -1471,7 +1479,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
   if (Ptr[0] == '\\') {
     ++Size;
     ++Ptr;
-Slash:
+  Slash:
     // Common case, backslash-char where the char is not whitespace.
     if (!isWhitespace(Ptr[0]))
       return {'\\', Size};
@@ -1480,7 +1488,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
     if (unsigned EscapedNewLineSize = getEscapedNewLineSize(Ptr)) {
       // Found backslash<whitespace><newline>.  Parse the char after it.
       Size += EscapedNewLineSize;
-      Ptr  += EscapedNewLineSize;
+      Ptr += EscapedNewLineSize;
 
       // Use slow version to accumulate a correct size field.
       auto CharAndSize = getCharAndSizeSlowNoWarn(Ptr, LangOpts);
@@ -1499,7 +1507,8 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
     if (char C = GetTrigraphCharForLetter(Ptr[2])) {
       Ptr += 3;
       Size += 3;
-      if (C == '\\') goto Slash;
+      if (C == '\\')
+        goto Slash;
       return {C, Size};
     }
   }
@@ -1637,10 +1646,7 @@ static void maybeDiagnoseIDCharCompat(DiagnosticsEngine &Diags, uint32_t C,
                                       CharSourceRange Range, bool IsFirst) {
   // Check C99 compatibility.
   if (!Diags.isIgnored(diag::warn_c99_compat_unicode_id, Range.getBegin())) {
-    enum {
-      CannotAppearInIdentifier = 0,
-      CannotStartIdentifier
-    };
+    enum { CannotAppearInIdentifier = 0, CannotStartIdentifier };
 
     static const llvm::sys::UnicodeCharSet C99AllowedIDChars(
         C99AllowedIDCharRanges);
@@ -1648,12 +1654,10 @@ static void maybeDiagnoseIDCharCompat(DiagnosticsEngine &Diags, uint32_t C,
         C99DisallowedInitialIDCharRanges);
     if (!C99AllowedIDChars.contains(C)) {
       Diags.Report(Range.getBegin(), diag::warn_c99_compat_unicode_id)
-        << Range
-        << CannotAppearInIdentifier;
+          << Range << CannotAppearInIdentifier;
     } else if (IsFirst && C99DisallowedInitialIDChars.contains(C)) {
       Diags.Report(Range.getBegin(), diag::warn_c99_compat_unicode_id)
-        << Range
-        << CannotStartIdentifier;
+          << Range << CannotStartIdentifier;
     }
   }
 }
@@ -1671,57 +1675,56 @@ static void maybeDiagnoseUTF8Homoglyph(DiagnosticsEngine &Diags, uint32_t C,
     bool operator<(HomoglyphPair R) const { return Character < R.Character; }
   };
   static constexpr HomoglyphPair SortedHomoglyphs[] = {
-    {U'\u00ad', 0},   // SOFT HYPHEN
-    {U'\u01c3', '!'}, // LATIN LETTER RETROFLEX CLICK
-    {U'\u037e', ';'}, // GREEK QUESTION MARK
-    {U'\u200b', 0},   // ZERO WIDTH SPACE
-    {U'\u200c', 0},   // ZERO WIDTH NON-JOINER
-    {U'\u200d', 0},   // ZERO WIDTH JOINER
-    {U'\u2060', 0},   // WORD JOINER
-    {U'\u2061', 0},   // FUNCTION APPLICATION
-    {U'\u2062', 0},   // INVISIBLE TIMES
-    {U'\u2063', 0},   // INVISIBLE SEPARATOR
-    {U'\u2064', 0},   // INVISIBLE PLUS
-    {U'\u2212', '-'}, // MINUS SIGN
-    {U'\u2215', '/'}, // DIVISION SLASH
-    {U'\u2216', '\\'}, // SET MINUS
-    {U'\u2217', '*'}, // ASTERISK OPERATOR
-    {U'\u2223', '|'}, // DIVIDES
-    {U'\u2227', '^'}, // LOGICAL AND
-    {U'\u2236', ':'}, // RATIO
-    {U'\u223c', '~'}, // TILDE OPERATOR
-    {U'\ua789', ':'}, // MODIFIER LETTER COLON
-    {U'\ufeff', 0},   // ZERO WIDTH NO-BREAK SPACE
-    {U'\uff01', '!'}, // FULLWIDTH EXCLAMATION MARK
-    {U'\uff03', '#'}, // FULLWIDTH NUMBER SIGN
-    {U'\uff04', '$'}, // FULLWIDTH DOLLAR SIGN
-    {U'\uff05', '%'}, // FULLWIDTH PERCENT SIGN
-    {U'\uff06', '&'}, // FULLWIDTH AMPERSAND
-    {U'\uff08', '('}, // FULLWIDTH LEFT PARENTHESIS
-    {U'\uff09', ')'}, // FULLWIDTH RIGHT PARENTHESIS
-    {U'\uff0a', '*'}, // FULLWIDTH ASTERISK
-    {U'\uff0b', '+'}, // FULLWIDTH ASTERISK
-    {U'\uff0c', ','}, // FULLWIDTH COMMA
-    {U'\uff0d', '-'}, // FULLWIDTH HYPHEN-MINUS
-    {U'\uff0e', '.'}, // FULLWIDTH FULL STOP
-    {U'\uff0f', '/'}, // FULLWIDTH SOLIDUS
-    {U'\uff1a', ':'}, // FULLWIDTH COLON
-    {U'\uff1b', ';'}, // FULLWIDTH SEMICOLON
-    {U'\uff1c', '<'}, // FULLWIDTH LESS-THAN SIGN
-    {U'\uff1d', '='}, // FULLWIDTH EQUALS SIGN
-    {U'\uff1e', '>'}, // FULLWIDTH GREATER-THAN SIGN
-    {U'\uff1f', '?'}, // FULLWIDTH QUESTION MARK
-    {U'\uff20', '@'}, // FULLWIDTH COMMERCIAL AT
-    {U'\uff3b', '['}, // FULLWIDTH LEFT SQUARE BRACKET
-    {U'\uff3c', '\\'}, // FULLWIDTH REVERSE SOLIDUS
-    {U'\uff3d', ']'}, // FULLWIDTH RIGHT SQUARE BRACKET
-    {U'\uff3e', '^'}, // FULLWIDTH CIRCUMFLEX ACCENT
-    {U'\uff5b', '{'}, // FULLWIDTH LEFT CURLY BRACKET
-    {U'\uff5c', '|'}, // FULLWIDTH VERTICAL LINE
-    {U'\uff5d', '}'}, // FULLWIDTH RIGHT CURLY BRACKET
-    {U'\uff5e', '~'}, // FULLWIDTH TILDE
-    {0, 0}
-  };
+      {U'\u00ad', 0},    // SOFT HYPHEN
+      {U'\u01c3', '!'},  // LATIN LETTER RETROFLEX CLICK
+      {U'\u037e', ';'},  // GREEK QUESTION MARK
+      {U'\u200b', 0},    // ZERO WIDTH SPACE
+      {U'\u200c', 0},    // ZERO WIDTH NON-JOINER
+      {U'\u200d', 0},    // ZERO WIDTH JOINER
+      {U'\u2060', 0},    // WORD JOINER
+      {U'\u2061', 0},    // FUNCTION APPLICATION
+      {U'\u2062', 0},    // INVISIBLE TIMES
+      {U'\u2063', 0},    // INVISIBLE SEPARATOR
+      {U'\u2064', 0},    // INVISIBLE PLUS
+      {U'\u2212', '-'},  // MINUS SIGN
+      {U'\u2215', '/'},  // DIVISION SLASH
+      {U'\u2216', '\\'}, // SET MINUS
+      {U'\u2217', '*'},  // ASTERISK OPERATOR
+      {U'\u2223', '|'},  // DIVIDES
+      {U'\u2227', '^'},  // LOGICAL AND
+      {U'\u2236', ':'},  // RATIO
+      {U'\u223c', '~'},  // TILDE OPERATOR
+      {U'\ua789', ':'},  // MODIFIER LETTER COLON
+      {U'\ufeff', 0},    // ZERO WIDTH NO-BREAK SPACE
+      {U'\uff01', '!'},  // FULLWIDTH EXCLAMATION MARK
+      {U'\uff03', '#'},  // FULLWIDTH NUMBER SIGN
+      {U'\uff04', '$'},  // FULLWIDTH DOLLAR SIGN
+      {U'\uff05', '%'},  // FULLWIDTH PERCENT SIGN
+      {U'\uff06', '&'},  // FULLWIDTH AMPERSAND
+      {U'\uff08', '('},  // FULLWIDTH LEFT PARENTHESIS
+      {U'\uff09', ')'},  // FULLWIDTH RIGHT PARENTHESIS
+      {U'\uff0a', '*'},  // FULLWIDTH ASTERISK
+      {U'\uff0b', '+'},  // FULLWIDTH ASTERISK
+      {U'\uff0c', ','},  // FULLWIDTH COMMA
+      {U'\uff0d', '-'},  // FULLWIDTH HYPHEN-MINUS
+      {U'\uff0e', '.'},  // FULLWIDTH FULL STOP
+      {U'\uff0f', '/'},  // FULLWIDTH SOLIDUS
+      {U'\uff1a', ':'},  // FULLWIDTH COLON
+      {U'\uff1b', ';'},  // FULLWIDTH SEMICOLON
+      {U'\uff1c', '<'},  // FULLWIDTH LESS-THAN SIGN
+      {U'\uff1d', '='},  // FULLWIDTH EQUALS SIGN
+      {U'\uff1e', '>'},  // FULLWIDTH GREATER-THAN SIGN
+      {U'\uff1f', '?'},  // FULLWIDTH QUESTION MARK
+      {U'\uff20', '@'},  // FULLWIDTH COMMERCIAL AT
+      {U'\uff3b', '['},  // FULLWIDTH LEFT SQUARE BRACKET
+      {U'\uff3c', '\\'}, // FULLWIDTH REVERSE SOLIDUS
+      {U'\uff3d', ']'},  // FULLWIDTH RIGHT SQUARE BRACKET
+      {U'\uff3e', '^'},  // FULLWIDTH CIRCUMFLEX ACCENT
+      {U'\uff5b', '{'},  // FULLWIDTH LEFT CURLY BRACKET
+      {U'\uff5c', '|'},  // FULLWIDTH VERTICAL LINE
+      {U'\uff5d', '}'},  // FULLWIDTH RIGHT CURLY BRACKET
+      {U'\uff5e', '~'},  // FULLWIDTH TILDE
+      {0, 0}};
   auto Homoglyph =
       std::lower_bound(std::begin(SortedHomoglyphs),
                        std::end(SortedHomoglyphs) - 1, HomoglyphPair{C, '\0'});
@@ -1796,7 +1799,7 @@ bool Lexer::tryConsumeIdentifierUCN(const char *&CurPtr, unsigned Size,
   }
 
   Result.setFlag(Token::HasUCN);
-  if ((UCNPtr - CurPtr ==  6 && CurPtr[1] == 'u') ||
+  if ((UCNPtr - CurPtr == 6 && CurPtr[1] == 'u') ||
       (UCNPtr - CurPtr == 10 && CurPtr[1] == 'U'))
     CurPtr = UCNPtr;
   else
@@ -2117,10 +2120,10 @@ const char *Lexer::LexUDSuffix(Token &Result, const char *CurPtr,
 
   if (!LangOpts.CPlusPlus11) {
     if (!isLexingRawMode())
-      Diag(CurPtr,
-           C == '_' ? diag::warn_cxx11_compat_user_defined_literal
-                    : diag::warn_cxx11_compat_reserved_user_defined_literal)
-        << FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
+      Diag(CurPtr, C == '_'
+                       ? diag::warn_cxx11_compat_user_defined_literal
+                       : diag::warn_cxx11_compat_reserved_user_defined_literal)
+          << FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
     return CurPtr;
   }
 
@@ -2138,7 +2141,7 @@ const char *Lexer::LexUDSuffix(Token &Result, const char *CurPtr,
       // valid suffix for a string literal or a numeric literal (this could be
       // the 'operator""if' defining a numeric literal operator).
       const unsigned MaxStandardSuffixLength = 3;
-      char Buffer[MaxStandardSuffixLength] = { C };
+      char Buffer[MaxStandardSuffixLength] = {C};
       unsigned Consumed = Size;
       unsigned Chars = 1;
       while (true) {
@@ -2196,8 +2199,7 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
   const char *NulCharacter = nullptr;
 
   if (!isLexingRawMode() &&
-      (Kind == tok::utf8_string_literal ||
-       Kind == tok::utf16_string_literal ||
+      (Kind == tok::utf8_string_literal || Kind == tok::utf16_string_literal ||
        Kind == tok::utf32_string_literal))
     Diag(BufferPtr, LangOpts.CPlusPlus ? diag::warn_cxx98_compat_unicode_literal
                                        : diag::warn_c99_compat_unicode_literal);
@@ -2209,16 +2211,16 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
     if (C == '\\')
       C = getAndAdvanceChar(CurPtr, Result);
 
-    if (C == '\n' || C == '\r' ||             // Newline.
-        (C == 0 && CurPtr-1 == BufferEnd)) {  // End of file.
+    if (C == '\n' || C == '\r' ||              // Newline.
+        (C == 0 && CurPtr - 1 == BufferEnd)) { // End of file.
       if (!isLexingRawMode() && !LangOpts.AsmPreprocessor)
         Diag(BufferPtr, diag::ext_unterminated_char_or_string) << 1;
-      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
+      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
       return true;
     }
 
     if (C == 0) {
-      if (isCodeCompletionPoint(CurPtr-1)) {
+      if (isCodeCompletionPoint(CurPtr - 1)) {
         if (ParsingFilename)
           codeCompleteIncludedFile(AfterQuote, CurPtr - 1, /*IsAngled=*/false);
         else
@@ -2228,7 +2230,7 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
         return true;
       }
 
-      NulCharacter = CurPtr-1;
+      NulCharacter = CurPtr - 1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2284,7 +2286,7 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
         Diag(PrefixEnd, diag::err_invalid_newline_raw_delim);
       } else {
         Diag(PrefixEnd, diag::err_invalid_char_raw_delim)
-          << StringRef(PrefixEnd, 1);
+            << StringRef(PrefixEnd, 1);
       }
     }
 
@@ -2296,7 +2298,7 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
 
       if (C == '"')
         break;
-      if (C == 0 && CurPtr-1 == BufferEnd) {
+      if (C == 0 && CurPtr - 1 == BufferEnd) {
         --CurPtr;
         break;
       }
@@ -2319,11 +2321,11 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
         CurPtr += PrefixLen + 1; // skip over prefix and '"'
         break;
       }
-    } else if (C == 0 && CurPtr-1 == BufferEnd) { // End of file.
+    } else if (C == 0 && CurPtr - 1 == BufferEnd) { // End of file.
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::err_unterminated_raw_string)
-          << StringRef(Prefix, PrefixLen);
-      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
+            << StringRef(Prefix, PrefixLen);
+      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
       return true;
     }
   }
@@ -2367,7 +2369,7 @@ bool Lexer::LexAngledStringLiteral(Token &Result, const char *CurPtr) {
         FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
         return true;
       }
-      NulCharacter = CurPtr-1;
+      NulCharacter = CurPtr - 1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2447,23 +2449,23 @@ bool Lexer::LexCharConstant(Token &Result, const char *CurPtr,
     if (C == '\\')
       C = getAndAdvanceChar(CurPtr, Result);
 
-    if (C == '\n' || C == '\r' ||             // Newline.
-        (C == 0 && CurPtr-1 == BufferEnd)) {  // End of file.
+    if (C == '\n' || C == '\r' ||              // Newline.
+        (C == 0 && CurPtr - 1 == BufferEnd)) { // End of file.
       if (!isLexingRawMode() && !LangOpts.AsmPreprocessor)
         Diag(BufferPtr, diag::ext_unterminated_char_or_string) << 0;
-      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
+      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
       return true;
     }
 
     if (C == 0) {
-      if (isCodeCompletionPoint(CurPtr-1)) {
+      if (isCodeCompletionPoint(CurPtr - 1)) {
         PP->CodeCompleteNaturalLanguage();
-        FormTokenWithChars(Result, CurPtr-1, tok::unknown);
+        FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
         cutOffLexing();
         return true;
       }
 
-      NulCharacter = CurPtr-1;
+      NulCharacter = CurPtr - 1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2617,7 +2619,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
     const char *NextLine = CurPtr;
     if (C != 0) {
       // We found a newline, see if it's escaped.
-      const char *EscapePtr = CurPtr-1;
+      const char *EscapePtr = CurPtr - 1;
       bool HasSpace = false;
       while (isHorizontalWhitespace(*EscapePtr)) { // Skip whitespace.
         --EscapePtr;
@@ -2630,7 +2632,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
       else if (EscapePtr[0] == '/' && EscapePtr[-1] == '?' &&
                EscapePtr[-2] == '?' && LangOpts.Trigraphs)
         // Trigraph-escaped newline.
-        CurPtr = EscapePtr-2;
+        CurPtr = EscapePtr - 2;
       else
         break; // This is a newline, we're done.
 
@@ -2651,7 +2653,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
 
     // If we only read only one character, then no special handling is needed.
     // We're done and can skip forward to the newline.
-    if (C != 0 && CurPtr == OldPtr+1) {
+    if (C != 0 && CurPtr == OldPtr + 1) {
       CurPtr = NextLine;
       break;
     }
@@ -2667,14 +2669,14 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
           // line is also a // comment, but has spaces, don't emit a diagnostic.
           if (isWhitespace(C)) {
             const char *ForwardPtr = CurPtr;
-            while (isWhitespace(*ForwardPtr))  // Skip whitespace.
+            while (isWhitespace(*ForwardPtr)) // Skip whitespace.
               ++ForwardPtr;
             if (ForwardPtr[0] == '/' && ForwardPtr[1] == '/')
               break;
           }
 
           if (!isLexingRawMode())
-            Diag(OldPtr-1, diag::ext_multi_line_line_comment);
+            Diag(OldPtr - 1, diag::ext_multi_line_line_comment);
           break;
         }
     }
@@ -2684,7 +2686,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
       break;
     }
 
-    if (C == '\0' && isCodeCompletionPoint(CurPtr-1)) {
+    if (C == '\0' && isCodeCompletionPoint(CurPtr - 1)) {
       PP->CodeCompleteNaturalLanguage();
       cutOffLexing();
       return false;
@@ -2745,12 +2747,12 @@ bool Lexer::SaveLineComment(Token &Result, const char *CurPtr) {
     return true;
 
   assert(Spelling[0] == '/' && Spelling[1] == '/' && "Not line comment?");
-  Spelling[1] = '*';   // Change prefix to "/*".
-  Spelling += "*/";    // add suffix.
+  Spelling[1] = '*'; // Change prefix to "/*".
+  Spelling += "*/";  // add suffix.
 
   Result.setKind(tok::comment);
-  PP->CreateString(Spelling, Result,
-                   Result.getLocation(), Result.getLocation());
+  PP->CreateString(Spelling, Result, Result.getLocation(),
+                   Result.getLocation());
   return true;
 }
 
@@ -2858,7 +2860,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
   unsigned CharSize;
   unsigned char C = getCharAndSize(CurPtr, CharSize);
   CurPtr += CharSize;
-  if (C == 0 && CurPtr == BufferEnd+1) {
+  if (C == 0 && CurPtr == BufferEnd + 1) {
     if (!isLexingRawMode())
       Diag(BufferPtr, diag::err_unterminated_block_comment);
     --CurPtr;
@@ -2898,7 +2900,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
           goto MultiByteUTF8;
         C = *CurPtr++;
       }
-      if (C == '/') goto FoundSlash;
+      if (C == '/')
+        goto FoundSlash;
 
 #ifdef __SSE2__
       __m128i Slashes = _mm_set1_epi8('/');
@@ -2908,8 +2911,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
           goto MultiByteUTF8;
         }
         // look for slashes
-        int cmp = _mm_movemask_epi8(_mm_cmpeq_epi8(*(const __m128i*)CurPtr,
-                                    Slashes));
+        int cmp = _mm_movemask_epi8(
+            _mm_cmpeq_epi8(*(const __m128i *)CurPtr, Slashes));
         if (cmp != 0) {
           // Adjust the pointer to point directly after the first slash. It's
           // not necessary to set C here, it will be overwritten at the end of
@@ -2923,10 +2926,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
       __vector unsigned char LongUTF = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
                                         0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
                                         0x80, 0x80, 0x80, 0x80};
-      __vector unsigned char Slashes = {
-        '/', '/', '/', '/',  '/', '/', '/', '/',
-        '/', '/', '/', '/',  '/', '/', '/', '/'
-      };
+      __vector unsigned char Slashes = {'/', '/', '/', '/', '/', '/', '/', '/',
+                                        '/', '/', '/', '/', '/', '/', '/', '/'};
       while (CurPtr + 16 < BufferEnd) {
         if (LLVM_UNLIKELY(
                 vec_any_ge(*(const __vector unsigned char *)CurPtr, LongUTF)))
@@ -2985,8 +2986,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
     }
 
     if (C == '/') {
-  FoundSlash:
-      if (CurPtr[-2] == '*')  // We found the final */.  We're done!
+    FoundSlash:
+      if (CurPtr[-2] == '*') // We found the final */.  We're done!
         break;
 
       if ((CurPtr[-2] == '\n' || CurPtr[-2] == '\r')) {
@@ -3002,9 +3003,9 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
         // if this is a /*/, which will end the comment.  This misses cases with
         // embedded escaped newlines, but oh well.
         if (!isLexingRawMode())
-          Diag(CurPtr-1, diag::warn_nested_block_comment);
+          Diag(CurPtr - 1, diag::warn_nested_block_comment);
       }
-    } else if (C == 0 && CurPtr == BufferEnd+1) {
+    } else if (C == 0 && CurPtr == BufferEnd + 1) {
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::err_unterminated_block_comment);
       // Note: the user probably forgot a */.  We could continue immediately
@@ -3021,7 +3022,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
 
       BufferPtr = CurPtr;
       return false;
-    } else if (C == '\0' && isCodeCompletionPoint(CurPtr-1)) {
+    } else if (C == '\0' && isCodeCompletionPoint(CurPtr - 1)) {
       PP->CodeCompleteNaturalLanguage();
       cutOffLexing();
       return false;
@@ -3049,7 +3050,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
   // efficiently now.  This is safe even in KeepWhitespaceMode because we would
   // have already returned above with the comment as a token.
   if (isHorizontalWhitespace(*CurPtr)) {
-    SkipWhitespace(Result, CurPtr+1, TokAtPhysicalStartOfLine);
+    SkipWhitespace(Result, CurPtr + 1, TokAtPhysicalStartOfLine);
     return false;
   }
 
@@ -3080,10 +3081,10 @@ void Lexer::ReadToEndOfLine(SmallVectorImpl<char> *Result) {
       if (Result)
         Result->push_back(Char);
       break;
-    case 0:  // Null.
+    case 0: // Null.
       // Found end of file?
-      if (CurPtr-1 != BufferEnd) {
-        if (isCodeCompletionPoint(CurPtr-1)) {
+      if (CurPtr - 1 != BufferEnd) {
+        if (isCodeCompletionPoint(CurPtr - 1)) {
           PP->CodeCompleteNaturalLanguage();
           cutOffLexing();
           return;
@@ -3100,7 +3101,7 @@ void Lexer::ReadToEndOfLine(SmallVectorImpl<char> *Result) {
     case '\n':
       // Okay, we found the end of the line. First, back up past the \0, \r, \n.
       assert(CurPtr[-1] == Char && "Trigraphs for newline?");
-      BufferPtr = CurPtr-1;
+      BufferPtr = CurPtr - 1;
 
       // Next, lex the character, which should handle the EOD transition.
       Lex(Tmp);
@@ -3134,7 +3135,7 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
     // Restore comment saving mode, in case it was disabled for directive.
     if (PP)
       resetExtendedTokenMode();
-    return true;  // Have a token.
+    return true; // Have a token.
   }
 
   // If we are in raw mode, return this event as an EOF token.  Let the caller
@@ -3186,8 +3187,7 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
       DiagID = diag::ext_no_newline_eof;
     }
 
-    Diag(BufferEnd, DiagID)
-      << FixItHint::CreateInsertion(EndLoc, "\n");
+    Diag(BufferEnd, DiagID) << FixItHint::CreateInsertion(EndLoc, "\n");
   }
 
   BufferPtr = CurPtr;
@@ -3251,11 +3251,11 @@ static const char *FindConflictEnd(const char *CurPtr, const char *BufferEnd,
     // Must occur at start of line.
     if (Pos == 0 ||
         (RestOfBuffer[Pos - 1] != '\r' && RestOfBuffer[Pos - 1] != '\n')) {
-      RestOfBuffer = RestOfBuffer.substr(Pos+TermLen);
+      RestOfBuffer = RestOfBuffer.substr(Pos + TermLen);
       Pos = RestOfBuffer.find(Terminator);
       continue;
     }
-    return RestOfBuffer.data()+Pos;
+    return RestOfBuffer.data() + Pos;
   }
   return nullptr;
 }
@@ -3266,8 +3266,7 @@ static const char *FindConflictEnd(const char *CurPtr, const char *BufferEnd,
 /// if not.
 bool Lexer::IsStartOfConflictMarker(const char *CurPtr) {
   // Only a conflict marker if it starts at the beginning of a line.
-  if (CurPtr != BufferStart &&
-      CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
+  if (CurPtr != BufferStart && CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
     return false;
 
   // Check to see if we have <<<<<<< or >>>>.
@@ -3310,8 +3309,7 @@ bool Lexer::IsStartOfConflictMarker(const char *CurPtr) {
 /// the line.  This returns true if it is a conflict marker and false if not.
 bool Lexer::HandleEndOfConflictMarker(const char *CurPtr) {
   // Only a conflict marker if it starts at the beginning of a line.
-  if (CurPtr != BufferStart &&
-      CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
+  if (CurPtr != BufferStart && CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
     return false;
 
   // If we have a situation where we don't care about conflict markers, ignore
@@ -3327,8 +3325,8 @@ bool Lexer::HandleEndOfConflictMarker(const char *CurPtr) {
   // If we do have it, search for the end of the conflict marker.  This could
   // fail if it got skipped with a '#if 0' or something.  Note that CurPtr might
   // be the end of conflict marker.
-  if (const char *End = FindConflictEnd(CurPtr, BufferEnd,
-                                        CurrentConflictMarkerState)) {
+  if (const char *End =
+          FindConflictEnd(CurPtr, BufferEnd, CurrentConflictMarkerState)) {
     CurPtr = End;
 
     // Skip ahead to the end of line.
@@ -3378,7 +3376,7 @@ bool Lexer::lexEditorPlaceholder(Token &Result, const char *CurPtr) {
 
 bool Lexer::isCodeCompletionPoint(const char *CurPtr) const {
   if (PP && PP->isCodeCompletionEnabled()) {
-    SourceLocation Loc = FileLoc.getLocWithOffset(CurPtr-BufferStart);
+    SourceLocation Loc = FileLoc.getLocWithOffset(CurPtr - BufferStart);
     return Loc == PP->getCodeCompletionLoc();
   }
 
@@ -3663,7 +3661,7 @@ bool Lexer::CheckUnicodeWhitespace(Token &Result, uint32_t C,
   if (!isLexingRawMode() && !PP->isPreprocessedOutput() &&
       isUnicodeWhitespace(C)) {
     Diag(BufferPtr, diag::ext_unicode_whitespace)
-      << makeCharRange(*this, BufferPtr, CurPtr);
+        << makeCharRange(*this, BufferPtr, CurPtr);
 
     Result.setFlag(Token::LeadingSpace);
     return true;
@@ -3703,7 +3701,7 @@ bool Lexer::Lex(Token &Result) {
   bool atPhysicalStartOfLine = IsAtPhysicalStartOfLine;
   IsAtPhysicalStartOfLine = false;
   bool isRawLex = isLexingRawMode();
-  (void) isRawLex;
+  (void)isRawLex;
   bool returnedToken = LexTokenInternal(Result, atPhysicalStartOfLine);
   // (After the LexTokenInternal call, the lexer might be destroyed.)
   assert((returnedToken || !isRawLex) && "Raw lex must succeed");
@@ -3742,7 +3740,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     Result.setFlag(Token::LeadingSpace);
   }
 
-  unsigned SizeTmp, SizeTmp2;   // Temporaries for use in cases below.
+  unsigned SizeTmp, SizeTmp2; // Temporaries for use in cases below.
 
   // Read a character, advancing over it.
   char Char = getAndAdvanceChar(CurPtr, Result);
@@ -3752,13 +3750,13 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     NewLinePtr = nullptr;
 
   switch (Char) {
-  case 0:  // Null.
+  case 0: // Null.
     // Found end of file?
-    if (CurPtr-1 == BufferEnd)
-      return LexEndOfFile(Result, CurPtr-1);
+    if (CurPtr - 1 == BufferEnd)
+      return LexEndOfFile(Result, CurPtr - 1);
 
     // Check if we are performing code completion.
-    if (isCodeCompletionPoint(CurPtr-1)) {
+    if (isCodeCompletionPoint(CurPtr - 1)) {
       // Return the code-completion token.
       Result.startToken();
       FormTokenWithChars(Result, CurPtr, tok::code_completion);
@@ -3766,7 +3764,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     }
 
     if (!isLexingRawMode())
-      Diag(CurPtr-1, diag::null_in_file);
+      Diag(CurPtr - 1, diag::null_in_file);
     Result.setFlag(Token::LeadingSpace);
     if (SkipWhitespace(Result, CurPtr, TokAtPhysicalStartOfLine))
       return true; // KeepWhitespaceMode
@@ -3775,12 +3773,12 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // (We manually eliminate the tail call to avoid recursion.)
     goto LexNextToken;
 
-  case 26:  // DOS & CP/M EOF: "^Z".
+  case 26: // DOS & CP/M EOF: "^Z".
     // If we're in Microsoft extensions mode, treat this as end of file.
     if (LangOpts.MicrosoftExt) {
       if (!isLexingRawMode())
-        Diag(CurPtr-1, diag::ext_ctrl_z_eof_microsoft);
-      return LexEndOfFile(Result, CurPtr-1);
+        Diag(CurPtr - 1, diag::ext_ctrl_z_eof_microsoft);
+      return LexEndOfFile(Result, CurPtr - 1);
     }
 
     // If Microsoft extensions are disabled, this is just random garbage.
@@ -3836,11 +3834,11 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // too (without going through the big switch stmt).
     if (CurPtr[0] == '/' && CurPtr[1] == '/' && !inKeepCommentMode() &&
         LineComment && (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) {
-      if (SkipLineComment(Result, CurPtr+2, TokAtPhysicalStartOfLine))
+      if (SkipLineComment(Result, CurPtr + 2, TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
       goto SkipIgnoredUnits;
     } else if (CurPtr[0] == '/' && CurPtr[1] == '*' && !inKeepCommentMode()) {
-      if (SkipBlockComment(Result, CurPtr+2, TokAtPhysicalStartOfLine))
+      if (SkipBlockComment(Result, CurPtr + 2, TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
       goto SkipIgnoredUnits;
     } else if (isHorizontalWhitespace(*CurPtr)) {
@@ -3852,8 +3850,16 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
 
   // C99 6.4.4.1: Integer Constants.
   // C99 6.4.4.2: Floating Constants.
-  case '0': case '1': case '2': case '3': case '4':
-  case '5': case '6': case '7': case '8': case '9':
+  case '0':
+  case '1':
+  case '2':
+  case '3':
+  case '4':
+  case '5':
+  case '6':
+  case '7':
+  case '8':
+  case '9':
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     return LexNumericConstant(Result, CurPtr);
@@ -3881,24 +3887,26 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       // UTF-16 raw string literal
       if (Char == 'R' && LangOpts.RawStringLiterals &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-        return LexRawStringLiteral(Result,
-                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                           SizeTmp2, Result),
-                               tok::utf16_string_literal);
+        return LexRawStringLiteral(
+            Result,
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
+            tok::utf16_string_literal);
 
       if (Char == '8') {
         char Char2 = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
 
         // UTF-8 string literal
         if (Char2 == '"')
-          return LexStringLiteral(Result,
-                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                           SizeTmp2, Result),
-                               tok::utf8_string_literal);
+          return LexStringLiteral(
+              Result,
+              ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
+                          Result),
+              tok::utf8_string_literal);
         if (Char2 == '\'' && (LangOpts.CPlusPlus17 || LangOpts.C23))
           return LexCharConstant(
-              Result, ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                  SizeTmp2, Result),
+              Result,
+              ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
+                          Result),
               tok::utf8_char_constant);
 
         if (Char2 == 'R' && LangOpts.RawStringLiterals) {
@@ -3906,11 +3914,12 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
           char Char3 = getCharAndSize(CurPtr + SizeTmp + SizeTmp2, SizeTmp3);
           // UTF-8 raw string literal
           if (Char3 == '"') {
-            return LexRawStringLiteral(Result,
-                   ConsumeChar(ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                           SizeTmp2, Result),
-                               SizeTmp3, Result),
-                   tok::utf8_string_literal);
+            return LexRawStringLiteral(
+                Result,
+                ConsumeChar(ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                        SizeTmp2, Result),
+                            SizeTmp3, Result),
+                tok::utf8_string_literal);
           }
         }
       }
@@ -3939,10 +3948,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       // UTF-32 raw string literal
       if (Char == 'R' && LangOpts.RawStringLiterals &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-        return LexRawStringLiteral(Result,
-                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                           SizeTmp2, Result),
-                               tok::utf32_string_literal);
+        return LexRawStringLiteral(
+            Result,
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
+            tok::utf32_string_literal);
     }
 
     // treat U like the start of an identifier.
@@ -3956,15 +3965,14 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Char = getCharAndSize(CurPtr, SizeTmp);
 
       if (Char == '"')
-        return LexRawStringLiteral(Result,
-                                   ConsumeChar(CurPtr, SizeTmp, Result),
+        return LexRawStringLiteral(Result, ConsumeChar(CurPtr, SizeTmp, Result),
                                    tok::string_literal);
     }
 
     // treat R like the start of an identifier.
     return LexIdentifierContinue(Result, CurPtr);
 
-  case 'L':   // Identifier (Loony) or wide literal (L'x' or L"xyz").
+  case 'L': // Identifier (Loony) or wide literal (L'x' or L"xyz").
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     Char = getCharAndSize(CurPtr, SizeTmp);
@@ -3977,10 +3985,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // Wide raw string literal.
     if (LangOpts.RawStringLiterals && Char == 'R' &&
         getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-      return LexRawStringLiteral(Result,
-                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                           SizeTmp2, Result),
-                               tok::wide_string_literal);
+      return LexRawStringLiteral(
+          Result,
+          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
+          tok::wide_string_literal);
 
     // Wide character constant.
     if (Char == '\'')
@@ -3990,23 +3998,63 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     [[fallthrough]];
 
   // C99 6.4.2: Identifiers.
-  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
-  case 'H': case 'I': case 'J': case 'K':    /*'L'*/case 'M': case 'N':
-  case 'O': case 'P': case 'Q':    /*'R'*/case 'S': case 'T':    /*'U'*/
-  case 'V': case 'W': case 'X': case 'Y': case 'Z':
-  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
-  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
-  case 'o': case 'p': case 'q': case 'r': case 's': case 't':    /*'u'*/
-  case 'v': case 'w': case 'x': case 'y': case 'z':
+  case 'A':
+  case 'B':
+  case 'C':
+  case 'D':
+  case 'E':
+  case 'F':
+  case 'G':
+  case 'H':
+  case 'I':
+  case 'J':
+  case 'K': /*'L'*/
+  case 'M':
+  case 'N':
+  case 'O':
+  case 'P':
+  case 'Q': /*'R'*/
+  case 'S':
+  case 'T': /*'U'*/
+  case 'V':
+  case 'W':
+  case 'X':
+  case 'Y':
+  case 'Z':
+  case 'a':
+  case 'b':
+  case 'c':
+  case 'd':
+  case 'e':
+  case 'f':
+  case 'g':
+  case 'h':
+  case 'i':
+  case 'j':
+  case 'k':
+  case 'l':
+  case 'm':
+  case 'n':
+  case 'o':
+  case 'p':
+  case 'q':
+  case 'r':
+  case 's':
+  case 't': /*'u'*/
+  case 'v':
+  case 'w':
+  case 'x':
+  case 'y':
+  case 'z':
   case '_':
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     return LexIdentifierContinue(Result, CurPtr);
 
-  case '$':   // $ in identifiers.
+  case '$': // $ in identifiers.
     if (LangOpts.DollarIdents) {
       if (!isLexingRawMode())
-        Diag(CurPtr-1, diag::ext_dollar_in_identifier);
+        Diag(CurPtr - 1, diag::ext_dollar_in_identifier);
       // Notify MIOpt that we read a non-whitespace/non-comment token.
       MIOpt.ReadToken();
       return LexIdentifierContinue(Result, CurPtr);
@@ -4062,10 +4110,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Kind = tok::periodstar;
       CurPtr += SizeTmp;
     } else if (Char == '.' &&
-               getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == '.') {
+               getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '.') {
       Kind = tok::ellipsis;
-      CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                           SizeTmp2, Result);
+      CurPtr =
+          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
     } else {
       Kind = tok::period;
     }
@@ -4104,18 +4152,18 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     break;
   case '-':
     Char = getCharAndSize(CurPtr, SizeTmp);
-    if (Char == '-') {      // --
+    if (Char == '-') { // --
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::minusminus;
     } else if (Char == '>' && LangOpts.CPlusPlus &&
-               getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == '*') {  // C++ ->*
-      CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                           SizeTmp2, Result);
+               getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '*') { // C++ ->*
+      CurPtr =
+          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
       Kind = tok::arrowstar;
-    } else if (Char == '>') {   // ->
+    } else if (Char == '>') { // ->
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::arrow;
-    } else if (Char == '=') {   // -=
+    } else if (Char == '=') { // -=
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::minusequal;
     } else {
@@ -4136,7 +4184,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
   case '/':
     // 6.4.9: Comments
     Char = getCharAndSize(CurPtr, SizeTmp);
-    if (Char == '/') {         // Line comment.
+    if (Char == '/') { // Line comment.
       // Even if Line comments are disabled (e.g. in C89 mode), we generally
       // want to lex this as a comment.  There is one problem with this though,
       // that in one particular corner case, this can change the behavior of the
@@ -4149,7 +4197,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
           LineComment && (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP);
       if (!TreatAsComment)
         if (!(PP && PP->isPreprocessedOutput()))
-          TreatAsComment = getCharAndSize(CurPtr+SizeTmp, SizeTmp2) != '*';
+          TreatAsComment = getCharAndSize(CurPtr + SizeTmp, SizeTmp2) != '*';
 
       if (TreatAsComment) {
         if (SkipLineComment(Result, ConsumeChar(CurPtr, SizeTmp, Result),
@@ -4163,7 +4211,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       }
     }
 
-    if (Char == '*') {  // /**/ comment.
+    if (Char == '*') { // /**/ comment.
       if (SkipBlockComment(Result, ConsumeChar(CurPtr, SizeTmp, Result),
                            TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
@@ -4186,21 +4234,21 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Kind = tok::percentequal;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (LangOpts.Digraphs && Char == '>') {
-      Kind = tok::r_brace;                             // '%>' -> '}'
+      Kind = tok::r_brace; // '%>' -> '}'
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (LangOpts.Digraphs && Char == ':') {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Char = getCharAndSize(CurPtr, SizeTmp);
-      if (Char == '%' && getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == ':') {
-        Kind = tok::hashhash;                          // '%:%:' -> '##'
-        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                             SizeTmp2, Result);
-      } else if (Char == '@' && LangOpts.MicrosoftExt) {// %:@ -> #@ -> Charize
+      if (Char == '%' && getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == ':') {
+        Kind = tok::hashhash; // '%:%:' -> '##'
+        CurPtr =
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+      } else if (Char == '@' && LangOpts.MicrosoftExt) { // %:@ -> #@ -> Charize
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         if (!isLexingRawMode())
           Diag(BufferPtr, diag::ext_charize_microsoft);
         Kind = tok::hashat;
-      } else {                                         // '%:' -> '#'
+      } else { // '%:' -> '#'
         // We parsed a # character.  If this occurs at the start of the line,
         // it's actually the start of a preprocessing directive.  Callback to
         // the preprocessor to handle it.
@@ -4219,35 +4267,35 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     if (ParsingFilename) {
       return LexAngledStringLiteral(Result, CurPtr);
     } else if (Char == '<') {
-      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
       if (After == '=') {
         Kind = tok::lesslessequal;
-        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                             SizeTmp2, Result);
-      } else if (After == '<' && IsStartOfConflictMarker(CurPtr-1)) {
+        CurPtr =
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+      } else if (After == '<' && IsStartOfConflictMarker(CurPtr - 1)) {
         // If this is actually a '<<<<<<<' version control conflict marker,
         // recognize it as such and recover nicely.
         goto LexNextToken;
-      } else if (After == '<' && HandleEndOfConflictMarker(CurPtr-1)) {
+      } else if (After == '<' && HandleEndOfConflictMarker(CurPtr - 1)) {
         // If this is '<<<<' and we're in a Perforce-style conflict marker,
         // ignore it.
         goto LexNextToken;
       } else if (LangOpts.CUDA && After == '<') {
         Kind = tok::lesslessless;
-        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                             SizeTmp2, Result);
+        CurPtr =
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
       } else {
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         Kind = tok::lessless;
       }
     } else if (Char == '=') {
-      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
       if (After == '>') {
         if (LangOpts.CPlusPlus20) {
           if (!isLexingRawMode())
             Diag(BufferPtr, diag::warn_cxx17_compat_spaceship);
-          CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                               SizeTmp2, Result);
+          CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
+                               Result);
           Kind = tok::spaceship;
           break;
         }
@@ -4255,13 +4303,13 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
         // change in semantics if this turns up in C++ <=17 mode.
         if (LangOpts.CPlusPlus && !isLexingRawMode()) {
           Diag(BufferPtr, diag::warn_cxx20_compat_spaceship)
-            << FixItHint::CreateInsertion(
-                   getSourceLocation(CurPtr + SizeTmp, SizeTmp2), " ");
+              << FixItHint::CreateInsertion(
+                     getSourceLocation(CurPtr + SizeTmp, SizeTmp2), " ");
         }
       }
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::lessequal;
-    } else if (LangOpts.Digraphs && Char == ':') {     // '<:' -> '['
+    } else if (LangOpts.Digraphs && Char == ':') { // '<:' -> '['
       if (LangOpts.CPlusPlus11 &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == ':') {
         // C++0x [lex.pptoken]p3:
@@ -4281,7 +4329,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
 
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::l_square;
-    } else if (LangOpts.Digraphs && Char == '%') {     // '<%' -> '{'
+    } else if (LangOpts.Digraphs && Char == '%') { // '<%' -> '{'
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::l_brace;
     } else if (Char == '#' && /*Not a trigraph*/ SizeTmp == 1 &&
@@ -4297,22 +4345,22 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::greaterequal;
     } else if (Char == '>') {
-      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
       if (After == '=') {
-        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                             SizeTmp2, Result);
+        CurPtr =
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
         Kind = tok::greatergreaterequal;
-      } else if (After == '>' && IsStartOfConflictMarker(CurPtr-1)) {
+      } else if (After == '>' && IsStartOfConflictMarker(CurPtr - 1)) {
         // If this is actually a '>>>>' conflict marker, recognize it as such
         // and recover nicely.
         goto LexNextToken;
-      } else if (After == '>' && HandleEndOfConflictMarker(CurPtr-1)) {
+      } else if (After == '>' && HandleEndOfConflictMarker(CurPtr - 1)) {
         // If this is '>>>>>>>' and we're in a conflict marker, ignore it.
         goto LexNextToken;
       } else if (LangOpts.CUDA && After == '>') {
         Kind = tok::greatergreatergreater;
-        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                             SizeTmp2, Result);
+        CurPtr =
+            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
       } else {
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         Kind = tok::greatergreater;
@@ -4339,7 +4387,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (Char == '|') {
       // If this is '|||||||' and we're in a conflict marker, ignore it.
-      if (CurPtr[1] == '|' && HandleEndOfConflictMarker(CurPtr-1))
+      if (CurPtr[1] == '|' && HandleEndOfConflictMarker(CurPtr - 1))
         goto LexNextToken;
       Kind = tok::pipepipe;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
@@ -4366,7 +4414,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     Char = getCharAndSize(CurPtr, SizeTmp);
     if (Char == '=') {
       // If this is '====' and we're in a conflict marker, ignore it.
-      if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr-1))
+      if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr - 1))
         goto LexNextToken;
 
       Kind = tok::equalequal;
@@ -4383,7 +4431,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     if (Char == '#') {
       Kind = tok::hashhash;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
-    } else if (Char == '@' && LangOpts.MicrosoftExt) {  // #@ -> Charize
+    } else if (Char == '@' && LangOpts.MicrosoftExt) { // #@ -> Charize
       Kind = tok::hashat;
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::ext_charize_microsoft);
@@ -4439,11 +4487,9 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // We can't just reset CurPtr to BufferPtr because BufferPtr may point to
     // an escaped newline.
     --CurPtr;
-    llvm::ConversionResult Status =
-        llvm::convertUTF8Sequence((const llvm::UTF8 **)&CurPtr,
-                                  (const llvm::UTF8 *)BufferEnd,
-                                  &CodePoint,
-                                  llvm::strictConversion);
+    llvm::ConversionResult Status = llvm::convertUTF8Sequence(
+        (const llvm::UTF8 **)&CurPtr, (const llvm::UTF8 *)BufferEnd, &CodePoint,
+        llvm::strictConversion);
     if (Status == llvm::conversionOK) {
       if (CheckUnicodeWhitespace(Result, CodePoint, CurPtr)) {
         if (SkipWhitespace(Result, CurPtr, TokAtPhysicalStartOfLine))
@@ -4468,7 +4514,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // just diagnose the invalid UTF-8, then drop the character.
     Diag(CurPtr, diag::err_invalid_utf8);
 
-    BufferPtr = CurPtr+1;
+    BufferPtr = CurPtr + 1;
     // We're pretending the character didn't exist, so just try again with
     // this lexer.
     // (We manually eliminate the tail call to avoid recursion.)
diff --git a/clang/test/Frontend/highlight-text.c b/clang/test/Frontend/highlight-text.c
index eefa4ebeec8ca4..a81d26caa4c24c 100644
--- a/clang/test/Frontend/highlight-text.c
+++ b/clang/test/Frontend/highlight-text.c
@@ -12,7 +12,8 @@ int a = M;
 // CHECK-NEXT: :5:11: note: expanded from macro 'M'
 // CHECK-NEXT:     5 | #define M \
 // CHECK-NEXT:       |           ^
-// CHECK-NEXT: :3:14: note: expanded from here
+// CHECK-NEXT: :3:14: note: expanded from macro '\
+// CHECK-NEXT: F'
 // CHECK-NEXT:     3 | #define F (1 << 99)
 // CHECK-NEXT:       |              ^  ~~
 // CHECK-NEXT: :8:9: warning: shift count >= width of type [-Wshift-count-overflow]

>From 7437706173cecc70f06377b118e3c8c5dc5989f6 Mon Sep 17 00:00:00 2001
From: Samira Bazuzi <bazuzi at google.com>
Date: Tue, 26 Nov 2024 10:00:38 -0500
Subject: [PATCH 3/4] Undo accidental formatting of entire file.

---
 clang/lib/Lex/Lexer.cpp | 710 +++++++++++++++++++---------------------
 1 file changed, 331 insertions(+), 379 deletions(-)

diff --git a/clang/lib/Lex/Lexer.cpp b/clang/lib/Lex/Lexer.cpp
index ea2c2aeebdcfd0..72364500a48f9f 100644
--- a/clang/lib/Lex/Lexer.cpp
+++ b/clang/lib/Lex/Lexer.cpp
@@ -141,8 +141,8 @@ void Lexer::InitLexer(const char *BufStart, const char *BufPtr,
     // Determine the size of the BOM.
     StringRef Buf(BufferStart, BufferEnd - BufferStart);
     size_t BOMLength = llvm::StringSwitch<size_t>(Buf)
-                           .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
-                           .Default(0);
+      .StartsWith("\xEF\xBB\xBF", 3) // UTF-8 BOM
+      .Default(0);
 
     // Skip the BOM.
     BufferPtr += BOMLength;
@@ -256,14 +256,14 @@ Lexer *Lexer::Create_PragmaLexer(SourceLocation SpellingLoc,
   const char *StrData = SM.getCharacterData(SpellingLoc);
 
   L->BufferPtr = StrData;
-  L->BufferEnd = StrData + TokLen;
+  L->BufferEnd = StrData+TokLen;
   assert(L->BufferEnd[0] == 0 && "Buffer is not nul terminated!");
 
   // Set the SourceLocation with the remapping information.  This ensures that
   // GetMappedTokenLoc will remap the tokens as they are lexed.
-  L->FileLoc =
-      SM.createExpansionLoc(SM.getLocForStartOfFile(SpellingFID),
-                            ExpansionLocStart, ExpansionLocEnd, TokLen);
+  L->FileLoc = SM.createExpansionLoc(SM.getLocForStartOfFile(SpellingFID),
+                                     ExpansionLocStart,
+                                     ExpansionLocEnd, TokLen);
 
   // Ensure that the lexer thinks it is inside a directive, so that end \n will
   // return an EOD token.
@@ -342,14 +342,12 @@ static size_t getSpellingSlow(const Token &Tok, const char *BufPtr,
     // Raw string literals need special handling; trigraph expansion and line
     // splicing do not occur within their d-char-sequence nor within their
     // r-char-sequence.
-    if (Length >= 2 && Spelling[Length - 2] == 'R' &&
-        Spelling[Length - 1] == '"') {
+    if (Length >= 2 &&
+        Spelling[Length - 2] == 'R' && Spelling[Length - 1] == '"') {
       // Search backwards from the end of the token to find the matching closing
       // quote.
       const char *RawEnd = BufEnd;
-      do
-        --RawEnd;
-      while (*RawEnd != '"');
+      do --RawEnd; while (*RawEnd != '"');
       size_t RawLength = RawEnd - BufPtr + 1;
 
       // Everything between the quotes is included verbatim in the spelling.
@@ -377,9 +375,11 @@ static size_t getSpellingSlow(const Token &Tok, const char *BufPtr,
 /// after trigraph expansion and escaped-newline folding.  In particular, this
 /// wants to get the true, uncanonicalized, spelling of things like digraphs
 /// UCNs, etc.
-StringRef Lexer::getSpelling(SourceLocation loc, SmallVectorImpl<char> &buffer,
+StringRef Lexer::getSpelling(SourceLocation loc,
+                             SmallVectorImpl<char> &buffer,
                              const SourceManager &SM,
-                             const LangOptions &options, bool *invalid) {
+                             const LangOptions &options,
+                             bool *invalid) {
   // Break down the source location.
   std::pair<FileID, unsigned> locInfo = SM.getDecomposedLoc(loc);
 
@@ -387,16 +387,15 @@ StringRef Lexer::getSpelling(SourceLocation loc, SmallVectorImpl<char> &buffer,
   bool invalidTemp = false;
   StringRef file = SM.getBufferData(locInfo.first, &invalidTemp);
   if (invalidTemp) {
-    if (invalid)
-      *invalid = true;
+    if (invalid) *invalid = true;
     return {};
   }
 
   const char *tokenBegin = file.data() + locInfo.second;
 
   // Lex from the start of the given location.
-  Lexer lexer(SM.getLocForStartOfFile(locInfo.first), options, file.begin(),
-              tokenBegin, file.end());
+  Lexer lexer(SM.getLocForStartOfFile(locInfo.first), options,
+              file.begin(), tokenBegin, file.end());
   Token token;
   lexer.LexFromRawLexer(token);
 
@@ -422,8 +421,8 @@ std::string Lexer::getSpelling(const Token &Tok, const SourceManager &SourceMgr,
   assert((int)Tok.getLength() >= 0 && "Token character range is bogus!");
 
   bool CharDataInvalid = false;
-  const char *TokStart =
-      SourceMgr.getCharacterData(Tok.getLocation(), &CharDataInvalid);
+  const char *TokStart = SourceMgr.getCharacterData(Tok.getLocation(),
+                                                    &CharDataInvalid);
   if (Invalid)
     *Invalid = CharDataInvalid;
   if (CharDataInvalid)
@@ -489,14 +488,15 @@ unsigned Lexer::getSpelling(const Token &Tok, const char *&Buffer,
   }
 
   // Otherwise, hard case, relex the characters into the string.
-  return getSpellingSlow(Tok, TokStart, LangOpts, const_cast<char *>(Buffer));
+  return getSpellingSlow(Tok, TokStart, LangOpts, const_cast<char*>(Buffer));
 }
 
 /// MeasureTokenLength - Relex the token at the specified location and return
 /// its length in bytes in the input file.  If the token needs cleaning (e.g.
 /// includes a trigraph or an escaped newline) then this count includes bytes
 /// that are part of that.
-unsigned Lexer::MeasureTokenLength(SourceLocation Loc, const SourceManager &SM,
+unsigned Lexer::MeasureTokenLength(SourceLocation Loc,
+                                   const SourceManager &SM,
                                    const LangOptions &LangOpts) {
   Token TheTok;
   if (getRawToken(Loc, TheTok, SM, LangOpts))
@@ -507,7 +507,8 @@ unsigned Lexer::MeasureTokenLength(SourceLocation Loc, const SourceManager &SM,
 /// Relex the token at the specified location.
 /// \returns true if there was a failure, false on success.
 bool Lexer::getRawToken(SourceLocation Loc, Token &Result,
-                        const SourceManager &SM, const LangOptions &LangOpts,
+                        const SourceManager &SM,
+                        const LangOptions &LangOpts,
                         bool IgnoreWhiteSpace) {
   // TODO: this could be special cased for common tokens like identifiers, ')',
   // etc to make this faster, if it mattered.  Just look at StrData[0] to handle
@@ -524,7 +525,7 @@ bool Lexer::getRawToken(SourceLocation Loc, Token &Result,
   if (Invalid)
     return true;
 
-  const char *StrData = Buffer.data() + LocInfo.second;
+  const char *StrData = Buffer.data()+LocInfo.second;
 
   if (!IgnoreWhiteSpace && isWhitespace(SkipEscapedNewLines(StrData)[0]))
     return true;
@@ -625,7 +626,10 @@ SourceLocation Lexer::GetBeginningOfToken(SourceLocation Loc,
 
 namespace {
 
-enum PreambleDirectiveKind { PDK_Skipped, PDK_Unknown };
+enum PreambleDirectiveKind {
+  PDK_Skipped,
+  PDK_Unknown
+};
 
 } // namespace
 
@@ -709,31 +713,31 @@ PreambleBounds Lexer::ComputePreamble(StringRef Buffer,
       TheLexer.LexFromRawLexer(TheTok);
       if (TheTok.getKind() == tok::raw_identifier && !TheTok.needsCleaning()) {
         StringRef Keyword = TheTok.getRawIdentifier();
-        PreambleDirectiveKind PDK =
-            llvm::StringSwitch<PreambleDirectiveKind>(Keyword)
-                .Case("include", PDK_Skipped)
-                .Case("__include_macros", PDK_Skipped)
-                .Case("define", PDK_Skipped)
-                .Case("undef", PDK_Skipped)
-                .Case("line", PDK_Skipped)
-                .Case("error", PDK_Skipped)
-                .Case("pragma", PDK_Skipped)
-                .Case("import", PDK_Skipped)
-                .Case("include_next", PDK_Skipped)
-                .Case("warning", PDK_Skipped)
-                .Case("ident", PDK_Skipped)
-                .Case("sccs", PDK_Skipped)
-                .Case("assert", PDK_Skipped)
-                .Case("unassert", PDK_Skipped)
-                .Case("if", PDK_Skipped)
-                .Case("ifdef", PDK_Skipped)
-                .Case("ifndef", PDK_Skipped)
-                .Case("elif", PDK_Skipped)
-                .Case("elifdef", PDK_Skipped)
-                .Case("elifndef", PDK_Skipped)
-                .Case("else", PDK_Skipped)
-                .Case("endif", PDK_Skipped)
-                .Default(PDK_Unknown);
+        PreambleDirectiveKind PDK
+          = llvm::StringSwitch<PreambleDirectiveKind>(Keyword)
+              .Case("include", PDK_Skipped)
+              .Case("__include_macros", PDK_Skipped)
+              .Case("define", PDK_Skipped)
+              .Case("undef", PDK_Skipped)
+              .Case("line", PDK_Skipped)
+              .Case("error", PDK_Skipped)
+              .Case("pragma", PDK_Skipped)
+              .Case("import", PDK_Skipped)
+              .Case("include_next", PDK_Skipped)
+              .Case("warning", PDK_Skipped)
+              .Case("ident", PDK_Skipped)
+              .Case("sccs", PDK_Skipped)
+              .Case("assert", PDK_Skipped)
+              .Case("unassert", PDK_Skipped)
+              .Case("if", PDK_Skipped)
+              .Case("ifdef", PDK_Skipped)
+              .Case("ifndef", PDK_Skipped)
+              .Case("elif", PDK_Skipped)
+              .Case("elifdef", PDK_Skipped)
+              .Case("elifndef", PDK_Skipped)
+              .Case("else", PDK_Skipped)
+              .Case("endif", PDK_Skipped)
+              .Default(PDK_Unknown);
 
         switch (PDK) {
         case PDK_Skipped:
@@ -822,7 +826,7 @@ unsigned Lexer::getTokenPrefixLength(SourceLocation TokStart, unsigned CharNo,
   // advanced by 3 should return the location of b, not of \\.  One compounding
   // detail of this is that the escape may be made by a trigraph.
   if (!Lexer::isObviouslySimpleCharacter(*TokPtr))
-    PhysOffset += Lexer::SkipEscapedNewLines(TokPtr) - TokPtr;
+    PhysOffset += Lexer::SkipEscapedNewLines(TokPtr)-TokPtr;
 
   return PhysOffset;
 }
@@ -886,7 +890,8 @@ bool Lexer::isAtStartOfMacroExpansion(SourceLocation loc,
 
 /// Returns true if the given MacroID location points at the last
 /// token of the macro expansion.
-bool Lexer::isAtEndOfMacroExpansion(SourceLocation loc, const SourceManager &SM,
+bool Lexer::isAtEndOfMacroExpansion(SourceLocation loc,
+                                    const SourceManager &SM,
                                     const LangOptions &LangOpts,
                                     SourceLocation *MacroEnd) {
   assert(loc.isValid() && loc.isMacroID() && "Expected a valid macro loc");
@@ -918,7 +923,7 @@ static CharSourceRange makeRangeFromFileLocs(CharSourceRange Range,
   SourceLocation End = Range.getEnd();
   assert(Begin.isFileID() && End.isFileID());
   if (Range.isTokenRange()) {
-    End = Lexer::getLocForEndOfToken(End, 0, SM, LangOpts);
+    End = Lexer::getLocForEndOfToken(End, 0, SM,LangOpts);
     if (End.isInvalid())
       return {};
   }
@@ -931,7 +936,8 @@ static CharSourceRange makeRangeFromFileLocs(CharSourceRange Range,
     return {};
 
   unsigned EndOffs;
-  if (!SM.isInFileID(End, FID, &EndOffs) || BeginOffs > EndOffs)
+  if (!SM.isInFileID(End, FID, &EndOffs) ||
+      BeginOffs > EndOffs)
     return {};
 
   return CharSourceRange::getCharRange(Begin, End);
@@ -978,10 +984,10 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   assert(Begin.isMacroID() && End.isMacroID());
   SourceLocation MacroBegin, MacroEnd;
   if (isAtStartOfMacroExpansion(Begin, SM, LangOpts, &MacroBegin) &&
-      ((Range.isTokenRange() &&
-        isAtEndOfMacroExpansion(End, SM, LangOpts, &MacroEnd)) ||
-       (Range.isCharRange() &&
-        isAtStartOfMacroExpansion(End, SM, LangOpts, &MacroEnd)))) {
+      ((Range.isTokenRange() && isAtEndOfMacroExpansion(End, SM, LangOpts,
+                                                        &MacroEnd)) ||
+       (Range.isCharRange() && isAtStartOfMacroExpansion(End, SM, LangOpts,
+                                                         &MacroEnd)))) {
     Range.setBegin(MacroBegin);
     Range.setEnd(MacroEnd);
     // Use the *original* `End`, not the expanded one in `MacroEnd`.
@@ -991,14 +997,14 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   }
 
   bool Invalid = false;
-  const SrcMgr::SLocEntry &BeginEntry =
-      SM.getSLocEntry(SM.getFileID(Begin), &Invalid);
+  const SrcMgr::SLocEntry &BeginEntry = SM.getSLocEntry(SM.getFileID(Begin),
+                                                        &Invalid);
   if (Invalid)
     return {};
 
   if (BeginEntry.getExpansion().isMacroArgExpansion()) {
-    const SrcMgr::SLocEntry &EndEntry =
-        SM.getSLocEntry(SM.getFileID(End), &Invalid);
+    const SrcMgr::SLocEntry &EndEntry = SM.getSLocEntry(SM.getFileID(End),
+                                                        &Invalid);
     if (Invalid)
       return {};
 
@@ -1014,28 +1020,27 @@ CharSourceRange Lexer::makeFileCharRange(CharSourceRange Range,
   return {};
 }
 
-StringRef Lexer::getSourceText(CharSourceRange Range, const SourceManager &SM,
-                               const LangOptions &LangOpts, bool *Invalid) {
+StringRef Lexer::getSourceText(CharSourceRange Range,
+                               const SourceManager &SM,
+                               const LangOptions &LangOpts,
+                               bool *Invalid) {
   Range = makeFileCharRange(Range, SM, LangOpts);
   if (Range.isInvalid()) {
-    if (Invalid)
-      *Invalid = true;
+    if (Invalid) *Invalid = true;
     return {};
   }
 
   // Break down the source location.
   std::pair<FileID, unsigned> beginInfo = SM.getDecomposedLoc(Range.getBegin());
   if (beginInfo.first.isInvalid()) {
-    if (Invalid)
-      *Invalid = true;
+    if (Invalid) *Invalid = true;
     return {};
   }
 
   unsigned EndOffs;
   if (!SM.isInFileID(Range.getEnd(), beginInfo.first, &EndOffs) ||
       beginInfo.second > EndOffs) {
-    if (Invalid)
-      *Invalid = true;
+    if (Invalid) *Invalid = true;
     return {};
   }
 
@@ -1043,13 +1048,11 @@ StringRef Lexer::getSourceText(CharSourceRange Range, const SourceManager &SM,
   bool invalidTemp = false;
   StringRef file = SM.getBufferData(beginInfo.first, &invalidTemp);
   if (invalidTemp) {
-    if (Invalid)
-      *Invalid = true;
+    if (Invalid) *Invalid = true;
     return {};
   }
 
-  if (Invalid)
-    *Invalid = false;
+  if (Invalid) *Invalid = false;
   return file.substr(beginInfo.second, EndOffs - beginInfo.second);
 }
 
@@ -1183,8 +1186,8 @@ StringRef Lexer::getIndentationForLine(SourceLocation Loc,
 static LLVM_ATTRIBUTE_NOINLINE SourceLocation GetMappedTokenLoc(
     Preprocessor &PP, SourceLocation FileLoc, unsigned CharNo, unsigned TokLen);
 static SourceLocation GetMappedTokenLoc(Preprocessor &PP,
-                                        SourceLocation FileLoc, unsigned CharNo,
-                                        unsigned TokLen) {
+                                        SourceLocation FileLoc,
+                                        unsigned CharNo, unsigned TokLen) {
   assert(FileLoc.isMacroID() && "Must be a macro expansion");
 
   // Otherwise, we're lexing "mapped tokens".  This is used for things like
@@ -1213,7 +1216,7 @@ SourceLocation Lexer::getSourceLocation(const char *Loc,
 
   // In the normal case, we're just lexing from a simple file buffer, return
   // the file id from FileLoc with the offset specified.
-  unsigned CharNo = Loc - BufferStart;
+  unsigned CharNo = Loc-BufferStart;
   if (FileLoc.isFileID())
     return FileLoc.getLocWithOffset(CharNo);
 
@@ -1237,26 +1240,16 @@ DiagnosticBuilder Lexer::Diag(const char *Loc, unsigned DiagID) const {
 /// return the decoded trigraph letter it corresponds to, or '\0' if nothing.
 static char GetTrigraphCharForLetter(char Letter) {
   switch (Letter) {
-  default:
-    return 0;
-  case '=':
-    return '#';
-  case ')':
-    return ']';
-  case '(':
-    return '[';
-  case '!':
-    return '|';
-  case '\'':
-    return '^';
-  case '>':
-    return '}';
-  case '/':
-    return '\\';
-  case '<':
-    return '{';
-  case '-':
-    return '~';
+  default:   return 0;
+  case '=':  return '#';
+  case ')':  return ']';
+  case '(':  return '[';
+  case '!':  return '|';
+  case '\'': return '^';
+  case '>':  return '}';
+  case '/':  return '\\';
+  case '<':  return '{';
+  case '-':  return '~';
   }
 }
 
@@ -1271,12 +1264,12 @@ static char DecodeTrigraphChar(const char *CP, Lexer *L, bool Trigraphs) {
 
   if (!Trigraphs) {
     if (L && !L->isLexingRawMode())
-      L->Diag(CP - 2, diag::trigraph_ignored);
+      L->Diag(CP-2, diag::trigraph_ignored);
     return 0;
   }
 
   if (L && !L->isLexingRawMode())
-    L->Diag(CP - 2, diag::trigraph_converted) << StringRef(&Res, 1);
+    L->Diag(CP-2, diag::trigraph_converted) << StringRef(&Res, 1);
   return Res;
 }
 
@@ -1288,11 +1281,12 @@ unsigned Lexer::getEscapedNewLineSize(const char *Ptr) {
   while (isWhitespace(Ptr[Size])) {
     ++Size;
 
-    if (Ptr[Size - 1] != '\n' && Ptr[Size - 1] != '\r')
+    if (Ptr[Size-1] != '\n' && Ptr[Size-1] != '\r')
       continue;
 
     // If this is a \r\n or \n\r, skip the other half.
-    if ((Ptr[Size] == '\r' || Ptr[Size] == '\n') && Ptr[Size - 1] != Ptr[Size])
+    if ((Ptr[Size] == '\r' || Ptr[Size] == '\n') &&
+        Ptr[Size-1] != Ptr[Size])
       ++Size;
 
     return Size;
@@ -1309,22 +1303,21 @@ const char *Lexer::SkipEscapedNewLines(const char *P) {
   while (true) {
     const char *AfterEscape;
     if (*P == '\\') {
-      AfterEscape = P + 1;
+      AfterEscape = P+1;
     } else if (*P == '?') {
       // If not a trigraph for escape, bail out.
       if (P[1] != '?' || P[2] != '/')
         return P;
       // FIXME: Take LangOpts into account; the language might not
       // support trigraphs.
-      AfterEscape = P + 3;
+      AfterEscape = P+3;
     } else {
       return P;
     }
 
     unsigned NewLineSize = Lexer::getEscapedNewLineSize(AfterEscape);
-    if (NewLineSize == 0)
-      return P;
-    P = AfterEscape + NewLineSize;
+    if (NewLineSize == 0) return P;
+    P = AfterEscape+NewLineSize;
   }
 }
 
@@ -1350,7 +1343,7 @@ std::optional<Token> Lexer::findNextToken(SourceLocation Loc,
 
   // Lex from the start of the given location.
   Lexer lexer(SM.getLocForStartOfFile(LocInfo.first), LangOpts, File.begin(),
-              TokenBegin, File.end());
+                                      TokenBegin, File.end());
   // Find the token.
   Token Tok;
   lexer.LexFromRawLexer(Tok);
@@ -1413,7 +1406,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
   if (Ptr[0] == '\\') {
     ++Size;
     ++Ptr;
-  Slash:
+Slash:
     // Common case, backslash-char where the char is not whitespace.
     if (!isWhitespace(Ptr[0]))
       return {'\\', Size};
@@ -1422,8 +1415,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
     // newline.
     if (unsigned EscapedNewLineSize = getEscapedNewLineSize(Ptr)) {
       // Remember that this token needs to be cleaned.
-      if (Tok)
-        Tok->setFlag(Token::NeedsCleaning);
+      if (Tok) Tok->setFlag(Token::NeedsCleaning);
 
       // Warn if there was whitespace between the backslash and newline.
       if (Ptr[0] != '\n' && Ptr[0] != '\r' && Tok && !isLexingRawMode())
@@ -1431,7 +1423,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
 
       // Found backslash<whitespace><newline>.  Parse the char after it.
       Size += EscapedNewLineSize;
-      Ptr += EscapedNewLineSize;
+      Ptr  += EscapedNewLineSize;
 
       // Use slow version to accumulate a correct size field.
       auto CharAndSize = getCharAndSizeSlow(Ptr, Tok);
@@ -1450,13 +1442,11 @@ Lexer::SizedChar Lexer::getCharAndSizeSlow(const char *Ptr, Token *Tok) {
     if (char C = DecodeTrigraphChar(Ptr + 2, Tok ? this : nullptr,
                                     LangOpts.Trigraphs)) {
       // Remember that this token needs to be cleaned.
-      if (Tok)
-        Tok->setFlag(Token::NeedsCleaning);
+      if (Tok) Tok->setFlag(Token::NeedsCleaning);
 
       Ptr += 3;
       Size += 3;
-      if (C == '\\')
-        goto Slash;
+      if (C == '\\') goto Slash;
       return {C, Size};
     }
   }
@@ -1479,7 +1469,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
   if (Ptr[0] == '\\') {
     ++Size;
     ++Ptr;
-  Slash:
+Slash:
     // Common case, backslash-char where the char is not whitespace.
     if (!isWhitespace(Ptr[0]))
       return {'\\', Size};
@@ -1488,7 +1478,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
     if (unsigned EscapedNewLineSize = getEscapedNewLineSize(Ptr)) {
       // Found backslash<whitespace><newline>.  Parse the char after it.
       Size += EscapedNewLineSize;
-      Ptr += EscapedNewLineSize;
+      Ptr  += EscapedNewLineSize;
 
       // Use slow version to accumulate a correct size field.
       auto CharAndSize = getCharAndSizeSlowNoWarn(Ptr, LangOpts);
@@ -1507,8 +1497,7 @@ Lexer::SizedChar Lexer::getCharAndSizeSlowNoWarn(const char *Ptr,
     if (char C = GetTrigraphCharForLetter(Ptr[2])) {
       Ptr += 3;
       Size += 3;
-      if (C == '\\')
-        goto Slash;
+      if (C == '\\') goto Slash;
       return {C, Size};
     }
   }
@@ -1646,7 +1635,10 @@ static void maybeDiagnoseIDCharCompat(DiagnosticsEngine &Diags, uint32_t C,
                                       CharSourceRange Range, bool IsFirst) {
   // Check C99 compatibility.
   if (!Diags.isIgnored(diag::warn_c99_compat_unicode_id, Range.getBegin())) {
-    enum { CannotAppearInIdentifier = 0, CannotStartIdentifier };
+    enum {
+      CannotAppearInIdentifier = 0,
+      CannotStartIdentifier
+    };
 
     static const llvm::sys::UnicodeCharSet C99AllowedIDChars(
         C99AllowedIDCharRanges);
@@ -1654,10 +1646,12 @@ static void maybeDiagnoseIDCharCompat(DiagnosticsEngine &Diags, uint32_t C,
         C99DisallowedInitialIDCharRanges);
     if (!C99AllowedIDChars.contains(C)) {
       Diags.Report(Range.getBegin(), diag::warn_c99_compat_unicode_id)
-          << Range << CannotAppearInIdentifier;
+        << Range
+        << CannotAppearInIdentifier;
     } else if (IsFirst && C99DisallowedInitialIDChars.contains(C)) {
       Diags.Report(Range.getBegin(), diag::warn_c99_compat_unicode_id)
-          << Range << CannotStartIdentifier;
+        << Range
+        << CannotStartIdentifier;
     }
   }
 }
@@ -1675,56 +1669,57 @@ static void maybeDiagnoseUTF8Homoglyph(DiagnosticsEngine &Diags, uint32_t C,
     bool operator<(HomoglyphPair R) const { return Character < R.Character; }
   };
   static constexpr HomoglyphPair SortedHomoglyphs[] = {
-      {U'\u00ad', 0},    // SOFT HYPHEN
-      {U'\u01c3', '!'},  // LATIN LETTER RETROFLEX CLICK
-      {U'\u037e', ';'},  // GREEK QUESTION MARK
-      {U'\u200b', 0},    // ZERO WIDTH SPACE
-      {U'\u200c', 0},    // ZERO WIDTH NON-JOINER
-      {U'\u200d', 0},    // ZERO WIDTH JOINER
-      {U'\u2060', 0},    // WORD JOINER
-      {U'\u2061', 0},    // FUNCTION APPLICATION
-      {U'\u2062', 0},    // INVISIBLE TIMES
-      {U'\u2063', 0},    // INVISIBLE SEPARATOR
-      {U'\u2064', 0},    // INVISIBLE PLUS
-      {U'\u2212', '-'},  // MINUS SIGN
-      {U'\u2215', '/'},  // DIVISION SLASH
-      {U'\u2216', '\\'}, // SET MINUS
-      {U'\u2217', '*'},  // ASTERISK OPERATOR
-      {U'\u2223', '|'},  // DIVIDES
-      {U'\u2227', '^'},  // LOGICAL AND
-      {U'\u2236', ':'},  // RATIO
-      {U'\u223c', '~'},  // TILDE OPERATOR
-      {U'\ua789', ':'},  // MODIFIER LETTER COLON
-      {U'\ufeff', 0},    // ZERO WIDTH NO-BREAK SPACE
-      {U'\uff01', '!'},  // FULLWIDTH EXCLAMATION MARK
-      {U'\uff03', '#'},  // FULLWIDTH NUMBER SIGN
-      {U'\uff04', '$'},  // FULLWIDTH DOLLAR SIGN
-      {U'\uff05', '%'},  // FULLWIDTH PERCENT SIGN
-      {U'\uff06', '&'},  // FULLWIDTH AMPERSAND
-      {U'\uff08', '('},  // FULLWIDTH LEFT PARENTHESIS
-      {U'\uff09', ')'},  // FULLWIDTH RIGHT PARENTHESIS
-      {U'\uff0a', '*'},  // FULLWIDTH ASTERISK
-      {U'\uff0b', '+'},  // FULLWIDTH ASTERISK
-      {U'\uff0c', ','},  // FULLWIDTH COMMA
-      {U'\uff0d', '-'},  // FULLWIDTH HYPHEN-MINUS
-      {U'\uff0e', '.'},  // FULLWIDTH FULL STOP
-      {U'\uff0f', '/'},  // FULLWIDTH SOLIDUS
-      {U'\uff1a', ':'},  // FULLWIDTH COLON
-      {U'\uff1b', ';'},  // FULLWIDTH SEMICOLON
-      {U'\uff1c', '<'},  // FULLWIDTH LESS-THAN SIGN
-      {U'\uff1d', '='},  // FULLWIDTH EQUALS SIGN
-      {U'\uff1e', '>'},  // FULLWIDTH GREATER-THAN SIGN
-      {U'\uff1f', '?'},  // FULLWIDTH QUESTION MARK
-      {U'\uff20', '@'},  // FULLWIDTH COMMERCIAL AT
-      {U'\uff3b', '['},  // FULLWIDTH LEFT SQUARE BRACKET
-      {U'\uff3c', '\\'}, // FULLWIDTH REVERSE SOLIDUS
-      {U'\uff3d', ']'},  // FULLWIDTH RIGHT SQUARE BRACKET
-      {U'\uff3e', '^'},  // FULLWIDTH CIRCUMFLEX ACCENT
-      {U'\uff5b', '{'},  // FULLWIDTH LEFT CURLY BRACKET
-      {U'\uff5c', '|'},  // FULLWIDTH VERTICAL LINE
-      {U'\uff5d', '}'},  // FULLWIDTH RIGHT CURLY BRACKET
-      {U'\uff5e', '~'},  // FULLWIDTH TILDE
-      {0, 0}};
+    {U'\u00ad', 0},   // SOFT HYPHEN
+    {U'\u01c3', '!'}, // LATIN LETTER RETROFLEX CLICK
+    {U'\u037e', ';'}, // GREEK QUESTION MARK
+    {U'\u200b', 0},   // ZERO WIDTH SPACE
+    {U'\u200c', 0},   // ZERO WIDTH NON-JOINER
+    {U'\u200d', 0},   // ZERO WIDTH JOINER
+    {U'\u2060', 0},   // WORD JOINER
+    {U'\u2061', 0},   // FUNCTION APPLICATION
+    {U'\u2062', 0},   // INVISIBLE TIMES
+    {U'\u2063', 0},   // INVISIBLE SEPARATOR
+    {U'\u2064', 0},   // INVISIBLE PLUS
+    {U'\u2212', '-'}, // MINUS SIGN
+    {U'\u2215', '/'}, // DIVISION SLASH
+    {U'\u2216', '\\'}, // SET MINUS
+    {U'\u2217', '*'}, // ASTERISK OPERATOR
+    {U'\u2223', '|'}, // DIVIDES
+    {U'\u2227', '^'}, // LOGICAL AND
+    {U'\u2236', ':'}, // RATIO
+    {U'\u223c', '~'}, // TILDE OPERATOR
+    {U'\ua789', ':'}, // MODIFIER LETTER COLON
+    {U'\ufeff', 0},   // ZERO WIDTH NO-BREAK SPACE
+    {U'\uff01', '!'}, // FULLWIDTH EXCLAMATION MARK
+    {U'\uff03', '#'}, // FULLWIDTH NUMBER SIGN
+    {U'\uff04', '$'}, // FULLWIDTH DOLLAR SIGN
+    {U'\uff05', '%'}, // FULLWIDTH PERCENT SIGN
+    {U'\uff06', '&'}, // FULLWIDTH AMPERSAND
+    {U'\uff08', '('}, // FULLWIDTH LEFT PARENTHESIS
+    {U'\uff09', ')'}, // FULLWIDTH RIGHT PARENTHESIS
+    {U'\uff0a', '*'}, // FULLWIDTH ASTERISK
+    {U'\uff0b', '+'}, // FULLWIDTH ASTERISK
+    {U'\uff0c', ','}, // FULLWIDTH COMMA
+    {U'\uff0d', '-'}, // FULLWIDTH HYPHEN-MINUS
+    {U'\uff0e', '.'}, // FULLWIDTH FULL STOP
+    {U'\uff0f', '/'}, // FULLWIDTH SOLIDUS
+    {U'\uff1a', ':'}, // FULLWIDTH COLON
+    {U'\uff1b', ';'}, // FULLWIDTH SEMICOLON
+    {U'\uff1c', '<'}, // FULLWIDTH LESS-THAN SIGN
+    {U'\uff1d', '='}, // FULLWIDTH EQUALS SIGN
+    {U'\uff1e', '>'}, // FULLWIDTH GREATER-THAN SIGN
+    {U'\uff1f', '?'}, // FULLWIDTH QUESTION MARK
+    {U'\uff20', '@'}, // FULLWIDTH COMMERCIAL AT
+    {U'\uff3b', '['}, // FULLWIDTH LEFT SQUARE BRACKET
+    {U'\uff3c', '\\'}, // FULLWIDTH REVERSE SOLIDUS
+    {U'\uff3d', ']'}, // FULLWIDTH RIGHT SQUARE BRACKET
+    {U'\uff3e', '^'}, // FULLWIDTH CIRCUMFLEX ACCENT
+    {U'\uff5b', '{'}, // FULLWIDTH LEFT CURLY BRACKET
+    {U'\uff5c', '|'}, // FULLWIDTH VERTICAL LINE
+    {U'\uff5d', '}'}, // FULLWIDTH RIGHT CURLY BRACKET
+    {U'\uff5e', '~'}, // FULLWIDTH TILDE
+    {0, 0}
+  };
   auto Homoglyph =
       std::lower_bound(std::begin(SortedHomoglyphs),
                        std::end(SortedHomoglyphs) - 1, HomoglyphPair{C, '\0'});
@@ -1799,7 +1794,7 @@ bool Lexer::tryConsumeIdentifierUCN(const char *&CurPtr, unsigned Size,
   }
 
   Result.setFlag(Token::HasUCN);
-  if ((UCNPtr - CurPtr == 6 && CurPtr[1] == 'u') ||
+  if ((UCNPtr - CurPtr ==  6 && CurPtr[1] == 'u') ||
       (UCNPtr - CurPtr == 10 && CurPtr[1] == 'U'))
     CurPtr = UCNPtr;
   else
@@ -2120,10 +2115,10 @@ const char *Lexer::LexUDSuffix(Token &Result, const char *CurPtr,
 
   if (!LangOpts.CPlusPlus11) {
     if (!isLexingRawMode())
-      Diag(CurPtr, C == '_'
-                       ? diag::warn_cxx11_compat_user_defined_literal
-                       : diag::warn_cxx11_compat_reserved_user_defined_literal)
-          << FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
+      Diag(CurPtr,
+           C == '_' ? diag::warn_cxx11_compat_user_defined_literal
+                    : diag::warn_cxx11_compat_reserved_user_defined_literal)
+        << FixItHint::CreateInsertion(getSourceLocation(CurPtr), " ");
     return CurPtr;
   }
 
@@ -2141,7 +2136,7 @@ const char *Lexer::LexUDSuffix(Token &Result, const char *CurPtr,
       // valid suffix for a string literal or a numeric literal (this could be
       // the 'operator""if' defining a numeric literal operator).
       const unsigned MaxStandardSuffixLength = 3;
-      char Buffer[MaxStandardSuffixLength] = {C};
+      char Buffer[MaxStandardSuffixLength] = { C };
       unsigned Consumed = Size;
       unsigned Chars = 1;
       while (true) {
@@ -2199,7 +2194,8 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
   const char *NulCharacter = nullptr;
 
   if (!isLexingRawMode() &&
-      (Kind == tok::utf8_string_literal || Kind == tok::utf16_string_literal ||
+      (Kind == tok::utf8_string_literal ||
+       Kind == tok::utf16_string_literal ||
        Kind == tok::utf32_string_literal))
     Diag(BufferPtr, LangOpts.CPlusPlus ? diag::warn_cxx98_compat_unicode_literal
                                        : diag::warn_c99_compat_unicode_literal);
@@ -2211,16 +2207,16 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
     if (C == '\\')
       C = getAndAdvanceChar(CurPtr, Result);
 
-    if (C == '\n' || C == '\r' ||              // Newline.
-        (C == 0 && CurPtr - 1 == BufferEnd)) { // End of file.
+    if (C == '\n' || C == '\r' ||             // Newline.
+        (C == 0 && CurPtr-1 == BufferEnd)) {  // End of file.
       if (!isLexingRawMode() && !LangOpts.AsmPreprocessor)
         Diag(BufferPtr, diag::ext_unterminated_char_or_string) << 1;
-      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
+      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
       return true;
     }
 
     if (C == 0) {
-      if (isCodeCompletionPoint(CurPtr - 1)) {
+      if (isCodeCompletionPoint(CurPtr-1)) {
         if (ParsingFilename)
           codeCompleteIncludedFile(AfterQuote, CurPtr - 1, /*IsAngled=*/false);
         else
@@ -2230,7 +2226,7 @@ bool Lexer::LexStringLiteral(Token &Result, const char *CurPtr,
         return true;
       }
 
-      NulCharacter = CurPtr - 1;
+      NulCharacter = CurPtr-1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2286,7 +2282,7 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
         Diag(PrefixEnd, diag::err_invalid_newline_raw_delim);
       } else {
         Diag(PrefixEnd, diag::err_invalid_char_raw_delim)
-            << StringRef(PrefixEnd, 1);
+          << StringRef(PrefixEnd, 1);
       }
     }
 
@@ -2298,7 +2294,7 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
 
       if (C == '"')
         break;
-      if (C == 0 && CurPtr - 1 == BufferEnd) {
+      if (C == 0 && CurPtr-1 == BufferEnd) {
         --CurPtr;
         break;
       }
@@ -2321,11 +2317,11 @@ bool Lexer::LexRawStringLiteral(Token &Result, const char *CurPtr,
         CurPtr += PrefixLen + 1; // skip over prefix and '"'
         break;
       }
-    } else if (C == 0 && CurPtr - 1 == BufferEnd) { // End of file.
+    } else if (C == 0 && CurPtr-1 == BufferEnd) { // End of file.
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::err_unterminated_raw_string)
-            << StringRef(Prefix, PrefixLen);
-      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
+          << StringRef(Prefix, PrefixLen);
+      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
       return true;
     }
   }
@@ -2369,7 +2365,7 @@ bool Lexer::LexAngledStringLiteral(Token &Result, const char *CurPtr) {
         FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
         return true;
       }
-      NulCharacter = CurPtr - 1;
+      NulCharacter = CurPtr-1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2449,23 +2445,23 @@ bool Lexer::LexCharConstant(Token &Result, const char *CurPtr,
     if (C == '\\')
       C = getAndAdvanceChar(CurPtr, Result);
 
-    if (C == '\n' || C == '\r' ||              // Newline.
-        (C == 0 && CurPtr - 1 == BufferEnd)) { // End of file.
+    if (C == '\n' || C == '\r' ||             // Newline.
+        (C == 0 && CurPtr-1 == BufferEnd)) {  // End of file.
       if (!isLexingRawMode() && !LangOpts.AsmPreprocessor)
         Diag(BufferPtr, diag::ext_unterminated_char_or_string) << 0;
-      FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
+      FormTokenWithChars(Result, CurPtr-1, tok::unknown);
       return true;
     }
 
     if (C == 0) {
-      if (isCodeCompletionPoint(CurPtr - 1)) {
+      if (isCodeCompletionPoint(CurPtr-1)) {
         PP->CodeCompleteNaturalLanguage();
-        FormTokenWithChars(Result, CurPtr - 1, tok::unknown);
+        FormTokenWithChars(Result, CurPtr-1, tok::unknown);
         cutOffLexing();
         return true;
       }
 
-      NulCharacter = CurPtr - 1;
+      NulCharacter = CurPtr-1;
     }
     C = getAndAdvanceChar(CurPtr, Result);
   }
@@ -2619,7 +2615,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
     const char *NextLine = CurPtr;
     if (C != 0) {
       // We found a newline, see if it's escaped.
-      const char *EscapePtr = CurPtr - 1;
+      const char *EscapePtr = CurPtr-1;
       bool HasSpace = false;
       while (isHorizontalWhitespace(*EscapePtr)) { // Skip whitespace.
         --EscapePtr;
@@ -2632,7 +2628,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
       else if (EscapePtr[0] == '/' && EscapePtr[-1] == '?' &&
                EscapePtr[-2] == '?' && LangOpts.Trigraphs)
         // Trigraph-escaped newline.
-        CurPtr = EscapePtr - 2;
+        CurPtr = EscapePtr-2;
       else
         break; // This is a newline, we're done.
 
@@ -2653,7 +2649,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
 
     // If we only read only one character, then no special handling is needed.
     // We're done and can skip forward to the newline.
-    if (C != 0 && CurPtr == OldPtr + 1) {
+    if (C != 0 && CurPtr == OldPtr+1) {
       CurPtr = NextLine;
       break;
     }
@@ -2669,14 +2665,14 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
           // line is also a // comment, but has spaces, don't emit a diagnostic.
           if (isWhitespace(C)) {
             const char *ForwardPtr = CurPtr;
-            while (isWhitespace(*ForwardPtr)) // Skip whitespace.
+            while (isWhitespace(*ForwardPtr))  // Skip whitespace.
               ++ForwardPtr;
             if (ForwardPtr[0] == '/' && ForwardPtr[1] == '/')
               break;
           }
 
           if (!isLexingRawMode())
-            Diag(OldPtr - 1, diag::ext_multi_line_line_comment);
+            Diag(OldPtr-1, diag::ext_multi_line_line_comment);
           break;
         }
     }
@@ -2686,7 +2682,7 @@ bool Lexer::SkipLineComment(Token &Result, const char *CurPtr,
       break;
     }
 
-    if (C == '\0' && isCodeCompletionPoint(CurPtr - 1)) {
+    if (C == '\0' && isCodeCompletionPoint(CurPtr-1)) {
       PP->CodeCompleteNaturalLanguage();
       cutOffLexing();
       return false;
@@ -2747,12 +2743,12 @@ bool Lexer::SaveLineComment(Token &Result, const char *CurPtr) {
     return true;
 
   assert(Spelling[0] == '/' && Spelling[1] == '/' && "Not line comment?");
-  Spelling[1] = '*'; // Change prefix to "/*".
-  Spelling += "*/";  // add suffix.
+  Spelling[1] = '*';   // Change prefix to "/*".
+  Spelling += "*/";    // add suffix.
 
   Result.setKind(tok::comment);
-  PP->CreateString(Spelling, Result, Result.getLocation(),
-                   Result.getLocation());
+  PP->CreateString(Spelling, Result,
+                   Result.getLocation(), Result.getLocation());
   return true;
 }
 
@@ -2860,7 +2856,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
   unsigned CharSize;
   unsigned char C = getCharAndSize(CurPtr, CharSize);
   CurPtr += CharSize;
-  if (C == 0 && CurPtr == BufferEnd + 1) {
+  if (C == 0 && CurPtr == BufferEnd+1) {
     if (!isLexingRawMode())
       Diag(BufferPtr, diag::err_unterminated_block_comment);
     --CurPtr;
@@ -2900,8 +2896,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
           goto MultiByteUTF8;
         C = *CurPtr++;
       }
-      if (C == '/')
-        goto FoundSlash;
+      if (C == '/') goto FoundSlash;
 
 #ifdef __SSE2__
       __m128i Slashes = _mm_set1_epi8('/');
@@ -2911,8 +2906,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
           goto MultiByteUTF8;
         }
         // look for slashes
-        int cmp = _mm_movemask_epi8(
-            _mm_cmpeq_epi8(*(const __m128i *)CurPtr, Slashes));
+        int cmp = _mm_movemask_epi8(_mm_cmpeq_epi8(*(const __m128i*)CurPtr,
+                                    Slashes));
         if (cmp != 0) {
           // Adjust the pointer to point directly after the first slash. It's
           // not necessary to set C here, it will be overwritten at the end of
@@ -2926,8 +2921,10 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
       __vector unsigned char LongUTF = {0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
                                         0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
                                         0x80, 0x80, 0x80, 0x80};
-      __vector unsigned char Slashes = {'/', '/', '/', '/', '/', '/', '/', '/',
-                                        '/', '/', '/', '/', '/', '/', '/', '/'};
+      __vector unsigned char Slashes = {
+        '/', '/', '/', '/',  '/', '/', '/', '/',
+        '/', '/', '/', '/',  '/', '/', '/', '/'
+      };
       while (CurPtr + 16 < BufferEnd) {
         if (LLVM_UNLIKELY(
                 vec_any_ge(*(const __vector unsigned char *)CurPtr, LongUTF)))
@@ -2986,8 +2983,8 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
     }
 
     if (C == '/') {
-    FoundSlash:
-      if (CurPtr[-2] == '*') // We found the final */.  We're done!
+  FoundSlash:
+      if (CurPtr[-2] == '*')  // We found the final */.  We're done!
         break;
 
       if ((CurPtr[-2] == '\n' || CurPtr[-2] == '\r')) {
@@ -3003,9 +3000,9 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
         // if this is a /*/, which will end the comment.  This misses cases with
         // embedded escaped newlines, but oh well.
         if (!isLexingRawMode())
-          Diag(CurPtr - 1, diag::warn_nested_block_comment);
+          Diag(CurPtr-1, diag::warn_nested_block_comment);
       }
-    } else if (C == 0 && CurPtr == BufferEnd + 1) {
+    } else if (C == 0 && CurPtr == BufferEnd+1) {
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::err_unterminated_block_comment);
       // Note: the user probably forgot a */.  We could continue immediately
@@ -3022,7 +3019,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
 
       BufferPtr = CurPtr;
       return false;
-    } else if (C == '\0' && isCodeCompletionPoint(CurPtr - 1)) {
+    } else if (C == '\0' && isCodeCompletionPoint(CurPtr-1)) {
       PP->CodeCompleteNaturalLanguage();
       cutOffLexing();
       return false;
@@ -3050,7 +3047,7 @@ bool Lexer::SkipBlockComment(Token &Result, const char *CurPtr,
   // efficiently now.  This is safe even in KeepWhitespaceMode because we would
   // have already returned above with the comment as a token.
   if (isHorizontalWhitespace(*CurPtr)) {
-    SkipWhitespace(Result, CurPtr + 1, TokAtPhysicalStartOfLine);
+    SkipWhitespace(Result, CurPtr+1, TokAtPhysicalStartOfLine);
     return false;
   }
 
@@ -3081,10 +3078,10 @@ void Lexer::ReadToEndOfLine(SmallVectorImpl<char> *Result) {
       if (Result)
         Result->push_back(Char);
       break;
-    case 0: // Null.
+    case 0:  // Null.
       // Found end of file?
-      if (CurPtr - 1 != BufferEnd) {
-        if (isCodeCompletionPoint(CurPtr - 1)) {
+      if (CurPtr-1 != BufferEnd) {
+        if (isCodeCompletionPoint(CurPtr-1)) {
           PP->CodeCompleteNaturalLanguage();
           cutOffLexing();
           return;
@@ -3101,7 +3098,7 @@ void Lexer::ReadToEndOfLine(SmallVectorImpl<char> *Result) {
     case '\n':
       // Okay, we found the end of the line. First, back up past the \0, \r, \n.
       assert(CurPtr[-1] == Char && "Trigraphs for newline?");
-      BufferPtr = CurPtr - 1;
+      BufferPtr = CurPtr-1;
 
       // Next, lex the character, which should handle the EOD transition.
       Lex(Tmp);
@@ -3135,7 +3132,7 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
     // Restore comment saving mode, in case it was disabled for directive.
     if (PP)
       resetExtendedTokenMode();
-    return true; // Have a token.
+    return true;  // Have a token.
   }
 
   // If we are in raw mode, return this event as an EOF token.  Let the caller
@@ -3187,7 +3184,8 @@ bool Lexer::LexEndOfFile(Token &Result, const char *CurPtr) {
       DiagID = diag::ext_no_newline_eof;
     }
 
-    Diag(BufferEnd, DiagID) << FixItHint::CreateInsertion(EndLoc, "\n");
+    Diag(BufferEnd, DiagID)
+      << FixItHint::CreateInsertion(EndLoc, "\n");
   }
 
   BufferPtr = CurPtr;
@@ -3251,11 +3249,11 @@ static const char *FindConflictEnd(const char *CurPtr, const char *BufferEnd,
     // Must occur at start of line.
     if (Pos == 0 ||
         (RestOfBuffer[Pos - 1] != '\r' && RestOfBuffer[Pos - 1] != '\n')) {
-      RestOfBuffer = RestOfBuffer.substr(Pos + TermLen);
+      RestOfBuffer = RestOfBuffer.substr(Pos+TermLen);
       Pos = RestOfBuffer.find(Terminator);
       continue;
     }
-    return RestOfBuffer.data() + Pos;
+    return RestOfBuffer.data()+Pos;
   }
   return nullptr;
 }
@@ -3266,7 +3264,8 @@ static const char *FindConflictEnd(const char *CurPtr, const char *BufferEnd,
 /// if not.
 bool Lexer::IsStartOfConflictMarker(const char *CurPtr) {
   // Only a conflict marker if it starts at the beginning of a line.
-  if (CurPtr != BufferStart && CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
+  if (CurPtr != BufferStart &&
+      CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
     return false;
 
   // Check to see if we have <<<<<<< or >>>>.
@@ -3309,7 +3308,8 @@ bool Lexer::IsStartOfConflictMarker(const char *CurPtr) {
 /// the line.  This returns true if it is a conflict marker and false if not.
 bool Lexer::HandleEndOfConflictMarker(const char *CurPtr) {
   // Only a conflict marker if it starts at the beginning of a line.
-  if (CurPtr != BufferStart && CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
+  if (CurPtr != BufferStart &&
+      CurPtr[-1] != '\n' && CurPtr[-1] != '\r')
     return false;
 
   // If we have a situation where we don't care about conflict markers, ignore
@@ -3325,8 +3325,8 @@ bool Lexer::HandleEndOfConflictMarker(const char *CurPtr) {
   // If we do have it, search for the end of the conflict marker.  This could
   // fail if it got skipped with a '#if 0' or something.  Note that CurPtr might
   // be the end of conflict marker.
-  if (const char *End =
-          FindConflictEnd(CurPtr, BufferEnd, CurrentConflictMarkerState)) {
+  if (const char *End = FindConflictEnd(CurPtr, BufferEnd,
+                                        CurrentConflictMarkerState)) {
     CurPtr = End;
 
     // Skip ahead to the end of line.
@@ -3376,7 +3376,7 @@ bool Lexer::lexEditorPlaceholder(Token &Result, const char *CurPtr) {
 
 bool Lexer::isCodeCompletionPoint(const char *CurPtr) const {
   if (PP && PP->isCodeCompletionEnabled()) {
-    SourceLocation Loc = FileLoc.getLocWithOffset(CurPtr - BufferStart);
+    SourceLocation Loc = FileLoc.getLocWithOffset(CurPtr-BufferStart);
     return Loc == PP->getCodeCompletionLoc();
   }
 
@@ -3661,7 +3661,7 @@ bool Lexer::CheckUnicodeWhitespace(Token &Result, uint32_t C,
   if (!isLexingRawMode() && !PP->isPreprocessedOutput() &&
       isUnicodeWhitespace(C)) {
     Diag(BufferPtr, diag::ext_unicode_whitespace)
-        << makeCharRange(*this, BufferPtr, CurPtr);
+      << makeCharRange(*this, BufferPtr, CurPtr);
 
     Result.setFlag(Token::LeadingSpace);
     return true;
@@ -3701,7 +3701,7 @@ bool Lexer::Lex(Token &Result) {
   bool atPhysicalStartOfLine = IsAtPhysicalStartOfLine;
   IsAtPhysicalStartOfLine = false;
   bool isRawLex = isLexingRawMode();
-  (void)isRawLex;
+  (void) isRawLex;
   bool returnedToken = LexTokenInternal(Result, atPhysicalStartOfLine);
   // (After the LexTokenInternal call, the lexer might be destroyed.)
   assert((returnedToken || !isRawLex) && "Raw lex must succeed");
@@ -3740,7 +3740,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     Result.setFlag(Token::LeadingSpace);
   }
 
-  unsigned SizeTmp, SizeTmp2; // Temporaries for use in cases below.
+  unsigned SizeTmp, SizeTmp2;   // Temporaries for use in cases below.
 
   // Read a character, advancing over it.
   char Char = getAndAdvanceChar(CurPtr, Result);
@@ -3750,13 +3750,13 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     NewLinePtr = nullptr;
 
   switch (Char) {
-  case 0: // Null.
+  case 0:  // Null.
     // Found end of file?
-    if (CurPtr - 1 == BufferEnd)
-      return LexEndOfFile(Result, CurPtr - 1);
+    if (CurPtr-1 == BufferEnd)
+      return LexEndOfFile(Result, CurPtr-1);
 
     // Check if we are performing code completion.
-    if (isCodeCompletionPoint(CurPtr - 1)) {
+    if (isCodeCompletionPoint(CurPtr-1)) {
       // Return the code-completion token.
       Result.startToken();
       FormTokenWithChars(Result, CurPtr, tok::code_completion);
@@ -3764,7 +3764,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     }
 
     if (!isLexingRawMode())
-      Diag(CurPtr - 1, diag::null_in_file);
+      Diag(CurPtr-1, diag::null_in_file);
     Result.setFlag(Token::LeadingSpace);
     if (SkipWhitespace(Result, CurPtr, TokAtPhysicalStartOfLine))
       return true; // KeepWhitespaceMode
@@ -3773,12 +3773,12 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // (We manually eliminate the tail call to avoid recursion.)
     goto LexNextToken;
 
-  case 26: // DOS & CP/M EOF: "^Z".
+  case 26:  // DOS & CP/M EOF: "^Z".
     // If we're in Microsoft extensions mode, treat this as end of file.
     if (LangOpts.MicrosoftExt) {
       if (!isLexingRawMode())
-        Diag(CurPtr - 1, diag::ext_ctrl_z_eof_microsoft);
-      return LexEndOfFile(Result, CurPtr - 1);
+        Diag(CurPtr-1, diag::ext_ctrl_z_eof_microsoft);
+      return LexEndOfFile(Result, CurPtr-1);
     }
 
     // If Microsoft extensions are disabled, this is just random garbage.
@@ -3834,11 +3834,11 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // too (without going through the big switch stmt).
     if (CurPtr[0] == '/' && CurPtr[1] == '/' && !inKeepCommentMode() &&
         LineComment && (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP)) {
-      if (SkipLineComment(Result, CurPtr + 2, TokAtPhysicalStartOfLine))
+      if (SkipLineComment(Result, CurPtr+2, TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
       goto SkipIgnoredUnits;
     } else if (CurPtr[0] == '/' && CurPtr[1] == '*' && !inKeepCommentMode()) {
-      if (SkipBlockComment(Result, CurPtr + 2, TokAtPhysicalStartOfLine))
+      if (SkipBlockComment(Result, CurPtr+2, TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
       goto SkipIgnoredUnits;
     } else if (isHorizontalWhitespace(*CurPtr)) {
@@ -3850,16 +3850,8 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
 
   // C99 6.4.4.1: Integer Constants.
   // C99 6.4.4.2: Floating Constants.
-  case '0':
-  case '1':
-  case '2':
-  case '3':
-  case '4':
-  case '5':
-  case '6':
-  case '7':
-  case '8':
-  case '9':
+  case '0': case '1': case '2': case '3': case '4':
+  case '5': case '6': case '7': case '8': case '9':
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     return LexNumericConstant(Result, CurPtr);
@@ -3887,26 +3879,24 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       // UTF-16 raw string literal
       if (Char == 'R' && LangOpts.RawStringLiterals &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-        return LexRawStringLiteral(
-            Result,
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
-            tok::utf16_string_literal);
+        return LexRawStringLiteral(Result,
+                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                           SizeTmp2, Result),
+                               tok::utf16_string_literal);
 
       if (Char == '8') {
         char Char2 = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
 
         // UTF-8 string literal
         if (Char2 == '"')
-          return LexStringLiteral(
-              Result,
-              ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
-                          Result),
-              tok::utf8_string_literal);
+          return LexStringLiteral(Result,
+                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                           SizeTmp2, Result),
+                               tok::utf8_string_literal);
         if (Char2 == '\'' && (LangOpts.CPlusPlus17 || LangOpts.C23))
           return LexCharConstant(
-              Result,
-              ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
-                          Result),
+              Result, ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                  SizeTmp2, Result),
               tok::utf8_char_constant);
 
         if (Char2 == 'R' && LangOpts.RawStringLiterals) {
@@ -3914,12 +3904,11 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
           char Char3 = getCharAndSize(CurPtr + SizeTmp + SizeTmp2, SizeTmp3);
           // UTF-8 raw string literal
           if (Char3 == '"') {
-            return LexRawStringLiteral(
-                Result,
-                ConsumeChar(ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
-                                        SizeTmp2, Result),
-                            SizeTmp3, Result),
-                tok::utf8_string_literal);
+            return LexRawStringLiteral(Result,
+                   ConsumeChar(ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                           SizeTmp2, Result),
+                               SizeTmp3, Result),
+                   tok::utf8_string_literal);
           }
         }
       }
@@ -3948,10 +3937,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       // UTF-32 raw string literal
       if (Char == 'R' && LangOpts.RawStringLiterals &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-        return LexRawStringLiteral(
-            Result,
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
-            tok::utf32_string_literal);
+        return LexRawStringLiteral(Result,
+                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                           SizeTmp2, Result),
+                               tok::utf32_string_literal);
     }
 
     // treat U like the start of an identifier.
@@ -3965,14 +3954,15 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Char = getCharAndSize(CurPtr, SizeTmp);
 
       if (Char == '"')
-        return LexRawStringLiteral(Result, ConsumeChar(CurPtr, SizeTmp, Result),
+        return LexRawStringLiteral(Result,
+                                   ConsumeChar(CurPtr, SizeTmp, Result),
                                    tok::string_literal);
     }
 
     // treat R like the start of an identifier.
     return LexIdentifierContinue(Result, CurPtr);
 
-  case 'L': // Identifier (Loony) or wide literal (L'x' or L"xyz").
+  case 'L':   // Identifier (Loony) or wide literal (L'x' or L"xyz").
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     Char = getCharAndSize(CurPtr, SizeTmp);
@@ -3985,10 +3975,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // Wide raw string literal.
     if (LangOpts.RawStringLiterals && Char == 'R' &&
         getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '"')
-      return LexRawStringLiteral(
-          Result,
-          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result),
-          tok::wide_string_literal);
+      return LexRawStringLiteral(Result,
+                               ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                                           SizeTmp2, Result),
+                               tok::wide_string_literal);
 
     // Wide character constant.
     if (Char == '\'')
@@ -3998,63 +3988,23 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     [[fallthrough]];
 
   // C99 6.4.2: Identifiers.
-  case 'A':
-  case 'B':
-  case 'C':
-  case 'D':
-  case 'E':
-  case 'F':
-  case 'G':
-  case 'H':
-  case 'I':
-  case 'J':
-  case 'K': /*'L'*/
-  case 'M':
-  case 'N':
-  case 'O':
-  case 'P':
-  case 'Q': /*'R'*/
-  case 'S':
-  case 'T': /*'U'*/
-  case 'V':
-  case 'W':
-  case 'X':
-  case 'Y':
-  case 'Z':
-  case 'a':
-  case 'b':
-  case 'c':
-  case 'd':
-  case 'e':
-  case 'f':
-  case 'g':
-  case 'h':
-  case 'i':
-  case 'j':
-  case 'k':
-  case 'l':
-  case 'm':
-  case 'n':
-  case 'o':
-  case 'p':
-  case 'q':
-  case 'r':
-  case 's':
-  case 't': /*'u'*/
-  case 'v':
-  case 'w':
-  case 'x':
-  case 'y':
-  case 'z':
+  case 'A': case 'B': case 'C': case 'D': case 'E': case 'F': case 'G':
+  case 'H': case 'I': case 'J': case 'K':    /*'L'*/case 'M': case 'N':
+  case 'O': case 'P': case 'Q':    /*'R'*/case 'S': case 'T':    /*'U'*/
+  case 'V': case 'W': case 'X': case 'Y': case 'Z':
+  case 'a': case 'b': case 'c': case 'd': case 'e': case 'f': case 'g':
+  case 'h': case 'i': case 'j': case 'k': case 'l': case 'm': case 'n':
+  case 'o': case 'p': case 'q': case 'r': case 's': case 't':    /*'u'*/
+  case 'v': case 'w': case 'x': case 'y': case 'z':
   case '_':
     // Notify MIOpt that we read a non-whitespace/non-comment token.
     MIOpt.ReadToken();
     return LexIdentifierContinue(Result, CurPtr);
 
-  case '$': // $ in identifiers.
+  case '$':   // $ in identifiers.
     if (LangOpts.DollarIdents) {
       if (!isLexingRawMode())
-        Diag(CurPtr - 1, diag::ext_dollar_in_identifier);
+        Diag(CurPtr-1, diag::ext_dollar_in_identifier);
       // Notify MIOpt that we read a non-whitespace/non-comment token.
       MIOpt.ReadToken();
       return LexIdentifierContinue(Result, CurPtr);
@@ -4110,10 +4060,10 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Kind = tok::periodstar;
       CurPtr += SizeTmp;
     } else if (Char == '.' &&
-               getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '.') {
+               getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == '.') {
       Kind = tok::ellipsis;
-      CurPtr =
-          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+      CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                           SizeTmp2, Result);
     } else {
       Kind = tok::period;
     }
@@ -4152,18 +4102,18 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     break;
   case '-':
     Char = getCharAndSize(CurPtr, SizeTmp);
-    if (Char == '-') { // --
+    if (Char == '-') {      // --
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::minusminus;
     } else if (Char == '>' && LangOpts.CPlusPlus &&
-               getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == '*') { // C++ ->*
-      CurPtr =
-          ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+               getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == '*') {  // C++ ->*
+      CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                           SizeTmp2, Result);
       Kind = tok::arrowstar;
-    } else if (Char == '>') { // ->
+    } else if (Char == '>') {   // ->
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::arrow;
-    } else if (Char == '=') { // -=
+    } else if (Char == '=') {   // -=
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::minusequal;
     } else {
@@ -4184,7 +4134,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
   case '/':
     // 6.4.9: Comments
     Char = getCharAndSize(CurPtr, SizeTmp);
-    if (Char == '/') { // Line comment.
+    if (Char == '/') {         // Line comment.
       // Even if Line comments are disabled (e.g. in C89 mode), we generally
       // want to lex this as a comment.  There is one problem with this though,
       // that in one particular corner case, this can change the behavior of the
@@ -4197,7 +4147,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
           LineComment && (LangOpts.CPlusPlus || !LangOpts.TraditionalCPP);
       if (!TreatAsComment)
         if (!(PP && PP->isPreprocessedOutput()))
-          TreatAsComment = getCharAndSize(CurPtr + SizeTmp, SizeTmp2) != '*';
+          TreatAsComment = getCharAndSize(CurPtr+SizeTmp, SizeTmp2) != '*';
 
       if (TreatAsComment) {
         if (SkipLineComment(Result, ConsumeChar(CurPtr, SizeTmp, Result),
@@ -4211,7 +4161,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       }
     }
 
-    if (Char == '*') { // /**/ comment.
+    if (Char == '*') {  // /**/ comment.
       if (SkipBlockComment(Result, ConsumeChar(CurPtr, SizeTmp, Result),
                            TokAtPhysicalStartOfLine))
         return true; // There is a token to return.
@@ -4234,21 +4184,21 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       Kind = tok::percentequal;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (LangOpts.Digraphs && Char == '>') {
-      Kind = tok::r_brace; // '%>' -> '}'
+      Kind = tok::r_brace;                             // '%>' -> '}'
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (LangOpts.Digraphs && Char == ':') {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Char = getCharAndSize(CurPtr, SizeTmp);
-      if (Char == '%' && getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == ':') {
-        Kind = tok::hashhash; // '%:%:' -> '##'
-        CurPtr =
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
-      } else if (Char == '@' && LangOpts.MicrosoftExt) { // %:@ -> #@ -> Charize
+      if (Char == '%' && getCharAndSize(CurPtr+SizeTmp, SizeTmp2) == ':') {
+        Kind = tok::hashhash;                          // '%:%:' -> '##'
+        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                             SizeTmp2, Result);
+      } else if (Char == '@' && LangOpts.MicrosoftExt) {// %:@ -> #@ -> Charize
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         if (!isLexingRawMode())
           Diag(BufferPtr, diag::ext_charize_microsoft);
         Kind = tok::hashat;
-      } else { // '%:' -> '#'
+      } else {                                         // '%:' -> '#'
         // We parsed a # character.  If this occurs at the start of the line,
         // it's actually the start of a preprocessing directive.  Callback to
         // the preprocessor to handle it.
@@ -4267,35 +4217,35 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     if (ParsingFilename) {
       return LexAngledStringLiteral(Result, CurPtr);
     } else if (Char == '<') {
-      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
       if (After == '=') {
         Kind = tok::lesslessequal;
-        CurPtr =
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
-      } else if (After == '<' && IsStartOfConflictMarker(CurPtr - 1)) {
+        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                             SizeTmp2, Result);
+      } else if (After == '<' && IsStartOfConflictMarker(CurPtr-1)) {
         // If this is actually a '<<<<<<<' version control conflict marker,
         // recognize it as such and recover nicely.
         goto LexNextToken;
-      } else if (After == '<' && HandleEndOfConflictMarker(CurPtr - 1)) {
+      } else if (After == '<' && HandleEndOfConflictMarker(CurPtr-1)) {
         // If this is '<<<<' and we're in a Perforce-style conflict marker,
         // ignore it.
         goto LexNextToken;
       } else if (LangOpts.CUDA && After == '<') {
         Kind = tok::lesslessless;
-        CurPtr =
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                             SizeTmp2, Result);
       } else {
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         Kind = tok::lessless;
       }
     } else if (Char == '=') {
-      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
       if (After == '>') {
         if (LangOpts.CPlusPlus20) {
           if (!isLexingRawMode())
             Diag(BufferPtr, diag::warn_cxx17_compat_spaceship);
-          CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2,
-                               Result);
+          CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                               SizeTmp2, Result);
           Kind = tok::spaceship;
           break;
         }
@@ -4303,13 +4253,13 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
         // change in semantics if this turns up in C++ <=17 mode.
         if (LangOpts.CPlusPlus && !isLexingRawMode()) {
           Diag(BufferPtr, diag::warn_cxx20_compat_spaceship)
-              << FixItHint::CreateInsertion(
-                     getSourceLocation(CurPtr + SizeTmp, SizeTmp2), " ");
+            << FixItHint::CreateInsertion(
+                   getSourceLocation(CurPtr + SizeTmp, SizeTmp2), " ");
         }
       }
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::lessequal;
-    } else if (LangOpts.Digraphs && Char == ':') { // '<:' -> '['
+    } else if (LangOpts.Digraphs && Char == ':') {     // '<:' -> '['
       if (LangOpts.CPlusPlus11 &&
           getCharAndSize(CurPtr + SizeTmp, SizeTmp2) == ':') {
         // C++0x [lex.pptoken]p3:
@@ -4329,7 +4279,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
 
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::l_square;
-    } else if (LangOpts.Digraphs && Char == '%') { // '<%' -> '{'
+    } else if (LangOpts.Digraphs && Char == '%') {     // '<%' -> '{'
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::l_brace;
     } else if (Char == '#' && /*Not a trigraph*/ SizeTmp == 1 &&
@@ -4345,22 +4295,22 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
       Kind = tok::greaterequal;
     } else if (Char == '>') {
-      char After = getCharAndSize(CurPtr + SizeTmp, SizeTmp2);
+      char After = getCharAndSize(CurPtr+SizeTmp, SizeTmp2);
       if (After == '=') {
-        CurPtr =
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                             SizeTmp2, Result);
         Kind = tok::greatergreaterequal;
-      } else if (After == '>' && IsStartOfConflictMarker(CurPtr - 1)) {
+      } else if (After == '>' && IsStartOfConflictMarker(CurPtr-1)) {
         // If this is actually a '>>>>' conflict marker, recognize it as such
         // and recover nicely.
         goto LexNextToken;
-      } else if (After == '>' && HandleEndOfConflictMarker(CurPtr - 1)) {
+      } else if (After == '>' && HandleEndOfConflictMarker(CurPtr-1)) {
         // If this is '>>>>>>>' and we're in a conflict marker, ignore it.
         goto LexNextToken;
       } else if (LangOpts.CUDA && After == '>') {
         Kind = tok::greatergreatergreater;
-        CurPtr =
-            ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result), SizeTmp2, Result);
+        CurPtr = ConsumeChar(ConsumeChar(CurPtr, SizeTmp, Result),
+                             SizeTmp2, Result);
       } else {
         CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
         Kind = tok::greatergreater;
@@ -4387,7 +4337,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
     } else if (Char == '|') {
       // If this is '|||||||' and we're in a conflict marker, ignore it.
-      if (CurPtr[1] == '|' && HandleEndOfConflictMarker(CurPtr - 1))
+      if (CurPtr[1] == '|' && HandleEndOfConflictMarker(CurPtr-1))
         goto LexNextToken;
       Kind = tok::pipepipe;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
@@ -4414,7 +4364,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     Char = getCharAndSize(CurPtr, SizeTmp);
     if (Char == '=') {
       // If this is '====' and we're in a conflict marker, ignore it.
-      if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr - 1))
+      if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr-1))
         goto LexNextToken;
 
       Kind = tok::equalequal;
@@ -4431,7 +4381,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     if (Char == '#') {
       Kind = tok::hashhash;
       CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
-    } else if (Char == '@' && LangOpts.MicrosoftExt) { // #@ -> Charize
+    } else if (Char == '@' && LangOpts.MicrosoftExt) {  // #@ -> Charize
       Kind = tok::hashat;
       if (!isLexingRawMode())
         Diag(BufferPtr, diag::ext_charize_microsoft);
@@ -4487,9 +4437,11 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // We can't just reset CurPtr to BufferPtr because BufferPtr may point to
     // an escaped newline.
     --CurPtr;
-    llvm::ConversionResult Status = llvm::convertUTF8Sequence(
-        (const llvm::UTF8 **)&CurPtr, (const llvm::UTF8 *)BufferEnd, &CodePoint,
-        llvm::strictConversion);
+    llvm::ConversionResult Status =
+        llvm::convertUTF8Sequence((const llvm::UTF8 **)&CurPtr,
+                                  (const llvm::UTF8 *)BufferEnd,
+                                  &CodePoint,
+                                  llvm::strictConversion);
     if (Status == llvm::conversionOK) {
       if (CheckUnicodeWhitespace(Result, CodePoint, CurPtr)) {
         if (SkipWhitespace(Result, CurPtr, TokAtPhysicalStartOfLine))
@@ -4514,7 +4466,7 @@ bool Lexer::LexTokenInternal(Token &Result, bool TokAtPhysicalStartOfLine) {
     // just diagnose the invalid UTF-8, then drop the character.
     Diag(CurPtr, diag::err_invalid_utf8);
 
-    BufferPtr = CurPtr + 1;
+    BufferPtr = CurPtr+1;
     // We're pretending the character didn't exist, so just try again with
     // this lexer.
     // (We manually eliminate the tail call to avoid recursion.)

>From da1df1c17448763b4b3a83f94555e7a65ab68304 Mon Sep 17 00:00:00 2001
From: Samira Bazuzi <bazuzi at google.com>
Date: Wed, 27 Nov 2024 10:07:31 -0500
Subject: [PATCH 4/4] Add test for getRawToken.

---
 clang/unittests/Lex/LexerTest.cpp | 32 +++++++++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/clang/unittests/Lex/LexerTest.cpp b/clang/unittests/Lex/LexerTest.cpp
index 47aa2c131a304d..aead7fb899d0a8 100644
--- a/clang/unittests/Lex/LexerTest.cpp
+++ b/clang/unittests/Lex/LexerTest.cpp
@@ -652,6 +652,38 @@ TEST_F(LexerTest, RawAndNormalLexSameForLineComments) {
   EXPECT_TRUE(ToksView.empty());
 }
 
+TEST_F(LexerTest, GetRawTokenOnEscapedNewLineChecksWhitespace) {
+  const llvm::StringLiteral Source = R"cc(
+  #define ONE \
+  1
+
+  int i = ONE;
+  )cc";
+  std::vector<Token> Toks =
+      CheckLex(Source, {tok::kw_int, tok::identifier, tok::equal,
+                        tok::numeric_constant, tok::semi});
+
+  // Set up by getting the raw token for the `1` in the macro definition.
+  const Token &OneExpanded = Toks[3];
+  Token Tok;
+  ASSERT_FALSE(
+      Lexer::getRawToken(OneExpanded.getLocation(), Tok, SourceMgr, LangOpts));
+  // The `ONE`.
+  ASSERT_EQ(Tok.getKind(), tok::raw_identifier);
+  ASSERT_FALSE(
+      Lexer::getRawToken(SourceMgr.getSpellingLoc(OneExpanded.getLocation()),
+                         Tok, SourceMgr, LangOpts));
+  // The `1` in the macro definition.
+  ASSERT_EQ(Tok.getKind(), tok::numeric_constant);
+
+  // Go back 4 characters: two spaces, one newline, and the backslash.
+  SourceLocation EscapedNewLineLoc = Tok.getLocation().getLocWithOffset(-4);
+  // Expect true (=failure) because, with IgnoreWhiteSpace=false, the escaped
+  // newline at this location is treated as whitespace.
+  EXPECT_TRUE(Lexer::getRawToken(EscapedNewLineLoc, Tok, SourceMgr, LangOpts,
+                                 /*IgnoreWhiteSpace=*/false));
+}
+
 TEST(LexerPreambleTest, PreambleBounds) {
   std::vector<std::string> Cases = {
       R"cc([[


