[PATCH] D30748: [Lexer] Finding beginning of token with escaped new line
    Alexander Kornienko via Phabricator via cfe-commits 
    cfe-commits at lists.llvm.org
       
    Thu Mar 16 06:00:43 PDT 2017
    
    
  
alexfh requested changes to this revision.
alexfh added inline comments.
This revision now requires changes to proceed.
================
Comment at: lib/Lex/Lexer.cpp:457
+static bool isNewLineEscaped(const char *BufferStart, const char *Str) {
+  while (Str > BufferStart && isWhitespace(*Str))
+    --Str;
----------------
idlecode wrote:
> alexfh wrote:
> > We only care about two specific sequences here: `\\\r\n` or `\\\n`, not a backslash followed by arbitrary whitespace.
> I just saw that some functions (e.g. line 1285 in this file) accept whitespace between the escape character and the new line. How about now?
Indeed, both clang and gcc accept whitespace between the backslash and the newline character and issue a diagnostic: https://godbolt.org/g/PUCTzF.
This should probably be done similarly to Lexer::getEscapedNewLineSize, but in reverse:
  assert(isVerticalWhitespace(*P));
  --P;
  // Skip the second character of `\r\n` or `\n\r`.
  if (P >= BufferStart && isVerticalWhitespace(*P) && *P != P[1])
    --P;
  // Clang allows horizontal whitespace between backslash and new-line with a
  // warning. Skip it.
  while (P >= BufferStart && isHorizontalWhitespace(*P))
    --P;
  return P >= BufferStart && *P == '\\';
I'd add a bunch of tests for this function specifically:
  <backslash><\r> -> true
  <backslash><\n> -> true
  <backslash><\r><\n> -> true
  <backslash><\n><\r> -> true
  <backslash><space><tab><\v><\f><\r> -> true
  <backslash><space><tab><\v><\f><\r><\n> -> true
  <backslash><\r><\r> -> false
  <backslash><\r><\r><\n> -> false
  <backslash><\n><\n> -> false
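The cases above can be exercised outside of clang with a small standalone sketch. This is not the code from the patch: isVerticalWS and isHorizontalWS are hypothetical stand-ins for clang's isVerticalWhitespace and isHorizontalWhitespace (assumed to treat space, tab, \v and \f as horizontal), and isNewLineEscapedSketch and runReviewCases are names made up for illustration. It walks backwards by index over a std::string_view instead of decrementing a raw pointer, so it never forms a pointer before the start of the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical stand-ins for clang's character classifiers (assumed semantics).
static bool isVerticalWS(char C) { return C == '\n' || C == '\r'; }
static bool isHorizontalWS(char C) {
  return C == ' ' || C == '\t' || C == '\f' || C == '\v';
}

// Pos is the index of the last character of a newline sequence in Buf.
// Returns true if that newline is preceded by a backslash, optionally
// separated from it by horizontal whitespace.
static bool isNewLineEscapedSketch(std::string_view Buf, std::size_t Pos) {
  if (Pos >= Buf.size() || !isVerticalWS(Buf[Pos]))
    return false;
  // Step over the first half of a two-character `\r\n` or `\n\r` pair.
  if (Pos > 0 && isVerticalWS(Buf[Pos - 1]) && Buf[Pos - 1] != Buf[Pos])
    --Pos;
  // Skip horizontal whitespace between the backslash and the newline.
  while (Pos > 0 && isHorizontalWS(Buf[Pos - 1]))
    --Pos;
  return Pos > 0 && Buf[Pos - 1] == '\\';
}

// Checks the nine cases listed in the review comment. Each input ends with
// the newline character under test.
static void runReviewCases() {
  struct Case { std::string_view Text; bool Expected; };
  const Case Cases[] = {
      {"\\\r", true},
      {"\\\n", true},
      {"\\\r\n", true},
      {"\\\n\r", true},
      {"\\ \t\v\f\r", true},
      {"\\ \t\v\f\r\n", true},
      {"\\\r\r", false},
      {"\\\r\r\n", false},
      {"\\\n\n", false},
  };
  for (const Case &C : Cases)
    assert(isNewLineEscapedSketch(C.Text, C.Text.size() - 1) == C.Expected);
}
```

Calling runReviewCases() from a test runs every case in the list; in the real patch the same inputs would more likely go into a Lexer unit test.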
https://reviews.llvm.org/D30748
    
    