[PATCH] D30748: [Lexer] Finding beginning of token with escaped new line

Alexander Kornienko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 16 06:00:43 PDT 2017


alexfh requested changes to this revision.
alexfh added inline comments.
This revision now requires changes to proceed.


================
Comment at: lib/Lex/Lexer.cpp:457
+static bool isNewLineEscaped(const char *BufferStart, const char *Str) {
+  while (Str > BufferStart && isWhitespace(*Str))
+    --Str;
----------------
idlecode wrote:
> alexfh wrote:
> > We only care about two specific sequences here: `\\\r\n` or `\\\n`, not a backslash followed by arbitrary whitespace.
> I just saw that some functions (e.g. line 1285 in this file) accept whitespaces between escape character and new line. How about now?
Indeed, both clang and gcc accept whitespace between the backslash and the newline character and issue a diagnostic: https://godbolt.org/g/PUCTzF.

This should probably be done similarly to Lexer::getEscapedNewLineSize, but in reverse:

  assert(isVerticalWhitespace(*P));
  --P;
  // Skip the second character of `\r\n` or `\n\r`.
  if (P >= BufferStart && isVerticalWhitespace(*P) && *P != P[1])
    --P;
  // Clang allows horizontal whitespace between backslash and new-line with a warning. Skip it.
  while (P >= BufferStart && isHorizontalWhitespace(*P))
    --P;
  return P >= BufferStart && *P == '\\';

I'd add a bunch of tests for this function specifically:
  <backslash><\r> -> true
  <backslash><\n> -> true
  <backslash><\r><\n> -> true
  <backslash><\n><\r> -> true
  <backslash><space><tab><\v><\f><\r> -> true
  <backslash><space><tab><\v><\f><\r><\n> -> true
  <backslash><\r><\r> -> false
  <backslash><\r><\r><\n> -> false
  <backslash><\n><\n> -> false
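
A minimal standalone sketch of the cases listed above, so the expected results can be checked outside of clang. The two whitespace helpers are local stand-ins that assume the same character classes as clang's isVerticalWhitespace / isHorizontalWhitespace, and endsWithEscapedNewLine is a hypothetical test helper, not an existing Lexer function. Unlike the snippet above, this variant checks against BufferStart before each decrement so the pointer never moves in front of the buffer; the results for the listed cases should be the same.

```cpp
#include <cassert>
#include <string>

// Local stand-ins (assumption: same character classes as clang's helpers).
static bool isVerticalWhitespace(char C) { return C == '\n' || C == '\r'; }
static bool isHorizontalWhitespace(char C) {
  return C == ' ' || C == '\t' || C == '\f' || C == '\v';
}

// P points at the last character of a new-line sequence. Returns true if
// that new-line is preceded by a backslash, optionally followed by
// horizontal whitespace.
static bool isNewLineEscaped(const char *BufferStart, const char *P) {
  assert(isVerticalWhitespace(*P));
  if (P == BufferStart)
    return false;
  --P;
  // Skip the first character of a two-character new-line (`\r\n` or `\n\r`).
  if (isVerticalWhitespace(*P) && *P != P[1]) {
    if (P == BufferStart)
      return false;
    --P;
  }
  // Skip horizontal whitespace between the backslash and the new-line.
  while (isHorizontalWhitespace(*P)) {
    if (P == BufferStart)
      return false;
    --P;
  }
  return *P == '\\';
}

// Hypothetical helper: tests the new-line that ends the given text.
static bool endsWithEscapedNewLine(const std::string &Text) {
  assert(!Text.empty());
  return isNewLineEscaped(Text.data(), Text.data() + Text.size() - 1);
}
```

For example, endsWithEscapedNewLine("\\ \t\r\n") is expected to return true, while endsWithEscapedNewLine("\\\n\n") is expected to return false.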



https://reviews.llvm.org/D30748




