[PATCH] D30748: [Lexer] Finding beginning of token with escaped new line
Alexander Kornienko via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Mar 16 06:00:43 PDT 2017
alexfh requested changes to this revision.
alexfh added inline comments.
This revision now requires changes to proceed.
================
Comment at: lib/Lex/Lexer.cpp:457
+static bool isNewLineEscaped(const char *BufferStart, const char *Str) {
+ while (Str > BufferStart && isWhitespace(*Str))
+ --Str;
----------------
idlecode wrote:
> alexfh wrote:
> > We only care about two specific sequences here: `\\\r\n` or `\\\n`, not a backslash followed by arbitrary whitespace.
> I just saw that some functions (e.g. line 1285 in this file) accept whitespaces between escape character and new line. How about now?
Indeed, both clang and gcc accept whitespace between the backslash and the newline character and issue a diagnostic: https://godbolt.org/g/PUCTzF.
This should probably be done similar to Lexer::getEscapedNewLineSize, but in reverse:
assert(isVerticalWhitespace(*P));
--P;
if (P >= BufferStart && isVerticalWhitespace(*P) && *P != P[1]) // Skip the second character of `\r\n` or `\n\r`.
--P;
// Clang allows horizontal whitespace between backslash and new-line with a warning. Skip it.
while (P >= BufferStart && isHorizontalWhitespace(*P))
--P;
return P >= BufferStart && *P == '\\';
I'd add a bunch of tests for this function specifically:
<backslash><\r> -> true
<backslash><\n> -> true
<backslash><\r><\n> -> true
<backslash><\n><\r> -> true
<backslash><space><tab><\v><\f><\r> -> true
<backslash><space><tab><\v><\f><\r><\n> -> true
<backslash><\r><\r> -> false
<backslash><\r><\r><\n> -> false
<backslash><\n><\n> -> false
https://reviews.llvm.org/D30748
More information about the cfe-commits
mailing list