[PATCH] D41423: [Lex] Avoid out-of-bounds dereference in LexAngledStringLiteral.
Richard Smith - zygoloid via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Thu Jan 11 13:29:01 PST 2018
rsmith accepted this revision.
rsmith added inline comments.
================
Comment at: clang/lib/Lex/Lexer.cpp:2012-2015
+ // Skip escaped characters. Escaped newlines will already be processed by
+ // getAndAdvanceChar.
+ if (C == '\\')
+ C = getAndAdvanceChar(CurPtr, Result);
----------------
vsapsai wrote:
> rsmith wrote:
> > You can just delete these four lines entirely.
> It will make Clang reject previously accepted `#include <test\>escape.h>` That's what we want, right? I agree with having no support for such file names, just want to confirm. For the reference, proposed change would match gcc 5.4.0 behaviour. gcc 6.1 and higher rejects such include too but in a different way.
Hmm. I thought so, but on further inspection this area is more subtle than I'd anticipated.
It's *permissible* to reject that. In C++, [lex.header]p2 says: "The appearance of [...] the character[...] `\` [...] in a //q-char-sequence// or an //h-char-sequence// is conditionally-supported with implementation-defined semantics". C11 6.4.7/3 has similar wording but makes the behavior undefined in that case. (Both GCC's behavior and Clang's behavior are permitted by these rules.)
Clang's handling of the `#include "q-char-sequence"` form behaves the same as its handling of the `#include <h-char-sequence>` form today: the header name is *not* terminated by a `"` or `>` (respectively) that is preceded by an odd number of `\`s. Retaining this consistency of behavior seems like a good idea.
However, the actual behavior when such a character is "escaped" does not seem reasonable:
```
#include <\>> // includes file named \>, not file named >
#include "\"" // includes file named \", not file named "
```
... and there is no general way to perform a double-quoted include of a file whose name contains a `"` (not preceded by `\`), or an angled include of a file whose name is `>` (not preceded by `>`). So it seems that permitting escaped ending characters without actually handling them as escape sequences is not particularly worthwhile.
I think the best approach here is to follow GCC's lead: terminate a //header-name// (of either form) when the first `"` or `>` character is reached (depending on the form of header name), regardless of preceding `\`s.
But... that's a more invasive change (you'll need to change the handling of double-quoted string literals too, for consistency). Let's keep that change and this one separate.
https://reviews.llvm.org/D41423
More information about the cfe-commits
mailing list