[cfe-dev] Problem in locations

Wed Aug 12 23:51:39 PDT 2009

Chris Lattner wrote:
> 
> On Aug 12, 2009, at 11:14 PM, Abramo Bagnara wrote:
> 
>>>>
>>>> IMHO to have a leading \newline as part of the token confuses the
>>>> diagnostic without benefits.
>>>> int p() {
>>>> for ( \
>>>> int i = 0; i < 10; ++i)
>>>>   ;
>>>> return 0;
>>>> }
>>>
>>> That is perhaps not the best quality of implementation for the
>>> diagnostic, but it is intended.  You're hitting issues that are due to
>>> the phases of translation in C.  The first phase removes escaped
>>> newlines (which, as a gnu extension, can be followed by horizontal
>>> whitespace... urg) and trigraphs.  Because the lexer fully integrates
>>> the various phases of translation, a source location for a token returns
>>> the first byte of the file that is part of that token.  In this case, it
>>> is the escaped newline.
>>
>> Why is the escaped newline not considered the last part of the
>> whitespace?
> 
> What do you mean?  Escaped newline can occur anywhere in a token, the
> start of the token isn't special:
> 
> foo\
> bar??/
> baz
> 
> is one token.

Of course, but I believe that a leading ignorable sequence, such as an
escaped newline, should be ignored, not included in the token.

Take this example:
foo \
bar\
baz

There are two tokens, "foo" and "barbaz"; the associated source text could be
"foo" and "bar\
baz" or
"foo" and "\
bar\
baz".

I don't see a reason to prefer the latter to the former. I think that
treating " \
" and not only " " as whitespace is the more rational alternative.

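To make the two options concrete, here is a minimal sketch in C. It is not
Clang's actual lexer: the function names are made up for illustration, and
trigraphs and the GNU trailing-whitespace extension are deliberately ignored.
Each function takes a position just past the previous token and returns the
offset at which the next token would be reported to start.

```c
#include <stddef.h>

/* Hypothetical helper: length of an escaped newline at s (a backslash
   immediately followed by '\n'), or 0 if there is none. */
static size_t escaped_newline_len(const char *s) {
    return (s[0] == '\\' && s[1] == '\n') ? 2 : 0;
}

/* Option 1: only plain whitespace is skipped, so a leading escaped
   newline becomes the first byte of the next token. */
static size_t skip_plain_ws(const char *src, size_t pos) {
    while (src[pos] == ' ' || src[pos] == '\t' || src[pos] == '\n')
        pos++;
    return pos;
}

/* Option 2: escaped newlines mixed into the whitespace are skipped too,
   so the token starts at its first significant character. */
static size_t skip_ws_and_splices(const char *src, size_t pos) {
    for (;;) {
        size_t n;
        if (src[pos] == ' ' || src[pos] == '\t' || src[pos] == '\n')
            pos++;
        else if ((n = escaped_newline_len(src + pos)) != 0)
            pos += n;
        else
            return pos;
    }
}
```

For the example above, written as the C string "foo \\\nbar\\\nbaz" and
scanned from offset 3 (just past "foo"), skip_plain_ws returns 4 (the
backslash) and skip_ws_and_splices returns 6 (the 'b' of "bar"). The escaped
newline inside "bar\<newline>baz" is untouched by either option.
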
Also consider that the standard says preprocessing tokens are identified
after escaped newlines are removed: line splicing happens in translation
phase 2, and decomposition into preprocessing tokens happens in phase 3.
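
As a rough illustration of that ordering, assuming the phases were run as
separate passes (which, as Chris notes, the Clang lexer does not do), the
splicing step alone could look like the sketch below. The name splice_lines is
hypothetical and trigraphs are not handled.

```c
#include <stddef.h>

/* Simplified splicing pass: copy `in` to `out`, deleting every backslash
   that is immediately followed by a newline. `out` must have room for
   strlen(in) + 1 bytes. Returns the length of the spliced text. */
static size_t splice_lines(const char *in, char *out) {
    size_t n = 0;
    while (*in != '\0') {
        if (in[0] == '\\' && in[1] == '\n')
            in += 2;            /* drop the escaped newline */
        else
            out[n++] = *in++;   /* keep every other byte */
    }
    out[n] = '\0';
    return n;
}
```

Applied to the example above, the text becomes "foo barbaz", so a tokenizer
running afterwards never sees the escaped newlines and the whitespace between
the two tokens is a single space.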