<div dir="ltr"><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 24, 2014 at 3:09 PM, Harald van Dijk <span dir="ltr"><<a href="mailto:harald@gigawatt.nl" target="_blank">harald@gigawatt.nl</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="">On 24/02/14 21:58, Richard Smith wrote:<br>

> On Sun, Feb 9, 2014 at 1:01 AM, Harald van Dijk <<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a><br>

</div><div class="">> <mailto:<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a>>> wrote:<br>

><br>

>     On 05/02/14 23:16, Richard Smith wrote:<br>

>     > On Wed, Feb 5, 2014 at 1:08 PM, Harald van Dijk<br>

>     <<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a> <mailto:<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a>><br>

</div><div><div class="h5">>     > <mailto:<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a> <mailto:<a href="mailto:harald@gigawatt.nl">harald@gigawatt.nl</a>>>> wrote:<br>

>     ><br>

>     >     On 05/02/14 00:42, Richard Smith wrote:<br>

>     >     > Here's an easy test:<br>

>     >     ><br>

>     >     > #define foo() bar() b<br>

>     >     > #define bar()<br>

>     >     > a<br>

>     >     > foo()<br>

>     >     > c<br>

>     >     ><br>

>     >     > This should produce:<br>

>     >     ><br>

>     >     > # 1 "..."<br>

>     >     ><br>

>     >     ><br>

>     >     > a<br>

>     >     >  b<br>

>     >     > c<br>

>     >     ><br>

>     >     > ... but currently produces:<br>

>     >     ><br>

>     >     > #1 "..."<br>

>     >     ><br>

>     >     ><br>

>     >     > a b<br>

>     >     ><br>

>     >     > c<br>

>     ><br>

>     >     Thanks Richard, I don't know how I managed to miss that.<br>

>     ><br>

>     >     Setting Token::StartOfLine as mentioned gets this right in the<br>

>     lexer.<br>

>     >     However, even though the b token in the output has both<br>

>     >     Token::StartOfLine and Token::LeadingSpace set,<br>

>     >     PrintPPOutputPPCallbacks::HandleFirstTokOnLine uses the column<br>

>     number to<br>

>     >     determine whether to print spaces, not the flag. As a result,<br>

>     the output<br>

>     >     becomes not<br>

>     ><br>

>     >     a<br>

>     >      b<br>

>     >     c<br>

>     ><br>

>     >     but<br>

>     ><br>

>     >     a<br>

>     >     b<br>

>     >     c<br>

>     ><br>

>     >     Given that the lexer behaviour is now right, would it be okay<br>

>     to leave<br>

>     >     this as it is, and only add a test to check that b is on a<br>

>     separate<br>

>     >     line, like attached? (I used first/second/third instead of<br>

>     a/b/c in an<br>

>     >     attempt to get FileCheck to print a better suggestion when the<br>

>     test<br>

>     >     fails, but it neither helps nor hurts.)<br>

>     ><br>

>     ><br>

>     > I'm not sure whether we have consumers of -E that care about this<br>

>     > detail, but I think this is an unrelated bug. Consider:<br>

>     ><br>

>     > #define foo(x) x y<br>

>     > foo()<br>

>     ><br>

>     > This also misses the leading space from the output. So:<br>

>     ><br>

>     > 1) I think your patch is fine as-is, and<br>

>     > 2) I think we should fix PrintPreprocessedTokens to correctly<br>

>     handle the<br>

>     > first token on the line having leading space but being expanded from a<br>

>     > token in the first column.<br>

><br>

>     All right. Assuming that leading white space in column 1 is the only<br>

>     case that gets mishandled, it seems like a special exception would<br>

>     suffice. The attached (on top of my previous patch) passes testing. But<br>

>     I am not all that familiar with this part of the code.<br>

><br>

><br>

> Patch looks good, committed as r202070.<br>

<br>

</div></div>Thank you!<br>

<div class=""><br>

>     Note that the code below checks for # as a special exception: if that<br>

>     appears at the start of a line, a space is output without changing<br>

>     ColNo. While I do like consistency, using that approach would cause two<br>

>     spaces to be printed instead of one if # should appear in column 1, with<br>

>     leading white space.<br>

><br>

><br>

> It seems somewhat reasonable to insert the space and stay in column 1 in<br>

> that case -- that way, downstream tools that support the "preprocessed<br>

> source" input format can just strip the leading space in a line that<br>

> starts with " #" and not worry about later column numbers getting messed up.<br>

<br>

</div>Ah, I see how you read my previous message. Sorry for the confusion,<br>

that wasn't what I meant. What I meant was that the code directly below does<br>

<br>

if (ColNo <= 1 && Tok.is(tok::hash))<br>

  OS << ' ';<br>

<br>

and copying that approach, to get<br>

<br>

if (ColNo <= 1 && Tok.hasLeadingSpace())<br>

  OS << ' ';<br>

<br>

if (ColNo <= 1 && Tok.is(tok::hash))<br>

  OS << ' ';<br>

<br>

would have problems, so I just wanted to mention why I didn't do that.<br>

The ColNo variable is local to HandleFirstTokOnLine, never passed to any<br>

other function, and not visible, even indirectly, in either the output<br>

or in other functions.</blockquote><div><br></div><div> Ah, right. That means that:</div><div><br></div><div>#define HASH #</div><div>#define EMPTY</div><div>#define FOO EMPTY HASH</div><div>FOO</div><div><br></div><div>

... will get one space before the '#', not two. '-fpreprocessed' doesn't seem to care whether there's one space or two here, so I don't think this matters.</div></div></div></div>