[PATCH] Don't remove unknown tokens that are not whitespace (PR17215).

Wed Oct 9 09:48:56 PDT 2013


================
Comment at: lib/Format/TokenAnnotator.cpp:573
@@ -571,1 +572,3 @@
   void determineTokenType(FormatToken &Current) {
+    if (Current.Previous && Current.Previous->Tok.is(tok::unknown))
+      Current.Type = TT_ImplicitStringLiteral;
----------------
Daniel Jasper wrote:
> Alexander Kornienko wrote:
> > Daniel Jasper wrote:
> > > Can this actually make a difference? If yes, how? It seems like we convert tok::unknown to tok::string_literal or set Type = TT_ImplicitStringLiteral earlier or merge this into another token's whitespace..
> > I think, we want to preserve whitespace around non-whitespace unknown tokens. The most straightforward way to do this seems to be setting the type to TT_ImplicitStringLiteral for the token itself and the next one. The former is done in Format.cpp, and the latter - here.
> > 
> > And, btw, we don't convert tok::unknown to tok::string_literal in this specific case. I'm not sure if we should.
> Why would we want to preserve that whitespace? I'd just add the whitespace as leading whitespace to the TT_ImplicitStringLiteral.
> Why would we want to preserve that whitespace?

Because it's hard to figure out the meaning of an unknown token. The example from http://llvm.org/PR17215 is:

  #define VER_QUALIFIER \x20(dbg)

Here we shouldn't insert a space between the backslash and "x20". And if a space was there, we shouldn't remove it.

> I'd just add the whitespace as leading whitespace to the TT_ImplicitStringLiteral.

I'm not sure I understand what you mean here. I think, we need to avoid touching both leading and trailing whitespace of unknown tokens. And as we always attach whitespace to the following token, we need to deal with the next token as well. Or you see an easier way? Then could you explain it a bit more thoroughly?

================
Comment at: lib/Format/TokenAnnotator.cpp:575
@@ +574,3 @@
+      Current.Type = TT_ImplicitStringLiteral;
+    if (Current.Type != TT_Unknown)
+      return;
----------------
Daniel Jasper wrote:
> Alexander Kornienko wrote:
> > Daniel Jasper wrote:
> > > This change is not a no-op!! Move to line 608 and add:
> > > 
> > > // FIXME: Add tests that break if this gets moved up.
> > > 
> > > if no tests are failing now...
> > It wasn't meant to be a no-op. This check is here to preserve Current.Type if it is already set to TT_ImplicitStringLiteral. If moving the check here can really break something, we certainly need tests for this. I've added the FIXME line and changed the check to a more specific one.
> I think you are actually not achieving what you want to achieve. The code between here and line 614 does not even touch Current's Type. It only changes the type of previous tokens (which might still be bad). However, this check in its current form should not change the behavior. If it does, what is the test case?
I don't completely understand, why the "if (Current.Type == TT_Unknown)" check was where it was before, but it's back there now after your comment (and I've added the FIXME note you asked me to add), so the semantic of this part should be unchanged. What has changed, is the new code that handles the case of tok::unknown in the beginning of the function. As this case is quite distinct, I don't think it affects anything else. The test for this code is in place, and the code really does what it's meant to do (as in "removing the code breaks the test").

And I can change the check from the "early exit" style back to what it was before, if you prefer it to be that way, btw.


http://llvm-reviews.chandlerc.com/D1858