[PATCH] D28764: [clang-format] Implement comment reflowing (v3)

Wed Jan 25 03:25:23 PST 2017

krasimir added inline comments.

================
Comment at: lib/Format/ContinuationIndenter.cpp:1158-1159
+        CommentPragmasRegex.match(Current.TokenText.substr(2)) ||
+        Current.TokenText.substr(2).ltrim().startswith("clang-format on") ||
+        Current.TokenText.substr(2).ltrim().startswith("clang-format off"))
       return addMultilineToken(Current, State);
----------------
klimek wrote:
> krasimir wrote:
> > klimek wrote:
> > > krasimir wrote:
> > > > klimek wrote:
> > > > > Generally, we shouldn't need those here, as those should be part of the Token.Finalized state.
> > > > Here the problem is that the comments /* clang-format on */ and /* clang-format off */ control the formatting of the tokens in between, but do not control their own formatting, that is they are not finalized themselves.
> > > > What happens is that if we rely solely on Token.Finalized and the ColumnLimit is small, say 20, then:
> > > > ```
> > > > /* clang-format off */ 
> > > > ```
> > > > gets broken into:
> > > > ```
> > > > /* clang-format off
> > > >   */
> > > > ```
> > > > 
> > > > Add comments about this.
> > > Isn't the right fix to change the tokens of the comment to be finalized?
> > That's arguable. Consider [[ https://github.com/llvm-mirror/clang/blob/master/unittests/Format/FormatTest.cpp#L10916 | FormatTest.cpp:10916 ]]: in this case we still want to re-indent the /* clang-format on/off */ comments, just not break them. If we finalize the token, we can't re-indent.
> I see two options:
> - mark the comment with a special token type for clang-format on/off
> - pull out a function - I see you already have one called mayReflowContent; can we use that?
mayReflowContent is too specific for content during region. I'll add a delimitsRegion method.

================
Comment at: lib/Format/ContinuationIndenter.cpp:1205-1210
+    if (SplitBefore.first != StringRef::npos) {
+      TailOffset = SplitBefore.first + SplitBefore.second;
+      ReflowInProgress = true;
+    } else {
+      ReflowInProgress = false;
+    }
----------------
klimek wrote:
> (optional) I'd probably write this:
>   ReflowInProgress = SplitBefore.first != StringRef::npos;
>   if (ReflowInProgress) {
>     TailOffset = SplitBefore.first + SplitBefore.second;
>   }
How about this?

================
Comment at: lib/Format/ContinuationIndenter.cpp:1231-1233
+      unsigned RemainingTokenColumnsAfterCompress =
+          Token->getLineLengthAfterCompress(LineIndex, TailOffset,
+                                            RemainingTokenColumns, Split);
----------------
klimek wrote:
> I think AfterCompression reads more naturally. Perhaps we should use "trim" instead of compress, now that I think about it.
> So I'd probably go for getTrimmedLineLength or something.
I don't agree with "trimmig" because i have a connotation of "trimming" around the ends of strings like in ltrim/rtrim and this fundamentally operates somewhere inside. +1 for AfterCompression, though.

================
Comment at: lib/Format/WhitespaceManager.cpp:131
+        Changes[i - 1].Kind == tok::comment &&
+        OriginalWhitespaceStart != PreviousOriginalWhitespaceEnd;
   }
----------------
klimek wrote:
> Why was this needed? (what breaks without this change)
This is a dirty hack. The problem is that WhitespaceManager does its own formatting for trailing comments and sometimes it re-formats a piece of line even after it's been reflown with the previous line. Consider if we reflow the second line up:
```
// line 1
// line 2
```
That amounts to 2 changes, first (delimited by ()) for the whitespace between the two tokens, and second (delimited by []) for the beginning of the next line:
```
// line 1(
)[// ]line 2
``` 
So in the end we have two changes like this:
```
// line 1()[ ]line 2
```
Note that the OriginalWhitespaceStart of the second change is the same as the PreviousOriginalWhitespaceEnd of the first change.
In this case, the above code marks the second line as trailing comment and afterwards applies its own trailing-comment-alignment logic to it, which might introduce extra whitespace before "line 2".

For a proper solution we need a mechanism to say that a particular change emitted through the whitespace manager breaks the sequence of trailing comments.

================
Comment at: unittests/Format/FormatTest.cpp:2169
+
+  // Don't reflow lines starting with two punctuation characters.
+  EXPECT_EQ("// long long long\n"
----------------
klimek wrote:
> But we want to reflow lines starting with one punctuation character?
I guess I'm looking at lines that start like this: "ala", 'ala'. So it may be better to special-case these two instead.
The heuristic for now, in mayReflowContent is: the content of the line needs to have at least two characters and either the first or the second character must be non-punctuation.

Another approach would be to not do these funny things and to see how users will complain about mis-re-flows and implement blacklist/whitelist content regex prefixes in the future.

Add a test showing the current positive behavior.

https://reviews.llvm.org/D28764