[PATCH] D28764: [clang-format] Implement comment reflowing (v3)

Tue Jan 24 08:14:59 PST 2017

klimek added a comment.

This is starting to be pretty awesome. The one thing that would help me review the gist of the implementation a bit more is if that had a bit more comments. Perhaps go over the math in the code and put some comments in why we're doing what at various steps.

================
Comment at: lib/Format/BreakableToken.h:42-48
+/// There is a pair of operations that are used to compress a long whitespace
+/// range with a single space if that will bring the line lenght under the
+/// column limit:
+/// - getLineLengthAfterCompress, for calculating the size in columns of the
+///   line after a whitespace range has been compressed, and
+/// - compressWhitespace, for executing the whitespace compression using a
+///   whitespace manager.
----------------
I assume the reason to not fold this into the next getSplitBefore is that sometimes we need to do the trimming even if we're the last line? Would it otherwise make sense?

================
Comment at: lib/Format/BreakableToken.h:93-96
+  /// \brief Returns the number of columns required to format the piece of line
+  /// at \p LineIndex, from byte offset \p TailOffset after the whitespace range
+  /// \p Split previously returned by \c getSplit has been compressed into a
+  /// single space.
----------------
I would cut "previously returned by getSplit", as I think this is irrelevant to this function.
Also, given that we hand in RemainingTokenColumn, why would LineIndex and TailOffset ever be relevant?
Lastly, this is not virtual, in which case it seems to not make sense to pass in any information that's not relevant for the current implementation?

================
Comment at: lib/Format/BreakableToken.h:119-122
+  virtual Split getSplitBefore(unsigned LineIndex,
+                               unsigned PreviousEndColumn,
+                               unsigned ColumnLimit,
+                               bool ReflowInProgress) const {
----------------
Instead of passing in ReflowInProgress, I'd not call it if !ReflowInProgress and make the comment more reflow-examplish.

It basically returns the split of the current line, if any parts of the current line fit the the line above it.

================
Comment at: lib/Format/BreakableToken.h:147-149
+  /// \brief Updates the next token of \p State to the next token after this
+  /// one. This makes sense when this token manages a set of underlying tokens
+  /// as a unit and is responsible for the formatting of the them.
----------------
s/makes sense when/can be used when/

================
Comment at: lib/Format/ContinuationIndenter.cpp:1158-1159
+        CommentPragmasRegex.match(Current.TokenText.substr(2)) ||
+        Current.TokenText.substr(2).ltrim().startswith("clang-format on") ||
+        Current.TokenText.substr(2).ltrim().startswith("clang-format off"))
       return addMultilineToken(Current, State);
----------------
krasimir wrote:
> klimek wrote:
> > krasimir wrote:
> > > klimek wrote:
> > > > Generally, we shouldn't need those here, as those should be part of the Token.Finalized state.
> > > Here the problem is that the comments /* clang-format on */ and /* clang-format off */ control the formatting of the tokens in between, but do not control their own formatting, that is they are not finalized themselves.
> > > What happens is that if we rely solely on Token.Finalized and the ColumnLimit is small, say 20, then:
> > > ```
> > > /* clang-format off */ 
> > > ```
> > > gets broken into:
> > > ```
> > > /* clang-format off
> > >   */
> > > ```
> > > 
> > > Add comments about this.
> > Isn't the right fix to change the tokens of the comment to be finalized?
> That's arguable. Consider [[ https://github.com/llvm-mirror/clang/blob/master/unittests/Format/FormatTest.cpp#L10916 | FormatTest.cpp:10916 ]]: in this case we still want to re-indent the /* clang-format on/off */ comments, just not break them. If we finalize the token, we can't re-indent.
I see two options:
- mark the comment with a special token type for clang-format on/off
- pull out a function - I see you already have one called mayReflowContent; can we use that?

================
Comment at: lib/Format/ContinuationIndenter.cpp:1205-1210
+    if (SplitBefore.first != StringRef::npos) {
+      TailOffset = SplitBefore.first + SplitBefore.second;
+      ReflowInProgress = true;
+    } else {
+      ReflowInProgress = false;
+    }
----------------
(optional) I'd probably write this:
  ReflowInProgress = SplitBefore.first != StringRef::npos;
  if (ReflowInProgress) {
    TailOffset = SplitBefore.first + SplitBefore.second;
  }

================
Comment at: lib/Format/ContinuationIndenter.cpp:1231-1233
+      unsigned RemainingTokenColumnsAfterCompress =
+          Token->getLineLengthAfterCompress(LineIndex, TailOffset,
+                                            RemainingTokenColumns, Split);
----------------
I think AfterCompression reads more naturally. Perhaps we should use "trim" instead of compress, now that I think about it.
So I'd probably go for getTrimmedLineLength or something.

================
Comment at: lib/Format/WhitespaceManager.cpp:131
+        Changes[i - 1].Kind == tok::comment &&
+        OriginalWhitespaceStart != PreviousOriginalWhitespaceEnd;
   }
----------------
Why was this needed? (what breaks without this change)

================
Comment at: unittests/Format/FormatTest.cpp:2169
+
+  // Don't reflow lines starting with two punctuation characters.
+  EXPECT_EQ("// long long long\n"
----------------
But we want to reflow lines starting with one punctuation character?

https://reviews.llvm.org/D28764