[PATCH] D92257: [clang-format] Add option to control the space at the front of a line comment

Mon Dec 7 12:39:36 PST 2020

HazardyKnusperkeks marked 6 inline comments as done.
HazardyKnusperkeks added a comment.

In D92257#2435906 <https://reviews.llvm.org/D92257#2435906>, @lebedev.ri wrote:

> In D92257#2435902 <https://reviews.llvm.org/D92257#2435902>, @MyDeveloperDay wrote:
>
>> In D92257#2435899 <https://reviews.llvm.org/D92257#2435899>, @HazardyKnusperkeks wrote:
>>
>>> In D92257#2435701 <https://reviews.llvm.org/D92257#2435701>, @MyDeveloperDay wrote:
>>>
>>>> Can I assume you need someone to land this for you?
>>>
>>> Yes I do. But I have a question, my last change is commited in your name, that means git blame would blame it on you, right?
>>>
>>> You can set me as author:
>>> `Björn Schäpers <bjoern at hazardy.de>`
>>> My Github Account is also called `HazardyKnusperkeks`.
>>
>> The process is that you add (https://llvm.org/docs/Contributing.html)
>>
>> Patch By: HazardyKnusperkeks
>>
>> to the commit message if the user doesn't have commit access, if you want your name against the blame then I recommend applying for commit access yourself.
>
> That is incorrect and does not represent the nowadays reality, i suggest that you look up the docs.
>
>> let me know if you still want me to land this

What/Where are the docs? I read https://llvm.org/docs/Contributing.html before hand and just now https://llvm.org/docs/CodeReview.html

---

I will update the patch, but that won't happen before the weekend.

================
Comment at: clang/docs/ClangFormatStyleOptions.rst:2727
+
+  * ``unsigned Maximum`` The maximum number of spaces at the start of the comment.
+
----------------
krasimir wrote:
> HazardyKnusperkeks wrote:
> > MyDeveloperDay wrote:
> > > I'm personally not a massive fan of stuffing -1 into an unsigned but I understand why its there, if this was an signed and it was actually -1 would the algorithm be substantially worse in your view?
> > I'm no fan if unsigned in any way, but that seems to be the way in clang-format.
> > Making it signed would require a few more checks when to use it, but I don't see any problem in that.
> > I just would also make the Minimum signed then just to be consistent.
> > 
> > While parsing the style I would add checks to ensure Minimum is never negative and Maximum is always greater or equal to -1, should that print any warnings? Is there a standard way of doing so? Or should it be just silently corrected?
> I find it confusing why we have 2, Minimum and Maximum, instead of a single one.
> I'm not convinced that `Maximum` is useful.
> Conceptually I'd prefer a single integer option, say `LineCommentContentIndent` that would indicate the default indent used for the content of line comments. I'd naively expect `LineCommentContentIndent = `:
> * 0 would produce `//comment`
> * 1 would produce `// comment` (current default)
> * 2 would produce `//  comment`, etc.
> and this will work with if the input is any of `//comment`, `// comment`, or `//  comment`, etc.
> 
> An additional consideration is that line comment sections often contain additional indentation, e.g. when there is a bullet list, paragraphs, etc. and so we can't guarantee that the indent of each line comment will be less than Maximum in general. I'd expect this feature to not adjust extra indent in comments, e.g.,
> ```
> // Lorem ipsum dolor sit amet,
> //  consectetur adipiscing elit,
> //  ...
> ```
> after reformatting with `LineCommentContentIndent=0` to produce
> ```
> //Lorem ipsum dolor sit amet,
> // consectetur adipiscing elit,
> // ...
> ```
> (and vice-versa, after reformatting with `LineCommentContentIndent=1`).
> This may well be handled by code, I just wasn't sure by looking at the code and test examples.
I was actually going for only one value, but while writing the tests I came to the conclusion that before my change is only enforced a minimum of 1. But that very well may be because of what you call line comment sections, I did not consider that. That's why I chose a minimum and maximum.

I will modify the patch to one value and will also add tests for the sections. But for that I need to remember if I added or removed spaces, right? Is there already infrastructure for that? Or is there any documentation of the various steps clang-format takes to parse and format code? Until now I tried to understand what's going on through single stepping with the debugger (quite time consuming).

================
Comment at: clang/lib/Format/BreakableToken.cpp:790
+          (Style.Language != FormatStyle::LK_TextProto ||
+           OriginalPrefix[i].substr(0, 2) != "##")) {
+        Prefix[i] = IndentPrefix.str();
----------------
krasimir wrote:
> HazardyKnusperkeks wrote:
> > HazardyKnusperkeks wrote:
> > > MyDeveloperDay wrote:
> > > > is this case covered by a unit test at all? sorry can you explain why you are looking for "##"?
> > > It is covered by multiple tests, that's how I was made aware of it. :)
> > > If you look at the code before it only adds a space if the old prefix is "#" not "##" which is also found by `getLineCommentIndentPrefix`. As it seems in `TextProto` "##" should not be touched. I can of course add a test in my test function.
> > > 
> > > Now I see a change, in the code before "#" was only accepted when the language is `TextProto`, now it is always. But I think for that to happen the parser (or lexer?) should have assigned something starting with"#" as comment, right? But I can change that.
> > Okay # # is formatted, I try again:
> > If you look at the code before it only adds a space if the old prefix is "#" not "`##`" which is also found by `getLineCommentIndentPrefix`. As it seems in `TextProto` "`##`" should not be touched.
> Thanks for the analysis!
> I wrote the text proto comment detection. I believe the current clang-format is buggy in that it should transform `##comment` into `## comment` for text proto (and similarly for all other `KnownTextProtoPrefixes` in `getLineCommentIndentCommentPrefix`), so this `substr(0, 2) != "##"` is unnecessary and I should go ahead and update and add tests for that.
In that case I will remove that check, but as said there were many tests which failed without it, I will have to adapt them too.

================
Comment at: clang/unittests/Format/FormatTestComments.cpp:3405
+            "//  Lorem   ipsum\n"
+            "//  dolor   sit amet\n" // Why are here the spaces dropped?
+            "\n"
----------------
krasimir wrote:
> This is desired, AFAIK, and due to the normalization behavior while reflowing: when a comment line exceeds the comment limit and is broken up into a new line, the full range of blanks is replaced with a newline. (https://github.com/llvm/llvm-project/blob/ddb002d7c74c038b64dd9d3c3e4a4b58795cf1a6/clang/lib/Format/BreakableToken.cpp#L66).
> Note that reflowing copies the extra indent of the line, e.g.,
> ```
> // line limit  V
> // heading
> // *line is
> //   long long long long 
> ```
> get reformatted as
> ```
> // line limit  V
> // heading
> // *line is
> //   long long
> //   long long 
> ```
> so if for ranges of blanks longer of size S>1 we copied the (S-1) blanks at the beginning of the next line, we would have cascading comment reflows undesired with longer and longer indents.
> 
Okay, I mean the spaced between `sit` and `amet`, while the spaces between `Lorem` and `ipsum`, and `dolor` and `sit` is kept.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D92257/new/

https://reviews.llvm.org/D92257