[llvm-dev] [RFC] Semi-Automatic clang-format of files with low frequency

Wed Jul 1 09:07:13 PDT 2020

   I am 100% in agreement that all clang-format patches should get actual code review. Clang-format falls over on lots of things, and I believe that judicious use of clang-format on/off pairs is the way to go to have a readable clang-format'ed codebase.

> I don't ever recall such a debate over formatting in a review thread (and while I don't any longer, for years, I read every review on the mailing list).

   Speaking from my experience, what usually happens is a newcomer touches a file, and a person who feels that they have code ownership of it tells them to make some formatting change. Generally, the newcomer has enough sense not to fight it and just makes the change. Very rarely does a full blown argument break out in code review, but even 1 round trip is unfortunate. Another bad thing that can happen is that a code reviewer *wants* to raise an issue in formatting, but they don't want to be confrontational so they let it go and/or fix it themselves later.

> Code is read more than it is written, and optimizing to make the code easier to read is very important.

   I agree. Personally, I find having the code be consistent is a very important factor in this equation. What looks nice is largely a matter of opinion, and with such a diverse set of people working on this codebase you get N different styles. Frankly, there are several things about the LLVM style that I hate (opening brace not on its own line, `auto *Foo` instead of `auto* Foo` or `auto * Foo` among others), that it's already hard for me to read. Part of being a professional is learning to deal with this sort of stuff, so clang-format making suboptimal formatting choices is just another thing. But at least it would be consistent, which is a huge boon to readability in my opinion.

   There are advantages to having complete clang-format coverage:

1) consistency!
2) developers can use git hooks to clang-format to their preferred style
3) nobody has to think about how to format

   There are advantages to not using clang-format as well. Mainly, the fact that you can have fine-grained control over the formatting. However, I don't believe that there are any real advantages to being where we are with partial clang-format coverage. It seems to me that the direction the project is going is to have full coverage. Why else would Phabricator complain when your code isn't formatted? Frankly, I think 1 year is easily enough time to get complete coverage without being overly disruptive to downstreams and developers with long-lived patches, so I think 1 year being the threshold for "this file is stable, let's format it" is unreasonably conservative.

Thanks,
   Christopher Tetreault

-----Original Message-----
From: Hal Finkel <hfinkel at anl.gov>
Sent: Tuesday, June 30, 2020 3:37 PM
To: Chris Tetreault <ctetreau at quicinc.com>
Subject: [EXT] Re: [llvm-dev] [RFC] Semi-Automatic clang-format of files with low frequency

On 6/30/20 5:03 PM, Chris Tetreault via llvm-dev wrote:
>> My comment there was more targeted against those who would take your idea and push it to unnecessary and counterproductive extremes.
> There's nothing counterproductive about just formatting everything and being done with it. The goal of doing clang-format-diff as we commit code is to minimize churn so that there isn't one big patch that conflicts with everything. But if you only format the lines that get touched by a commit, the work will never be done. At some point you need to finish the work, or the tool might as well be called clang-make-the-code-inconsistent.

I agree. But I think that we should view this as an iterative process.
We should generate patches for review, and realize that we might need to update clang-format in some cases instead of just always updating the code. clang-format has evolved over time, and I doubt that evolution has stopped now.

>
>> Since you're interested in thoughts about clang-format's
>> capabilities, I agree with Matt that the strictness of the approach
>> is a weakness that can frequently make code *less* readable. In
>> addition to what he mentions, here's an example of a bad change that
>> clang-format wants to make that I found after half a minute of
>> scanning our AMDGPU backend
>> code:
>>
>>    // We only support LOAD/STORE and vector manipulation ops for vectors
>>    // with > 4 elements.
>> -  for (MVT VT : { MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32,
>> -                  MVT::v2i64, MVT::v2f64, MVT::v4i16, MVT::v4f16,
>> -                  MVT::v4i64, MVT::v4f64, MVT::v8i64, MVT::v8f64,
>> -                  MVT::v16i64, MVT::v16f64, MVT::v32i32, MVT::v32f32 }) {
>> +  for (MVT VT :
>> +       {MVT::v8i32, MVT::v8f32, MVT::v16i32, MVT::v16f32, MVT::v2i64,
>> +        MVT::v2f64, MVT::v4i16, MVT::v4f16, MVT::v4i64, MVT::v4f64, MVT::v8i64,
>> +        MVT::v8f64, MVT::v16i64, MVT::v16f64, MVT::v32i32,
>> + MVT::v32f32}) {
>>      for (unsigned Op = 0; Op < ISD::BUILTIN_OP_END; ++Op) {
>>        switch (Op) {
>>
>> I don't particularly mind that clang-format puts that initializer
>> list onto a new line, or that it changes the whitespace around
>> braces. What I do mind: the original code lays the initializer list
>> out carefully such that integer and floating point types always come
>> in pairs on the same line: v8[if]32 and v16[if]32 on the first line,
>> v2[if]64 and
>> v4[if]64 on the second line, and so on. clang-format mindlessly mushes the initializer list together in a way that makes it harder to see at a glance what is going on.
>>
>> (Note that this isn't code that I wrote, so I'm not emotionally
>> attached to it or anything. But I've run into this kind of problem
>> many times during development.)
>>
>> I believe the common theme here is that clang-format ought to have a mode in which it is more accepting of different layouts of lists co-existing within the same code base. If that feature was available, I would be a strong proponent for adopting it in LLVM itself.
>>
>> Cheers,
>> Nicolai
> The problem is that while you may find the way it's formatted to be easier to read, not everybody does. If I were formatting it manually, I probably would have had 8 lines of pairs, with one pair per line, or I would have had all the integer types then all the float types if the order is not important. Now we have a disagreement, and a needless code review issue in the future when somebody adds a new MVT that needs to be handled and fails to figure out the pattern. That's the point of the tool. It can't be reasoned with. It can't be bargained with. It doesn't feel pity of remorse or fear. And it absolutely will not stop. Ever. Until your code is formatted correctly.

Okay, wait. Let's not solve problems that we don't have. I don't ever recall such a debate over formatting in a review thread (and while I don't any longer, for years, I read every review on the mailing list). I agree that your formatting suggestion is better than the current one, but if both the current formatting and your suggested improvement are better than what clang-format provides, so using the clang-format version seems like a regression. We should improve things so that using clang-format would be an improvement, or we should figure out how to exempt this part of the code.

>
> Arguing about code format in code review is a productivity drain. Time spent figuring out the style of this one random file is a productivity drain. Time spent inserting a bunch of spaces in a 43 line switch because you added a new case that is just 1 character too long is a productivity drain. We have a tool that can do this for us. Maybe it doesn't format everything exactly how we'd like, but at least you won't have to justify it in code review. Sure, maybe it will miss things, but at least you don't have to push a new patch because a line is too long or you put the * next to the type rather than the variable name. The tool will evolve and improve. Frankly, I'm surprised that it doesn't already perfectly implement the LLVM coding style, but surely this is the goal.

No one I know likes spending their time formatting code. We do it when we feel it is important. Code is read more than it is written, and optimizing to make the code easier to read is very important. Formatting lists of things, etc. nicely has helped me catch bugs in the past, and even if the price for that is having to comment in a review thread about the formatting, or spend some time formatting, I'll take it. When things are nicely arranged, it's sometimes more obvious when something just isn't right. Tracking down bugs in compilers takes far more time than all of that.

Thanks again,

Hal

>
>  From your previous email:
>
>> Hard no on the shorter period. I have changes I'm working on for several months at a time. 1 year is a minimum for this in my opinion.
> Your patch should already be formatted using clang-format-diff, right? Phabricator is configured to complain on your commit if this is not the case, so I don't see how a clang-format commit could possibly cause a conflict for you.
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

--
Hal Finkel
Lead, Compiler Technology and Programming Languages Leadership Computing Facility Argonne National Laboratory