[llvm-dev] Widescale clang-tidy (or similar) based cleanup

Mon Apr 10 09:47:32 PDT 2017

On Mon, Apr 10, 2017 at 9:14 AM Zachary Turner <zturner at google.com> wrote:

> +100 to Mehdi.  Large scale cleanups should not only be welcome, they
> should be encouraged.  This is the type of work that almost nobody wants to
> do and is sorely underappreciated (as evidenced by the fact that this
> thread even exists, IMHO).
>

Bit harsh ;) I do a fair bit of cleanup myself and really do appreciate it
- I'd love to see the whole codebase cleaned up of range-for and other
things like this that have been implemented in clang-tidy, really!

>   Code quality and code health are ongoing costs, and if we raise the
> barrier to entry for this type of change, then they're not going to happen.
>
> Why should we be requiring pre-commit review for cleanup changes?  That
> doesn't make any sense.  Reviews are for when you want to verify
> correctness in an area that aren't too familiar with, or you are
> substantially refactoring something and/or changing the design.  Pre-commit
> reviews for changing for loops to be ranged based?  That's a waste of
> everyone's time, it's already hard enough to get timely reviews even for
> important things.
>

Review time, imho/experience, is generally proportional to code complexity.
I'm not suggesting/very interested in reviewing the actual code changes of
these cleanups - and several have been sent for review which is where I
initially brought up this discussion.

That's why I'm interested in these things being a -dev level discussion for
each major change to sign off on the approach/idea (the same way Google's
internal "large scale change" process works, FWIW) without the need for
individual owners, etc, to discuss whether a certain change is relevant.

As for the reason(s)/risks: I think like any change there's a cost/benefit
tradeoff, in the case of these sort of changes - even beyond the
blame/version history cost that seems to be enough for some people to avoid
whitespace cleanup - these sort of cleanups are potentially semantics
changing.

So an email describing what sort of changes will be made (highlighting the
corner cases, assumptions, etc) seems like it'd be useful to know whether
the project should take the risks when changing that much code.

(examples I'm thinking of would be indirect iterator invalidation in a for
loop so someone has carefully re-evaluated end() on each iteration. Does
the auto-range-for-ification touch these loops? If so it could introduce
bugs. I don't want to go looking through every line of code change to see
if it did make such a change)

When an LLVM developer does some cleanup - yeah, they might make a mistake
an introduce bugs like this as well, but if it's human written then I'll
expect human written errors and probably review the changes to some degree.

Once it's automated - I think it's not a bad idea to review the automation
before applying it.

>
> On Fri, Apr 7, 2017 at 9:20 PM Mehdi Amini via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
> Hi,
>
> TLDR: I’m fairly adamantly opposed to most of what is proposed/discussed
> here.
>
> Long version:
>
>
>
> 1) Large scale formatting is not welcome.
>
> This is a valid point, I totally agree with it and I’m not asking to
> change this: we’re not welcoming applying clang-format or other formatting
> only changes to an entire file, at least for no other reason than “just
> doing it”.
> The reason, I believe, is that there is little perceived value to
> formatting-only change by themselves, and thus such changes can’t by
> themselves offset the little cost they incur (blame is a little bit less
> straightforward, conflict with existing patches, etc.). Note that the cost
> here is incurred by the fact that global formatting changes are usually
> associated with “a very large part of the file is affected”.
>
> Reformatting a file as a whole is only accepted (AFAIK) as a pre-step for
> another commit that touches “most” of the file anyway. As Reid describes
> it: "In effect you're taking some ownership for it” and “it also usually
> reduces the patch size of the eventual functional change, which makes it
> easier to understand the patch during code review”.
>
>
>
> 2) Refactoring and other cleanup *are* welcome.
>
> I believe that it is one core strength of the LLVM project: we’re
> aggressively refactoring and cleaning up the codebase as we go. We already
> have some "good practices” and/or coding guidelines [0] [1]. We’re not
> enforcing these as strict rules, but I have never seen any push-back on NFC
> commits that are pure cleanup.
>
> For example, the include style from our coding standard [2] aren’t well
> enforced, yet many people in the community felt that it is worthwhile
> fixing when they encountered such cases (e.g [3][4][5][6]).
> Another one is to prefer for-range loop when possible, Sanjay Patel commit
> a bunch of cleanup on this aspect (e.g [7][8][9][10]), and I already
> praised this at the time it happened [11] !!
>
> I have never seen any push back on such commits and I don’t want to see
> this changing at all.
>
>
>
> 3) "if people want to make global pattern-based cleanups (push_back ->
> emplace_back, range-based-for, etc), we should just raise it on llvm-dev
> and make a decision about the value of the cleanup.”
>
> This seems to me to be in line with existing practices.
>
> However I’d like to be very careful about this: there is already a lot of
> things that are accepted (e.g. include only header you’re using, sort
> headers, etc.). There is no reason to block such change on a llvm-dev
> discussion. These have either already been discussed, considered, and
> sometimes even specified in the coding standard ; or are widely accepted
> practices (according to the commit history).
>
>
>
> 4) Fallacy: these commits were targeted while tool-based large-scale
> changes are not.
>
> Just because I’ll use clang-tidy to reorder the header instead of doing it
> manually does not change the nature of the patch neither its practical
> consequence, so I don’t see any rational to differentiate them.
> I believe the same reasoning applies if I update a single file in my
> commit or 100 of them. If this is really a problem to people, we can have
> 100 single file commits, with the *exact same result* (I’m not in favor of
> this, and I’d have to read some reasoning about doing it this way).
>
>
>
> 5) Fallacy: "Usually some amount of cleanup has been acceptable if the
> code was generally being churned anyway”
>
> While this sentence is strictly correct, inferring that cleanup is *only*
> acceptable if the code was being churned anyway is totally false IMO, and
> the history of the repository proves it largely.
> I cite some commits below, but there are hundreds others of them!
>
>
>
> 6) We should have pre-commit reviews for cleanups.
>
> What makes such changes so critical that they require a pre-commit review
> while we don’t have a pre-commit review policy for bug fixes or other
> commits? I don’t get it. Really.
>
> We trust committers to use their best judgement to decide if they need or
> not pre-commit review or if post-commit is good enough.
>
> I personally don’t feel I need to get through pre-commit review for
> reordering the headers in *any* part of the codebase today, and I’m opposed
> to change this (and I’m saying this being part of the crowd the most in
> favor of pre-commit review in general).
>
> We already suffer from lack of reviewer bandwidth for meaningful changes,
> we don’t need to waste time on such trivial changes. If you feel like you
> want to post-commit review these, good for you, but there is no
> justification to add overhead on the people who try to make things better
> here.
>
>
> —
> Mehdi
>
>
>
>
> [0] http://llvm.org/docs/CodingStandards.html
> [1] http://llvm.org/docs/ProgrammersManual.html
> [2] http://llvm.org/docs/CodingStandards.html#include-style
> [3] r253915
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/314999.html
> [4] r254005
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20151123/315227.html
> [5] r225439
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20150105/251346.html
> [6] r222151
> http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20141117/118559.html
>
> [7] r256891
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/322749.html
>
> [8] r256998
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160104/323008.html
> [9] r257500
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160111/324186.html
> [10] r257845
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160111/325042.html
> [11]
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160111/325043.html
>
>
> —
> Mehdi
>
>
>
>
>
> On Apr 7, 2017, at 3:49 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>
>
> On Fri, Apr 7, 2017 at 3:14 PM Adrian Prantl <aprantl at apple.com> wrote:
>
>
> > On Apr 7, 2017, at 3:05 PM, David Blaikie via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
> >
> > There have been some efforts recently to use clang-tidy or similar
> automated refactoring to make project-wide cleanups/improvements to the
> LLVM codebase that appear to me to be a bit out of character with the sort
> of changes & philosophies that have been applied in the past.
> >
> > I think there are a few issues at hand which I'll try to summarize:
> >
> > * Churn/blame-convenience:
> > Previously, large scale changes (classically whitespace cleanup, with
> things like clang-format) have been rejected or discouraged due to the risk
> of making it harder to find the original source of a semantic change.
> > I'm not too wedded to this issue myself, but it did seem to be the
> status-quo previously so I'm curious to better understand if/how people see
> this applying or not, to more semantic changes.
>
> Personally I think that this is a shortcoming of one's tooling if this is
> a problem. Many git-blame viewers allow you to roll back to the previous
> revision at the current line at one keypress.
>
> >
> > * Efficiency:
> > Sending individual or even batched reviews for automated changes like
> this seems inefficient. I'd rather see, for example, an email to llvm-dev
> to discuss whether a particular change makes sense to make across the
> project (ie: does the LLVM project want to cleanup all instances of classic
> for to range-based-for when clang-tidy can do so?) and then not send the
> individual changes for review, but commit them directly where possible. (if
> the person doing this automated cleanup doesn't have permission, yeah, send
> it out and reference the original discussion thread).
>
> I think it makes sense to send out a phabricator review per kind of
> clang-tidy change, i.e., one patch that performs the same kind of change
> all over the project. Having only one kind of transformation per review
> will make it really easy to skim over the patch and detect situations where
> clang-tidy changes the code for the worse.
>
>
> Reckon it's worth elevating these reviews to llvm-dev rather than only
> llvm-commits? (in effect what I'd expect to see is a "hey, thinking about
> applying this pattern-based* cleanup - here are some examples of what it
> does, the corner cases, cases it doesn't handle, etc (and, as an aside, the
> attached patch shows the changes created by applying this tool to all of
> LLVM (or LLVM+Clang, etc))")
>
> Anyone have opinions on whether llvm-dev would be a sufficient medium to
> approve automated cleanup of other subprojects? (Clang? compiler-rt? lld?
> lldb?). Or would it be necessary to have a discussion aobut each
> subproject? (roughly I feel like LLVM+Clang are close enough together that
> a single approval would suffice there at least - but maybe Clang devs**
> have other ideas... )
>
> *thanks, Reid, for that choice of terminology
>
> ** Pure Clang devs notably not included in this email thread...
>
>
>
> -- adrian
> >
> > Some points of comparison: sometimes there's similar cleanup done in a
> non-automated fashion as Mehdi pointed out in one of the threads I brought
> this up (see this thread:
> http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20170403/443547.html
>  ). Usually some amount of cleanup has been acceptable if the code was
> generally being churned anyway (eg: clang-formatting a file so it's
> consistent, before doing major surgery to it anyway, so the surgical
> changes don't create formatting inconsistencies), or as a result of a new
> API change (add a range-based accessor then fix up existing call sites to
> use range-based-for). I'd also not be surprised by a whitespace cleanup
> shortly after new code was committed - and have done this myself "oops,
> forgot to format my commit, here's a new commit that does the formatting".
> That seems different to me from these wider-scale cleanups.
> >
> > I think I'm personally mostly in favor of this sort of stuff (though I
> think I would like some community buy-in to sign off project-wide on each
> clang-tidy rule/pattern/etc that's going to be applied) but it does seem
> new/different from the way some things have been done in the past so I'm
> curious how other people think about the differences/similarities/guiding
> principles.
> >
> > - David
> > _______________________________________________
> > LLVM Developers mailing list
> > llvm-dev at lists.llvm.org
> > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20170410/1d751753/attachment.html>