[LLVMdev] git Status Update?

Fri Sep 9 06:12:49 PDT 2011

On Fri, Sep 09, 2011 at 08:05:38AM -0400, Justin Holewinski wrote:
> To add to the social aspect, I can say from personal experience (as a
> student/enthusiast developer) that contributing to open source projects that
> use a decentralized SCM model (e.g. git) is easier than a centralized (e.g.
> subversion) model.  I can create a clone on github (or local, or wherever),
> commit all that I want, then send pull requests or patches to the upstream
> maintainers.  With a subversion repository, I have to create a separate
> local checkout for each "feature," and code sharing between them is very
> difficult, not to mention if I want to transfer some changes from my laptop
> to my desktop, or vice versa.  Whether this matters for LLVM or not, I do
> not know.  The subversion model may make more sense for the more "corporate"
> users where a completely linearized history may be preferable.  But my point
> is that the barrier for entry is lower when using a decentralized repository
> model for open-source developers that are familiar with both SCM models.

I guess this is the crux of the matter, and the "Git is better than SVN"
argument depends significantly on the perceived advantage here. I have
contributed patches to literally dozens of open source projects and have
commit access for a larger number, too. I have been dealing with most of
the version control systems on the market as well as unversioned
projects.

Personally, I don't see "social coding" like GitHub as improving
contributions in general. Let me explain why. There are different
levels of contributors and they have different needs.

The first category is the casual one-off patch. For me this is often the
case when packaging software up for NetBSD. Do I consider a patch a local
hack? Do I consider it important for upstream? The important aspect here
is that the change typically tends to be small and that I want to have a
minimal amount of fuss getting rid of it. I do have a copy of the source
code (as I am patching it), but normally it is a release tarball.
Projects requiring patch submission via Git increase the barrier of
entry (and willingness) a lot by requiring me to clone the repo, apply
the patch and figure out where to email it or how to submit a pull
request etc. Sorry, too much hassle. Give me the email address in the
first place and massage the patch to fit into style rules etc -- I don't care
enough to figure out all the details all the time, especially since they
almost always differ between projects. Patch maintenance is not a big
issue either in this case. Releases are hopefully rare enough and small
changes typically apply or have been applied. From the LLVM perspective,
it might help to split patch submission from the commit lists to reduce
the chance of patches getting forgotten, but dealing with LLVM is
relatively easy so far. I don't expect git to change this at all.
Dealing with this kind of contributor (from the project side) requires
a similar attitude to dealing with PRs. Don't expect the submitter to
rework patches a lot or to nag you about forgotten ones. Git may help a
bit with the second part, but not more than moving patches to a separate
(lower traffic) mailing list would.

The second category contains projects I am interested in and where I do
follow development as well as making regular contributions. This involves
being part of the community to some degree, e.g. by subscribing to mailing
lists. Here I am more likely to develop bigger patches and be aware of
the coding style etc. I may or may not base my development on a release,
depending on what the release cycle is. I strongly tend to avoid having
long term development of complex functionality in this case, simply
because it makes it a lot harder to get the patches accepted upstream
after creating the work. My experience here is different from what
David reports in this thread -- Git does *not* make this simpler. In a
way, it makes it more difficult as it requires more self control in
keeping patch sets manageable. My experience is also that breaking up a
complex experimental patchset for submission tends to result in enough
code reorganisation that it is half a rewrite anyway. From the
perspective of a project, there is nothing worse than getting a
submission with a 100KB or larger patch (not counting mechanical
changes). No one wants to review such a beast. There is almost always
some form of objection, so failing to discuss both the goals and the
implementation outline early enough just creates a lot of work either
way.

The third category contains projects where I am part of upstream with
commit access. It doesn't necessarily mean that the changes I am working
on are larger in scope than for the 2nd category. Typically the
punishment for submitting enough patches is getting commit access after
all. The important difference is that it is possible to work on larger
changes in public and with the awareness of the rest of the community.
If you are going to change a core API, you don't want anyone else to
change the same interface while you are at it. This is not a question of
distributed vs centralised development, but of communication. Surely git
makes it faster to create a branch in a larger repository. It also often
requires a bit less work for merging changes, but the main difference is
performance here. If you have a large private branch and want to merge
it back, you should pay the price for it. Seriously, it is a bad
development style. Being able to make small adjustments to early
commits, so that the series fast-forwards and can be sent out as a mail
bomb, does not improve the situation for the actual reviewers. I
consider the frequently appearing "Patch 1/65" threads created by git
on various projects rude. Switching to pull requests on GitHub doesn't
change the problem, just the medium. Whether GitHub's UI is better for
that purpose is a completely separate question.

The barrier of entry for the LLVM.org repositories is low. Getting
review is normally easy. If it isn't, Git isn't going to improve the
situation. If a switch to git is desirable to you (not meaning Justin),
keep in mind that what is easy for one person can create a burden on
someone else. The LLVM development style exists for a reason and matches
what many other, often much older and just as large, projects do.

Joerg