[llvm-dev] Git Move: GitHub+modules proposal

Renato Golin via llvm-dev llvm-dev at lists.llvm.org
Sun Jun 26 14:39:26 PDT 2016


So,

It's been a while and the GitHub thread is officially dead, so I'll
propose a development methodology based on the feedback from that
thread. This is not *my* view, but a summary of what was discussed in
those threads.

My objective is to form an official proposal to use Git as our main
repository, overcoming all the problems we currently have without
creating many others. In the end, I think Tanya wanted to hold a vote,
and see how strongly the community feels about it. The vote should be
"Should we move to GitHub with the proposed workflow, or should we try
to find another solution away from our own hosting?".

The important part is that we *must* move away from the current scheme. It
does not scale and the administrative costs are not worth the trouble.
So, if we don't go with GitHub, we have to find another way out.

The proposal
==========

Move to GitHub with N+1 projects: all current LLVM projects + the
"llvm-projs" umbrella. The latter will have all other projects as
"submodules" with the intent to:

 1. Control the history via server hooks updating a unique,
auto-incrementing identifier, which will apply to every commit on its
submodules (i.e., every other LLVM project).
 2. Serve as a reference for releases, buildbots, etc., checking out
only the necessary submodules for each.
 3. Hold the additional logic needed so that mailing lists, tools and
buildbots only have to deal with the umbrella project, and not with
each repository separately.

The existing LLVM projects (llvm, clang, compiler-rt, etc) will
continue on their own repositories and be built locally just like they
are today. You can check them out individually inside the final
directory (llvm/tools/clang) or use symbolic links, just like we do
today. You can also check out "llvm-projs" and update only the required
submodules, and use symbolic links.
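
As a rough sketch of the umbrella checkout described above, the
following Python helper only *builds* the git command lines one might
run; it does not execute them. The repository URL is a placeholder, and
the assumption that each submodule path equals the project name
("llvm", "clang", ...) is mine, not a decided layout.

```python
def umbrella_checkout_commands(url, projects, dest="llvm-projs"):
    """Return git command lines (argv lists) for a partial umbrella checkout.

    url      -- placeholder URL of the umbrella repository (hypothetical)
    projects -- submodule paths to initialise, assumed to be project names
    """
    # Cloning the umbrella does not fetch any submodule contents by default.
    commands = [["git", "clone", url, dest]]
    if projects:
        # Initialise and fetch only the named submodules.
        commands.append(
            ["git", "-C", dest, "submodule", "update", "--init", "--", *projects]
        )
    return commands


if __name__ == "__main__":
    # Placeholder URL, for illustration only.
    for argv in umbrella_checkout_commands(
        "https://example.com/llvm-projs.git", ["llvm", "clang"]
    ):
        print(" ".join(argv))
```

A symbolic link from llvm/tools/clang to the clang submodule would then
give the same in-tree layout we use today.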

The llvm-projs umbrella will have its own versioning, and tools can
report that ID as their "version", if they're not in a release branch.

Release branches should be off of master and have a linear history,
just like master, in the exact same way we do now with SVN. This will
guarantee the umbrella project will be able to correctly
auto-increment the ID and make sure current tools work as usual.

We don't want private branches to end up in upstream LLVM (only
upstream release branches), but that's perfectly natural in GitHub,
where anyone can fork and implement their features and own releases
off of the upstream official repositories.

This can work just as well for upstream development of "feature
branches", where upstream developers contribute to both repositories
while keeping a feature under test separate. Merges will still have to
be done as they are today, one patch at a time; otherwise we risk having
to revert a whole merge window if the buildbots start breaking, which
can be impossible if the window is large or if two or more windows get
committed at the same time.

For "feature branches" we could use git-imerge, but that's for the
future and not considered in the first stage of the move.

Git Submodules
---------------------

There were concerns about whether submodules would work with our
workflow, but they were addressed by demonstrating that:

 1. Submodules can work in an umbrella project, which controls the
auto-increment ID
 2. You can check out individual modules or all of them, so they work
well for releases and buildbots
 3. The history is shared with the root project, so git-bisect works
out of the box
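
To illustrate why a linear, auto-incremented ID helps here, this is a
minimal sketch of what a bisection over umbrella IDs amounts to. It is
plain binary search, not git-bisect itself, and the "is_bad" callback
(build and test the tree at a given ID) is a hypothetical stand-in.

```python
def first_bad_id(good_id, bad_id, is_bad):
    """Return the lowest umbrella ID for which is_bad(id) is True.

    Assumes good_id is known good, bad_id is known bad, and every ID
    after the first bad one is also bad (the usual bisect assumption).
    """
    lo, hi = good_id, bad_id
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid  # the first bad ID is at or before mid
        else:
            lo = mid  # the first bad ID is after mid
    return hi


if __name__ == "__main__":
    # Pretend the regression was introduced at umbrella ID 137.
    print(first_bad_id(100, 200, lambda i: i >= 137))
```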

The Alternatives
---------------------

A few alternatives were proposed, but git submodules ended up being
considered more thoroughly. Here are some of the reasons:

 * Google repo:

It's an independent tool, which may suit us today, but not
necessarily tomorrow. It may work well with the infrastructure that is
already set up for it on other projects (mostly Google projects like
Android), but it does require some tooling (like git submodules). The
point is that tooling is much more likely to exist for official git
features than for third-party projects, especially on Windows.

 * All-in-one:

Proposals to have all projects inside one big repo were quickly
dismissed due to the problems it creates for *users* of LLVM as a
library, and to build systems (specific buildbots) that don't need to
monitor all changes all the time. It simply won't scale.

 * Multiple clones:

Allowing the projects to *exist* in different conditions (clang inside
LLVM, or as a stand-alone library) will not scale. CMake will have to
cope with all the different styles, and it neither solves the unique
auto-increment ID problem nor helps downstream projects migrate to a
common infrastructure.

Questions
========

In order to make this proposal final, I still need a few questions to
be answered.

1. How will the umbrella project's auto-increment hook work?

Will it be one ID for every commit in every other repo? How will it
know which one came first? Does it matter? If we have two commits "at
the same time", do we create a priority list?

Ex. LLVM commits get a lower ID than Clang ones, because it's more
likely that an LLVM commit needs to go in first.
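
To make the question concrete, here is one possible shape for such a
hook's ordering logic, as a sketch only. The priority table, the
"Commit" record and the use of a server-side timestamp are all
assumptions of mine; none of this has been decided.

```python
from dataclasses import dataclass

# Hypothetical priority list: a lower number wins a timestamp tie.
PROJECT_PRIORITY = {"llvm": 0, "clang": 1, "compiler-rt": 2}


@dataclass(frozen=True)
class Commit:
    project: str
    sha: str
    timestamp: int  # seconds since the epoch, as seen by the server


def assign_ids(commits, next_id):
    """Return {(project, sha): id}, numbering commits from next_id upwards.

    Commits are ordered by timestamp; ties go to the project with the
    lower priority number, then to the sha so the order is deterministic.
    """
    unknown = len(PROJECT_PRIORITY)  # unlisted projects sort last on a tie
    ordered = sorted(
        commits,
        key=lambda c: (c.timestamp, PROJECT_PRIORITY.get(c.project, unknown), c.sha),
    )
    return {(c.project, c.sha): next_id + i for i, c in enumerate(ordered)}


if __name__ == "__main__":
    pending = [
        Commit("clang", "c1", 10),
        Commit("llvm", "l1", 10),
        Commit("compiler-rt", "r1", 5),
    ]
    for key, commit_id in sorted(assign_ids(pending, 1000).items(), key=lambda kv: kv[1]):
        print(commit_id, key)
```

With the sample data, the compiler-rt commit is numbered first because
it is older, and the LLVM commit beats the Clang one on the tie.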

2. How do we update the commits mailing lists?

Can we add a mail script to the auto-increment ID hook? Or should we
have a cron job that picks up the new commits every 5 minutes on a
server somewhere and emails them (in ID order) to the respective lists?
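
For the cron-job variant, the batching step could look roughly like
the sketch below. The project-to-list mapping, the fallback list and
the (id, project, subject) tuple shape are assumptions for
illustration; actually sending the mail is left out.

```python
from collections import defaultdict

# Hypothetical mapping from project to its commits list.
COMMIT_LISTS = {"llvm": "llvm-commits", "clang": "cfe-commits"}
DEFAULT_LIST = "llvm-commits"  # assumed fallback for unmapped projects


def batch_by_list(new_commits):
    """Group (id, project, subject) tuples by mailing list, in ID order."""
    batches = defaultdict(list)
    # Sorting the tuples orders them by umbrella ID first.
    for commit_id, project, subject in sorted(new_commits):
        target = COMMIT_LISTS.get(project, DEFAULT_LIST)
        batches[target].append((commit_id, subject))
    return dict(batches)


if __name__ == "__main__":
    new = [(12, "clang", "Fix warning"), (10, "llvm", "Add pass"), (11, "clang", "Update test")]
    for mailing_list, items in batch_by_list(new).items():
        print(mailing_list, items)
```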

Approval
=======

Right now, we should not discuss if moving to Git or GitHub is a good
idea or not. This is about the proposal itself. So, if you don't want
Git or GitHub, wait for the voting to express that.

If you do want Git and GitHub, then please keep your comments on
topic, answer the questions and let's make sure we have a solid
proposal that most Git proponents are happy with.

If there is an alternative proposal (say, Google's repo), then it
has to be a separate proposal, well explained and accepted before it
can be voted on, too.

Once we all agree in general, we should put it to a vote. If there's an
overwhelming majority (not sure how to measure this), and no critical
problems (for example, learning new tools is less critical than
breaking all buildbots), we should go with the move.

For logistical reasons, if we do decide to move, I would like to do so
before 3.10 / 4.0 branches.

cheers,
--renato

