[llvm-dev] [RFC] One or many git repositories?

Hal Finkel via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 22 13:33:42 PDT 2016


----- Original Message -----

> From: "Piotr Padlewski via llvm-dev" <llvm-dev at lists.llvm.org>
> To: "Richard Smith" <richard at metafoo.co.uk>
> Cc: "llvm-dev" <llvm-dev at lists.llvm.org>
> Sent: Friday, July 22, 2016 3:18:31 PM
> Subject: Re: [llvm-dev] [RFC] One or many git repositories?

> I have one reason why we should not move to a monolithic repository:
> if you do some light work like clang-tidy, which doesn't often require
> syncing with clang, but you still want to have the most recent
> checks, then I don't see a solution in a monolithic repository.
> And this is a real issue if you only have a 2- or 4-core laptop to do
> work.
> And I guess the build system won't solve the problem: just a
> small change in some llvm file will result in recompiling many files
> that clang-tidy depends on.

This seems like an orthogonal problem. It would also be nice to have a build-system mode which decouples Clang from LLVM, in terms of dependency checking, for the same reason.

-Hal 

> 2016-07-22 13:08 GMT-07:00 Richard Smith via llvm-dev
> <llvm-dev at lists.llvm.org>:

> > Having read through the entire thread and thought about this for a
> > while, here are my thoughts:

> > * A single monolithic repository has quite a lot of advantages, some
> > because of what it is (for instance, you can make atomic
> > cross-project commits), and some because of what it isn't (keeping
> > the repositories separate creates synchronization problems for
> > version-locked components, and it's not clear to me that we have a
> > good answer for these problems)

> > * A single repository from which we can build a complete LLVM
> > toolchain, without requiring checking out a dozen components in
> > seemingly-random locations, would be valuable. The default behavior
> > for someone checking out and building the LLVM project should be
> > that they get a complete, fully-functional toolchain.

> > * We need to preserve and maintain the easy ability to mix and match
> > LLVM components with other components (other C runtime libraries,
> > C++ ABI libraries, C++ standard libraries, linkers, debuggers, ...).
> > That means that it needs to be obvious what the boundaries of the
> > optional components are, which means that the current project layout
> > (the one implied by the build system) is not good enough for a
> > monolithic repository (LLVM tests will fail if you don't check out
> > llvm/tools/opt, but we presumably want to explicitly support not
> > checking out llvm/tools/clang) -- unless we have extensive
> > documentation covering this, and even then there are likely to be
> > discoverability issues.

> > However, the move to git and the reorganization need not be done at
> > the same time, and it seems vastly easier to reorganize *after* we
> > move to a monolithic git repository -- it would then be essentially
> > trivial for each person with organizational ideas to move the code
> > around in their monolithic git repository, push it somewhere where
> > we can all look at it, and for us to then make an informed choice
> > about the layout, with a concrete example in front of us. Then we
> > push the selected new layout; git supports this really nicely if all
> > the parts are already in a single repository.

> > So here's what I would suggest:

> > - we move to a monolithic git repository on github

> > - this monolithic repository contains all the LLVM subprojects
> > necessary to build a complete toolchain, including libc++ and other
> > pieces that are not version-locked to llvm or clang

> > - the initial structure exactly matches the current layout implied by
> > the build system (clang in tools/clang, lld in tools/lld,
> > compiler-rt in runtimes/compiler-rt, libc++ in projects/libcxx, and
> > so on)

> > - after we transition to git, interested parties assemble and upload
> > to github patches reorganizing the project structure, and we have
> > another discussion about principles for the restructuring (including
> > forming solid guidance for how to organize future additions to
> > LLVM), with reference to the patches so we can look at the proposed
> > new layout; we pick one and commit it

> > The goal would be to have the new layout entirely settled by the time
> > 4.0 branches.

> > On Wed, Jul 20, 2016 at 4:39 PM, Justin Lebar via llvm-dev
> > <llvm-dev at lists.llvm.org> wrote:

> > > Dear all,

> > > I would like to (re-)open a discussion on the following specific
> > > question:

> > > Assuming we are moving the llvm project to git, should we
> > > a) use multiple git repositories, linked together as subrepositories
> > > of an umbrella repo, or
> > > b) use a single git repository for most llvm subprojects.

> > > The current proposal assembled by Renato follows option (a), but I
> > > think option (b) will be significantly simpler and more effective.
> > > Moreover, I think the issues raised with option (b) are either
> > > incorrect or can be reasonably addressed.

> > > Specifically, my proposal is that all LLVM subprojects that are
> > > "version-locked" (and/or use the common CMake build system) live in
> > > a single git repository. That probably means all of the main llvm
> > > subprojects other than the test-suite and maybe libc++. From looking
> > > at the repository today that would be: llvm, clang,
> > > clang-tools-extra, lld, polly, lldb, llgo, compiler-rt, openmp, and
> > > parallel-libs.

> > > Let's first talk about the advantages of a single repository. Then
> > > we'll address the disadvantages raised.

> > > At a high level, one repository is simpler than multiple repos that
> > > must be kept in sync using an external mechanism. The submodules
> > > solution requires nontrivial automation to maintain the history of
> > > commits in the umbrella repo (which we need if we want to bisect, or
> > > even just build an old revision of clang), but no such mechanisms
> > > are required if we have a single repo.

> > > Similarly, it's possible to make atomic API changes across
> > > subprojects in a single repo; we simply can't do that with the
> > > submodules proposal. And working with llvm release branches becomes
> > > much simpler.

> > > In addition, the single repository approach ties branches that
> > > contain changes to subprojects (e.g. clang) to a specific version of
> > > llvm proper. This means that when you switch between two branches
> > > that contain changes to clang, you'll automatically check out the
> > > right llvm bits.

> > > Although we can do this with submodules too, a single repository
> > > makes it much easier.

> > > As a concrete example, suppose you are working on some changes in
> > > clang. You want to commit the changes, then switch to a new branch
> > > based on tip of head and make some new changes. Finally you want to
> > > switch back to your original branch. And when you switch between
> > > branches, you want to get an llvm that's in sync with the clang in
> > > your working copy.

> > > Here's how I'd do it with a monolithic git repository, option (b):

> > > git commit # old-branch
> > > git fetch
> > > git checkout -b new-branch origin/master
> > > # hack hack hack
> > > git commit # new-branch
> > > git checkout old-branch

> > > Here's how I'd do it with option (a), submodules. I've used git -C
> > > here to make it explicit which repo we're working in, but in real
> > > life I'd probably use cd.

> > > # First, commit to two branches, one in your clang repo and one in
> > > # your master repo.
> > > git -C tools/clang commit # old-branch, clang submodule
> > > git commit # old-branch, master repo
> > > # Now fetch the submodule and check out head. Start a new branch in
> > > # the umbrella repo.
> > > git submodule foreach git fetch
> > > git checkout -b new-branch origin/master
> > > git submodule update
> > > # Start a new branch in the clang repo pointing to the current head.
> > > git -C tools/clang checkout -b new-branch
> > > # hack hack hack
> > > # Commit both branches.
> > > git -C tools/clang commit # new-branch
> > > git commit # new-branch
> > > # Check out the old branch.
> > > git checkout old-branch
> > > git submodule update

> > > This is twice as many git commands, and almost three times as much
> > > typing, to do the same thing.

> > > Indeed, this is so complicated I expect that many developers
> > > wouldn't bother, and will continue to develop the way we currently
> > > do. They would thus continue to be unable to create clang branches
> > > that include an llvm revision. :(

> > > There are real simplifications and productivity advantages to be had
> > > by using a single repository. They will affect essentially every
> > > developer who makes changes to subprojects other than LLVM proper,
> > > cares about release branches, bisects our code, or builds old
> > > revisions.

> > > So that's the first part, what we have to gain by using a monolithic
> > > repository. Let's address the downsides.

> > > If you'll bear with a hypothetical: Imagine you could somehow make
> > > the monolithic repository behave exactly like the N separate
> > > repositories do today. If so, that would be the best of both worlds:
> > > Those of us who want a monolithic repository could have one, and
> > > those of us who don't would be unaffected. Whatever downsides you
> > > were worried about would evaporate in a mist of rainbows and puppies.

> > > It turns out this hypothetical is very close to reality. The key is
> > > git sparse checkouts [1], which let you check out only some files or
> > > directories from a repository. Using this facility, if you don't
> > > like the switch to a monolithic repository, you can set up your git
> > > so you're (almost) entirely unaffected by it.

> > > If you want to check out only llvm and clang, no problem. Just set
> > > up your .git/info/sparse-checkout file appropriately. Done.
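
A minimal sketch of the sparse-checkout setup described in the quoted paragraph, run against a throwaway local repository instead of the real LLVM one. The llvm/, clang/ and lld/ directories, the committer identity and the temporary paths are stand-ins I made up for illustration; the core.sparseCheckout setting, the .git/info/sparse-checkout file and git read-tree -mu are the ordinary git mechanism.

```shell
#!/bin/sh
set -eu

# Throwaway repository standing in for the monolithic repo.
# The three top-level directories are illustrative, not the real layout.
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.name "Example Dev"
git config user.email "dev@example.com"
for project in llvm clang lld; do
    mkdir "$project"
    echo "$project sources" > "$project/README"
done
git add .
git commit -q -m "Toy monolithic layout"

# Clone it, then restrict the working tree to llvm/ and clang/.
git clone -q "$work/mono" "$work/sparse"
cd "$work/sparse"
git config core.sparseCheckout true
mkdir -p .git/info
printf '%s\n' '/llvm/' '/clang/' > .git/info/sparse-checkout

# Re-read HEAD so the working tree is updated to match the patterns.
git read-tree -mu HEAD

# lld/ is no longer in the working tree, but it is still in the history.
ls
```

Adding a project back later should only need another pattern line in the same file followed by the same read-tree call.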

> > > If you want to be able to have two different revisions of llvm and
> > > clang checked out at once (maybe you want to update your clang bits
> > > more often than you update your llvm bits), you can do that too.
> > > Make one sparse checkout just of llvm, and make another sparse
> > > checkout just of clang. Symlink the clang checkout to
> > > llvm/tools/clang. That's it. The two checkouts can even share a
> > > common .git dir, so you don't have to fetch and store everything
> > > twice.
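
A rough sketch of the two-revisions setup from the quoted paragraph. The quoted message does not say how the two checkouts would share a .git dir; git worktree (available since git 2.5) is my assumption for one way to do it. The toy repository, the VERSION files and the paths are hypothetical, and the per-checkout sparse patterns are left out to keep the example short.

```shell
#!/bin/sh
set -eu

# Toy monolithic repository with two revisions (illustrative contents only).
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.name "Example Dev"
git config user.email "dev@example.com"
mkdir llvm clang
echo "llvm r1" > llvm/VERSION
echo "clang r1" > clang/VERSION
git add .
git commit -q -m "r1"
echo "llvm r2" > llvm/VERSION
echo "clang r2" > clang/VERSION
git commit -q -a -m "r2"

# Second working tree pinned to the older revision. It reuses the object
# store in mono/.git, so nothing is fetched or stored twice.
git worktree add --detach "$work/old" HEAD~1 >/dev/null 2>&1

# Pair the older llvm with the newer clang through a symlink.
mkdir -p "$work/old/llvm/tools"
ln -s "$work/mono/clang" "$work/old/llvm/tools/clang"

# Show which revision each half of the combined tree comes from.
cat "$work/old/llvm/VERSION" "$work/old/llvm/tools/clang/VERSION"
```

The symlink is an untracked file in the second working tree, so it would need an ignore entry in a real setup to keep git status quiet.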

> > > As far as I can tell, the only overhead of the monolithic repository
> > > is the extra storage in .git. But this is quite small in the scheme
> > > of things.

> > > The .git dir for the existing monolithic repository [2] is 1.2GB. By
> > > way of comparison, my objdir for a release build of llvm and clang
> > > is 3.5GB, and a full checkout (workdir + .git dirs) of llvm and
> > > clang is 0.65GB.

> > > If the 1.2GB really is a problem for you (or more likely, your
> > > automated infrastructure), a shallow clone [3] takes this down to
> > > 90MB.
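
A small sketch of what a depth-1 clone does to history, using a local toy repository so it runs without network access. The repository contents and paths are made up; the sizes quoted in the message are not reproduced here, only the effect of --depth=1 on the number of commits that get fetched.

```shell
#!/bin/sh
set -eu

# Toy repository with three commits (illustrative only).
work=$(mktemp -d)
git init -q "$work/mono"
cd "$work/mono"
git config user.name "Example Dev"
git config user.email "dev@example.com"
for n in 1 2 3; do
    echo "revision $n" > FILE
    git add FILE
    git commit -q -m "r$n"
done

# A file:// URL is used because git ignores --depth for plain local paths.
git clone -q --depth=1 "file://$work/mono" "$work/shallow"

# Count the commits reachable from HEAD in the full and the shallow clone.
git -C "$work/mono" rev-list --count HEAD
git -C "$work/shallow" rev-list --count HEAD
```

The shallow clone still has the complete working tree for the tip revision; only the older commits are missing.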

> > > The critical point to me in all this is that it's easy to set up the
> > > monolithic repository to appear like it's a bunch of separate repos.
> > > But it is impossible, insofar as I can tell, to do the opposite.
> > > That is, option (b) is strictly more powerful than option (a).

> > > Renato has understandably pointed out that the current proposal is
> > > pretty far along, so please speak up now if you want to make this
> > > happen. I think we can.

> > > Regards,
> > > -Justin

> > > [1] Git sparse checkouts were introduced in git 1.7, in 2010. For
> > > more info, see
> > > http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/ .
> > > As far as I can tell, sparse checkouts work fine on Windows, but you
> > > have to use git-bash, see http://stackoverflow.com/q/23289006 .
> > > [2] https://github.com/llvm-project/llvm-project
> > > [3] git clone --depth=1 https://github.com/llvm-project/llvm-project.git
> > > _______________________________________________
> > > LLVM Developers mailing list
> > > llvm-dev at lists.llvm.org
> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

-- 

Hal Finkel 
Assistant Computational Scientist 
Leadership Computing Facility 
Argonne National Laboratory 