[llvm-dev] [RFC] One or many git repositories?

Justin Bogner via llvm-dev llvm-dev at lists.llvm.org
Wed Jul 20 17:36:38 PDT 2016

Chandler Carruth <chandlerc at google.com> writes:
> On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> > I would like to (re-)open a discussion on the following specific
>> > question:
>> >
>> >   Assuming we are moving the llvm project to git, should we
>> >   a) use multiple git repositories, linked together as subrepositories
>> > of an umbrella repo, or
>> >   b) use a single git repository for most llvm subprojects.
>> >
>> > The current proposal assembled by Renato follows option (a), but I
>> > think option (b) will be significantly simpler and more effective.
>> > Moreover, I think the issues raised with option (b) are either
>> > incorrect or can be reasonably addressed.
>> >
>> > Specifically, my proposal is that all LLVM subprojects that are
>> > "version-locked" (and/or use the common CMake build system) live in a
>> > single git repository.  That probably means all of the main llvm
>> > subprojects other than the test-suite and maybe libc++.  From looking
>> > at the repository today that would be: llvm, clang, clang-tools-extra,
>> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>> FWIW, I'm opposed. I'm not convinced that the problems with multiple
>> repos are any worse than the problems with a single repo, which makes
>> this more or less just change for the sake of change, IMO.
> It would be useful to know what problems you see with a single repo that
> are more significant. In particular, either why you think the problems
> jlebar already mentioned are worse than he sees them, or what other
> problems are that he hasn't addressed.

Running the same 'git checkout' commands on multiple repos has always
been sufficient to manage them so far. As long as you create the same
branches and tags in each repo, it's easy[1] to manage the set of repos
with a script that cd's to each one and runs whatever git command you
need.
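
A minimal sketch of that kind of script, for illustration only. The
function name run_in_each and the directory layout (one checkout per
subproject under a common root) are assumptions, not an existing LLVM
tool:

```python
import subprocess
from pathlib import Path


def run_in_each(root, repos, command):
    """Run `command` (an argv list) inside each repo directory under `root`.

    Returns a dict mapping each repo name to the command's exit code,
    or to None when that directory does not exist and was skipped.
    """
    results = {}
    for name in repos:
        repo_dir = Path(root) / name
        if not repo_dir.is_dir():
            # Not every machine has every subproject checked out.
            results[name] = None
            continue
        # Equivalent to: (cd repo_dir && command)
        proc = subprocess.run(command, cwd=repo_dir)
        results[name] = proc.returncode
    return results
```

For example, run_in_each("/path/to/src", ["llvm", "clang"],
["git", "checkout", "my-branch"]) would switch both checkouts to a
(hypothetical) branch called my-branch, provided it exists in each.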

So it's a pretty minor inconvenience today to have the multiple repos in
the case where you want to check out all of them.

OTOH, if all of the repos are combined into one, you have to do work
when you only want some of them. In my experience, this is basically
always - between my various machines and projects I have several
checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
checkouts of just llvm. I've only checked out the other repos when I was
changing APIs and needed to update them.

I haven't tried the options jlebar has described to deal with this
(sparse checkouts and whatnot), but they seem like about the same amount
of work and learning curve as writing a script that cd's to several
directories and runs the same git command in each.
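
For comparison, a rough sketch of what the sparse checkout route
involves, assuming a hypothetical single repo with one top-level
directory per subproject. The helper name sparse_patterns is made up
for illustration; the git steps in the comments are the standard
core.sparseCheckout mechanism:

```python
def sparse_patterns(subprojects):
    """Build sparse-checkout file contents keeping only the named
    top-level directories (one anchored pattern per line)."""
    return "".join("/{}/\n".format(name.strip("/")) for name in subprojects)


# Manual steps inside the clone, using the text returned above:
#   git config core.sparseCheckout true
#   (write the patterns to .git/info/sparse-checkout)
#   git read-tree -mu HEAD
```

Here sparse_patterns(["llvm", "clang"]) yields the two lines "/llvm/"
and "/clang/", so only those directories appear in the working tree.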

Thus, this also sounds like a minor inconvenience. I just don't see how
trading one for the other is worth doing, since AFAICT they're equally

[1] My understanding of the "umbrella repo" thing for bisecting is that
    it'll be managed automatically by a cron job or checkin hooks or
    whatever, so the bits in jlebar's description about updating
    submodules seem like a red herring. I'm assuming that we end up in a
    place where working with git is essentially the same as working with
    git-svn today.
