[llvm-dev] [RFC] One or many git repositories?

Wed Jul 20 18:00:07 PDT 2016

> Running the same 'git checkout' commands on multiple repos has always been sufficient to manage the multiple repos so far

Huh.  It definitely hasn't worked well for me.

Here's the issue I face every day.  I may be working on (unrelated)
changes to clang and llvm.  I update my llvm tree (say I checked in a
patch, or I want to pull in changes someone else has checked in).  Now
I want to go back to hacking on my clang stuff.  Because my clang
branch is not connected to a specific LLVM revision, it no longer
compiles.  I'm trying to build an old clang against a new llvm.

Now I have to pull the latest clang and rebase my patches.  After I
deal with rebase conflicts (not what I wanted to do at the moment!),
I'm in a new state, which means that when I build, my ccache is no
help.  And when I run the clang tests, I don't know whether to expect
test failures.  So then I have to pop off my patches and run the
tests at head...  (Maybe I have to update clang!  In which case I
also have to update llvm...)

This would all be solved with zero work on my part if llvm and clang
were in one repository.  Then when I switched to working on my clang
patches, I would automatically check out a version of LLVM that is
compatible.
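
To make the difference concrete, here is a toy Python model.  It is
not real git: the branch name "my-clang-fix" and the revision numbers
are made up, and "compatible" is simplified to "same revision number".

```python
# Toy model, not real git: a "repo" maps a branch name to the revision
# of each project that the branch records.

def checkout(repo, branch):
    """Return a copy of the project revisions recorded by `branch`."""
    return dict(repo[branch])

def version_locked(tree):
    """Simplifying assumption: llvm and clang build together only when
    they are at the same revision."""
    return tree["llvm"] == tree["clang"]

# --- Separate repos: each one tracks only its own project. ---
llvm_repo = {"master": {"llvm": 100}}
clang_repo = {"master": {"clang": 100}, "my-clang-fix": {"clang": 100}}

# Later I update llvm alone, then go back to my clang branch.
llvm_repo["master"]["llvm"] = 105
split_tree = {**checkout(llvm_repo, "master"),
              **checkout(clang_repo, "my-clang-fix")}

# --- Single repo: a branch records both projects at once. ---
mono_repo = {"master": {"llvm": 100, "clang": 100}}
mono_repo["my-clang-fix"] = checkout(mono_repo, "master")
mono_repo["master"] = {"llvm": 105, "clang": 105}  # upstream moves on
mono_tree = checkout(mono_repo, "my-clang-fix")

print(split_tree, version_locked(split_tree))  # old clang against new llvm
print(mono_tree, version_locked(mono_tree))    # both from the same point in time
```

In the split case nothing ties my clang branch to the llvm revision
it was written against, so the pairing is lost as soon as either tree
moves.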

I think this is the main thing that people aren't getting.  Maybe
because it's never been possible before to have a workflow like this.
But having a git branch that you can check out and immediately build
-- without any rebasing, re-syncing, or other messing around -- is
incredibly powerful.

Please let me know if this is still not clear -- it's kind of the key point.

As I said, you can accomplish this with submodules, too, but it
requires the complex hackery from my original email.

To me, this is not at all a minor inconvenience.  It's at least an
hour of wasted time every week.

> I haven't tried the options jlebar has described to deal with these - sparse checkouts and whatnot, but they seem like an equivalent amount of work/learning curve as writing a script that cd's to several directories and runs the same git command in each.

I'll send sparse checkout instructions separately.  But my example
submodule commands are not at all equivalent to a script that cd's
into several directories and runs a git command in each, and I think
this is the main point of confusion.  (In fact you wouldn't need to
write such a script; it's just "git submodule foreach".)
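
For reference, the kind of script being described is roughly the
following sketch (Python; the helper name run_in_each and the
directory names in the usage comment are hypothetical):

```python
import subprocess

def run_in_each(dirs, command):
    """Run `command` (a list of arguments) inside each directory in turn,
    like a script that cd's into every repo.  Returns {dir: stdout}.
    Raises CalledProcessError if the command fails in any directory."""
    outputs = {}
    for d in dirs:
        result = subprocess.run(command, cwd=d, capture_output=True,
                                text=True, check=True)
        outputs[d] = result.stdout
    return outputs

# Hypothetical usage, assuming both directories are git checkouts that
# each have a branch called my-branch:
#   run_in_each(["llvm", "llvm/tools/clang"], ["git", "checkout", "my-branch"])
```

Note that this only runs a command per repo.  It records nothing
about which revisions of the repos belong together.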

The submodule commands create a single branch in the umbrella repo
that encompasses the checked-out state of *all the LLVM subrepos*.  So
you can, at a later time, check out this branch in the umbrella repo
and all the clang, llvm, etc. bits will be identical to the last time
you were on the branch.
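
As a toy illustration of that idea (Python, not real git; the commit
ids and branch names are made up), an umbrella branch is a snapshot
of which commit every subrepo was at:

```python
# Toy model, not real git: the umbrella repo stores, per branch, the
# exact commit each subrepo was at when the branch was last committed.
subrepo_heads = {"llvm": "a1", "clang": "b1"}  # hypothetical commit ids
umbrella_branches = {}

def commit_umbrella(branch):
    """Record the current commit of every subrepo on `branch`."""
    umbrella_branches[branch] = dict(subrepo_heads)

def checkout_umbrella(branch):
    """Restore every subrepo to the commit recorded on `branch`."""
    subrepo_heads.update(umbrella_branches[branch])

commit_umbrella("my-clang-fix")                      # snapshot my work
subrepo_heads.update({"llvm": "a2", "clang": "b2"})  # both subrepos move on
commit_umbrella("master")

checkout_umbrella("my-clang-fix")
print(subrepo_heads)  # llvm and clang are back at the snapshot together
```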

If all you want is to continue using git the way you use it now,
multiple git repos get you that (as does a sparse checkout on the
single repo).  My point is that the move to git opens up a new, much
more powerful workflow with branches that encompass both llvm and
clang state.  We can do this with or without submodules, but using
submodules for this is far more awkward than using a single repo.

-Justin L.

On Wed, Jul 20, 2016 at 5:36 PM, Justin Bogner via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> Chandler Carruth <chandlerc at google.com> writes:
>> On Wed, Jul 20, 2016 at 5:02 PM Justin Bogner via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
>>> > I would like to (re-)open a discussion on the following specific
>>> question:
>>> >
>>> >   Assuming we are moving the llvm project to git, should we
>>> >   a) use multiple git repositories, linked together as subrepositories
>>> > of an umbrella repo, or
>>> >   b) use a single git repository for most llvm subprojects.
>>> >
>>> > The current proposal assembled by Renato follows option (a), but I
>>> > think option (b) will be significantly simpler and more effective.
>>> > Moreover, I think the issues raised with option (b) are either
>>> > incorrect or can be reasonably addressed.
>>> >
>>> > Specifically, my proposal is that all LLVM subprojects that are
>>> > "version-locked" (and/or use the common CMake build system) live in a
>>> > single git repository.  That probably means all of the main llvm
>>> > subprojects other than the test-suite and maybe libc++.  From looking
>>> > at the repository today that would be: llvm, clang, clang-tools-extra,
>>> > lld, polly, lldb, llgo, compiler-rt, openmp, and parallel-libs.
>>>
>>> FWIW, I'm opposed. I'm not convinced that the problems with multiple
>>> repos are any worse than the problems with a single repo, which makes
>>> this more or less just change for the sake of change, IMO.
>>>
>>
>> It would be useful to know what problems you see with a single repo that
>> are more significant. In particular, either why you think the problems
>> jlebar already mentioned are worse than he sees them, or what other
>> problems there are that he hasn't addressed.
>
> Running the same 'git checkout' commands on multiple repos has always
> been sufficient to manage the multiple repos so far - as long as you
> create the same branches and tags in each repo, it's easy[1] to manage
> the set of repos with a script that cd's to each one and runs whatever
> git command.
>
> So it's a pretty minor inconvenience today to have the multiple repos in
> the case where you want to check out all of them.
>
> OTOH, if all of the repos are combined into one, you have to do work
> when you only want some of them. In my experience, this is basically
> always - between my various machines and projects I have several
> checkouts of llvm+compiler-rt+clang+libc++, and I have a lot of
> checkouts of just llvm. I've only checked out the other repos when I was
> changing APIs and needed to update them.
>
> I haven't tried the options jlebar has described to deal with these -
> sparse checkouts and whatnot, but they seem like an equivalent amount of
> work/learning curve as writing a script that cd's to several directories
> and runs the same git command in each.
>
> Thus, this also sounds like a minor inconvenience. I just don't see how
> trading one for the other is worth doing, since AFAICT they're equally
> inconvenient.
>
> [1] My understanding of the "umbrella repo" thing for bisecting is that
>     it'll be managed automatically by a cron or checkin hooks or
>     whatever, so the bits in jlebar's description about updating
>     submodules seem like a red herring. I'm assuming that we end up in a
>     place where working with git is essentially the same as we work with
>     git-svn today.
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev