[llvm-dev] [RFC] One or many git repositories?

Wed Jul 20 19:02:21 PDT 2016

> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.

Ah, I understand your workflow now.  That works, I guess.  It's
definitely better than what I've been doing.  :)

You have to write and use these scripts, of course.  I think that's
the main problem -- git is hard enough as it is; asking me to do most
git commands completely differently when I happen to be working on
llvm is asking a lot.  Even asking everyone to realize that there's a
better way is asking a lot.  Inasmuch as we can make the commands we
type every day Just Work Like Any Other Git Repository, I think that's
a clear win for the community's overall productivity.
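
(For concreteness, here's roughly the kind of helper script I mean.
This is only a sketch: the repo paths and the function name are made
up for illustration, and it assumes a git new enough to support
`git -C <dir>`.)

```python
import subprocess

# Hypothetical checkout locations; adjust to wherever your repos live.
REPOS = ["llvm", "llvm/tools/clang"]


def git_everywhere(git_args, repos=REPOS, run=subprocess.run):
    """Run `git <git_args>` in each repo; return the repos where it failed."""
    failed = []
    for repo in repos:
        # `git -C <dir>` behaves as if git had been started in <dir>.
        result = run(["git", "-C", str(repo), *git_args])
        if result.returncode != 0:
            failed.append(repo)
    return failed
```

So creating the same-named branch everywhere would be something like
git_everywhere(["checkout", "-b", "my-clang-work"]), and you'd have to
remember to go through the wrapper for every branch switch.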

Beyond that, I guess the main workflow benefit of the single repo is
that you can much more easily work with cross-cutting changes.  You
can stash them, bisect them, reorder them, commit a bunch with one
command, whatever.  There's nothing special about the fact that
they're cross-cutting.

And of course we don't get atomic commits across subprojects at all
without a single repo.  That really would be nice for certain kinds of
changes.

But I think the bigger point wrt workflows is that there's a real
benefit to having fewer special snowflakes in our lives.

-Justin L.

On Wed, Jul 20, 2016 at 6:26 PM, Justin Bogner <mail at justinbogner.com> wrote:
> Justin Lebar <jlebar at google.com> writes:
>>> Running the same 'git checkout' commands on multiple repos has
>>> always been sufficient to manage the multiple repos so far
>>
>> Huh.  It definitely hasn't worked well for me.
>>
>> Here's the issue I face every day.  I may be working on (unrelated)
>> changes to clang and llvm.  I update my llvm tree (say I checked in a
>> patch, or I want to pull in changes someone else has checked in).  Now
>> I want to go back to hacking on my clang stuff.  Because my clang
>> branch is not connected to a specific LLVM revision, it no longer
>> compiles.  I'm trying to build an old clang against a new llvm.
>>
>> Now I have to pull the latest clang and rebase my patches.  After I
>> deal with rebase conflicts (not what I wanted to do at the moment!),
>> I'm in a new state, which means when I build my ccache is no help.
>> And when I run the clang tests, I don't know whether to expect test
>> failures.  So then I have to pop off my patches and run at head...
>> (Maybe I have to update clang!  In which case I also have to update
>> llvm...)
>>
>> This would all be solved with zero work on my part if llvm and clang
>> were in one repository.  Then when I switched to working on my clang
>> patches, I would automatically check out a version of LLVM that is
>> compatible.
>>
>> I think this is the main thing that people aren't getting.  Maybe
>> because it's never been possible before to have a workflow like this.
>> But having a git branch that you can check out and immediately build
>> -- without any rebasing, re-syncing, or other messing around -- is
>> incredibly powerful.
>
> I don't know man, when I create a branch to save my clang work I just
> create a branch with the same name in all the other repos I have checked
> out, then it just stays in the state I left it in as I go do other
> stuff. This kind of problem just hasn't really come up for me.
>
>> Please let me know if this is still not clear -- it's kind of the key point.
>>
>> As I said, you can accomplish this with submodules, too, but it
>> requires the complex hackery from my original email.
>>
>> To me, this is not at all a minor inconvenience.  It's at least an
>> hour of wasted time every week.
>>
>>> I haven't tried the options jlebar has described to deal with these
>>> - sparse checkouts and whatnot, but they seem like an equivalent
>>> amount of work/learning curve as writing a script that cd's to
>>> several directories and runs the same git command in each.
>>
>> I'll send sparse checkout instructions separately.  But my example
>> submodules commands are not at all equivalent to a script that cd's
>> into several directories and runs a git command in each, and I think
>> this is the main point of confusion.  (In fact you wouldn't need to
>> write such a script; it's just "git submodule foreach".)
>>
>> The submodule commands create a single branch in the umbrella repo
>> that encompasses the checked-out state of *all the LLVM subrepos*.  So
>> you can, at a later time, check out this branch in the umbrella repo
>> and all the clang, llvm, etc. bits will be identical to the last time
>> you were on the branch.
>>
>> If all you want is to continue using git the way you use it now,
>> multiple git repos get you that (as does a sparse checkout on the
>> single repo).  My point is that the move to git opens up a new, much
>> more powerful workflow with branches that encompass both llvm and
>> clang state.  We can do this with or without submodules, but using
>> submodules for this is far more awkward than using a single repo.
>
> If I do `git log` in a sparse checkout that just has LLVM, will it only
> show me LLVM commits? That is, how easy is it to filter out the
> clang/lldb/subproject-X commits from a log? Negative globs are kind of
> awkward.