[llvm-dev] [RFC] One or many git repositories?

Sun Jul 31 00:36:42 PDT 2016

> This suggests what we really want in this case are several updates (one to each repository) on a branch that is then merged in one instant into the umbrella repository. Then the only thing the bots would see would be the merge commit and thus state is synchronized.

The script that updates the umbrella repo would need to know that the
several updates should all go into one branch that is then merged in
one instant into the umbrella repo's master.  At the point that the
umbrella repo can know this, it might as well just make a single
commit to master that updates all N subproject hashes at once (which
is what I was suggesting) -- I don't see how having a branch makes the
situation any less complicated.
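Here is a toy Python model of why the single N-hash commit matters. It is not the umbrella-repo script itself. An umbrella commit is reduced to the map of subproject hashes it pins, and the hashes below are placeholders. Polling each subproject and committing per update exposes intermediate umbrella states that pin only part of a cross-project change. Bumping all hashes in one commit exposes none.

```python
from typing import Dict, List

# Toy model: an umbrella commit is just the set of subproject hashes it pins.
Snapshot = Dict[str, str]  # subproject name -> pinned commit hash


def sequential_updates(base: Snapshot, updates: Snapshot) -> List[Snapshot]:
    """One umbrella commit per subproject update, as a naive poller would make."""
    history: List[Snapshot] = []
    current = dict(base)
    for name, new_hash in updates.items():
        current = {**current, name: new_hash}
        history.append(current)
    return history


def atomic_update(base: Snapshot, updates: Snapshot) -> List[Snapshot]:
    """A single umbrella commit that bumps all N subproject hashes at once."""
    return [{**base, **updates}]


def mixed_states(history: List[Snapshot], base: Snapshot,
                 updates: Snapshot) -> List[Snapshot]:
    """Umbrella commits a bot could check out that pin only part of the change.

    Assumes every hash in `updates` differs from the matching hash in `base`.
    """
    mixed = []
    for snap in history:
        has_old = any(snap[name] == base[name] for name in updates)
        has_new = any(snap[name] == h for name, h in updates.items())
        if has_old and has_new:
            mixed.append(snap)
    return mixed


# Placeholder hashes for one logical change spanning three subprojects.
base = {"llvm": "a1", "clang": "b1", "clang-tools-extra": "c1"}
change = {"llvm": "a2", "clang": "b2", "clang-tools-extra": "c2"}
print(len(mixed_states(sequential_updates(base, change), base, change)))  # 2
print(len(mixed_states(atomic_update(base, change), base, change)))       # 0
```

A feature branch merged into the umbrella repo would end up producing the same single snapshot as atomic_update. The script still has to know which updates belong together either way.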

> The natural way to do this would be via a multiple-repo PR.

That doesn't exist in GitHub, right?  We would have to somehow create
this multi-repo PR-management system and link it in with the script
that is managing the umbrella repo?  That is what I was describing.
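Below is a rough Python sketch of what that linkage might look like on the script side. It assumes a hypothetical "Atomic-Group: <id> <count>" commit-message trailer, invented here for illustration; nothing like it exists in the multirepo proposal. Ungrouped commits get their own umbrella commit right away. Grouped commits are held until all <count> of them have landed and are then emitted as one umbrella commit.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical trailer format, e.g. "Atomic-Group: rename-foo 2".
GROUP_RE = re.compile(r"^Atomic-Group: (\S+) (\d+)$", re.MULTILINE)


@dataclass
class SubprojectCommit:
    repo: str
    sha: str
    message: str


def parse_group(message: str) -> Optional[Tuple[str, int]]:
    """Return (group id, expected commit count), or None if ungrouped."""
    m = GROUP_RE.search(message)
    return (m.group(1), int(m.group(2))) if m else None


class UmbrellaUpdater:
    def __init__(self) -> None:
        # Grouped commits waiting for the rest of their group.
        self.pending: Dict[str, List[SubprojectCommit]] = {}
        # Each entry is one umbrella commit: repo -> sha it bumps.
        self.umbrella_log: List[Dict[str, str]] = []

    def observe(self, commit: SubprojectCommit) -> None:
        group = parse_group(commit.message)
        if group is None:
            self.umbrella_log.append({commit.repo: commit.sha})
            return
        group_id, size = group
        members = self.pending.setdefault(group_id, [])
        members.append(commit)
        if len(members) == size:
            del self.pending[group_id]
            self.umbrella_log.append({c.repo: c.sha for c in members})


u = UmbrellaUpdater()
u.observe(SubprojectCommit("llvm", "a1", "Rename foo\n\nAtomic-Group: rename-foo 2"))
u.observe(SubprojectCommit("clang-tools-extra", "c1", "Unrelated fix"))
u.observe(SubprojectCommit("clang", "b1", "Update callers\n\nAtomic-Group: rename-foo 2"))
print(u.umbrella_log)
```

This toy version still leaves real questions open. An unrelated llvm commit that lands on top of a pending grouped one already contains the grouped change, so pinning it would leak half the group. A group that never completes stays pending forever. Every client would also need tooling to write the trailer correctly.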

Again, I am not claiming this isn't possible.  And, I don't care a ton
about complexity on the server-side.  But I do care about complexity
on the client side.  I think it's highly unlikely that any system for
creating atomic commits, and then checking out code in a way that
respects that atomicity, will be as simple as "git commit" and "git
checkout" (which is what we'd have with the monorepo).

On Sun, Jul 31, 2016 at 12:25 AM, Justin Lebar <jlebar at google.com> wrote:
> By the way, I've been using the existing read-only monorepo [1] for a
> few days now.  The intent is to commit via the script I put together
> [2], although I haven't committed anything other than a testing commit
> [3].
>
> All I can say is, *wow* is it nice.  I hid everything I don't care
> about using a sparse checkout [4].  Many of my tools (e.g. ctrl-p [5]
> [6], ycm [7]) suddenly work better now that there isn't an artificial
> boundary between my clang and llvm repositories.  I can have patch
> queues that include LLVM commits and clang commits arbitrarily
> interspersed with one another -- something I didn't realize I wanted
> until I made the switch and noticed I already had branches I could
> merge (and something we can't do with Bogner's suggested multirepo
> workflow).
>
> [1] https://github.com/llvm-project/llvm-project
> [2] https://github.com/jlebar/llvm-repo-tools
> [3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
> (which, embarrassingly, has a typo in the commit message)
> [4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
> [5] https://github.com/kien/ctrlp.vim
> [6] https://github.com/jlebar/ctrlp-py-matcher
> [7] https://github.com/Valloric/YouCompleteMe
>
> On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at google.com> wrote:
>>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
>>
>> This is an *extremely important* point to understand, so let me try to
>> be really clear about the current state of the world and the state of
>> the world under the two "move to git" proposals.
>>
>> Today, all commits ultimately end up in SVN.  Our SVN is effectively
>> a monorepo, so today a single commit can touch multiple subprojects.
>> How you get the commit into SVN is your business.  Maybe you can hack
>> git-svn somehow to do the atomic commit.  (If this is possible, it's
>> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
>> a git user, I wrote a hacky script [1] that cherry-picks commits from
>> the existing monorepo mirror and commits them via SVN.  It's annoying
>> to do, but it is possible today to atomically commit to multiple
>> subprojects, as you observed.
>>
>> Under the monorepo proposal, this becomes much easier.  It's just "git
>> commit", no magic.
>>
>> Under the multirepo git proposal, this becomes either impossible or
>> much more complicated.  Under the proposal, we have separate git
>> repositories for each subproject, and we push directly to these.
>> There's then an umbrella repository, which includes the subproject
>> repos as git submodules.  There's a script which periodically checks
>> the subproject repos for updates.  When it sees an update, it creates
>> a new commit in the umbrella repository.  The script is the only thing
>> that can create commits in the umbrella repo.
>>
>> In order to get atomic commits in the multirepo world, we would need
>> some way to inform the script that two otherwise separate commits
>> should appear in the umbrella repo as a single commit.  We'd probably
>> need to agree on a protocol communicated via commit messages.  We'd
>> also probably need client-side scripts to set the commit messages
>> appropriately.
>>
>> I expect this would be such a hassle that, even if we managed to
>> implement it on the server side, it would be prohibitively complex for
>> most users.
>>
>> In addition, under the multirepo, you only get synchronized subproject
>> commits in your local checkout if you choose to use a git-submodules
>> based workflow.  If you use the workflow that we currently have, then
>> on the client side, there is no guarantee that your subprojects will
>> be sync'ed.  (This is the same as most people's client-side git
>> workflows today.)  *Even if we manage to atomically commit across
>> subprojects*, that is of limited utility unless those commits show up
>> atomically on developers' workstations.  But using a workflow based on
>> git-submodules is highly complex as compared to the monorepo -- this
>> was what I was trying to illustrate in my very first email on this
>> thread.
>>
>> When we say "the monorepo gets you atomic commits," that's shorthand for:
>>
>> 1) The monorepo makes it far simpler to make atomic commits from git
>> as compared to the current SVN setup.
>> 2) Atomic commits are definitely possible in the monorepo.  They are
>> theoretically possible in the multirepo, with extensive tooling etc.
>> 3) Under the basic monorepo workflow, your checkouts are always
>> correct with respect to atomic commits.  Under the basic multirepo
>> workflow, this is not true -- you have to engage with git submodules
>> to get this property, and that is a giant pain.
>>
>> Sorry for the wall of text, but this is important.
>>
>> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
>> made one commit with it so far.  :)
>>
>> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>>> The only thing a monorepo gets you that strictly isn’t possible without
>>>> it is the ability to commit to multiple projects in a single commit.
>>>> Personally I don’t think that is a big enough justification, but that is
>>>> my opinion, not a fact.
>>>
>>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>>> clang-tools-extra all have the same SVN revision number.
>>> I don't know how it happened but it did.  Is this just an artifact of
>>> how somebody pasted together a bunch of git-svn projects, or is it
>>> something that a top-level git repo with submodules would allow?
>>> And if it is, then the "only thing a monorepo gets you" isn't something
>>> that you need a monorepo to get.
>>> Your befuddled correspondent,
>>> --paulr
>>>