[llvm-dev] [RFC] One or many git repositories?

Sun Jul 31 00:56:27 PDT 2016

> On Jul 31, 2016, at 12:36 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> This suggests what we really want in this case are several updates (one to each repository) on a branch that is then merged in one instant into the umbrella repository. Then the only thing the bots would see would be the merge commit and thus state is synchronized.
> 
> The script that updates the umbrella repo would need to know that the
> several updates should all go into one branch that is then merged in
> one instant into the umbrella repo's master.  At the point that the
> umbrella repo can know this, it might as well just make a single
> commit to master that updates all N subproject hashes at once (which
> is what I was suggesting) -- I don't see how having a branch makes the
> situation any less complicated.

Ok. Sure. I was thinking out loud.

> 
>> The natural way to do this would be via a multiple-repo PR.
> 
> That doesn't exist in github, right?  We would have to somehow create
> this multi-repo PR-management system and link it in with the script
> that is managing the umbrella repo?  That is what I was describing.

The script in this case would not be making the change. The change would be made by a special trusted continuous integration bot that does the merging. In Swift land we have been using our @swift-ci system to merge things into master after testing with great success. As for the script, it would see that it couldn't push a change and would then just restart the loop.

> 
> Again, I am not claiming this isn't possible.  And, I don't care a ton
> about complexity on the server-side.  But I do care about complexity
> on the client side.  I think it's highly unlikely that there exists a
> system for creating atomic commits and then checking out code in a way
> that respects that atomicity that is as simple as "git commit" and "git
> checkout" (which is what we'd have on the monorepo).

The key assumption here is that LLVM will not switch to a heavy PR model (something which, from what I understand, is not a part of this specific discussion and will be considered strictly after the move to github). In such a case, I believe that it will be relatively simple to communicate this to the CI and have the CI manage it for you. If we implement such a thing on the Swift side, I would be more than happy to help you set up such a system, reusing the work done there.

Another thing that I do not understand about the mono-repo proposal (*note* correct me if I am wrong) is that we can only avoid external synchronization if we get /all/ projects that have build dependencies on the mono-repo into the mono-repo. This suggests that (unless we are saying that synchronization of those repositories is not important) we will need to invest in some sort of synchronization regardless of the mono-repo proposal. In such a case the mono-repo proposal is essentially just an attempt to ease the workflows of a large subset of the community, rather than truly being an alternative to the submodule proposal. Am I misunderstanding?

Michael

> 
> On Sun, Jul 31, 2016 at 12:25 AM, Justin Lebar <jlebar at google.com> wrote:
>> By the way, I've been using the existing read-only monorepo [1] for a
>> few days now.  The intent is to commit via the script I put together
>> [2], although I haven't committed anything other than a testing commit
>> [3].
>> 
>> All I can say is, *wow* is it nice.  I hid everything I don't care
>> about using a sparse checkout [4].  Many of my tools (e.g. ctrl-p [5]
>> [6], ycm [7]) suddenly work better now that there isn't an artificial
>> boundary between my clang and llvm repositories.  I can have patch
>> queues that include LLVM commits and clang commits arbitrarily
>> interspersed with one another -- something I didn't realize I wanted
>> until I made the switch and noticed I already had branches I could
>> merge (and something we can't do with Bogner's suggested multirepo
>> workflow).
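
For readers who have not used a sparse checkout, the setup mentioned above can be sketched as follows. This is a minimal illustration that assumes the classic `core.sparseCheckout` mechanism and assumes the monorepo keeps each subproject in a top-level directory such as `llvm/` or `clang/`. The helper names `sparse_patterns` and `enable_sparse_checkout` are hypothetical and are not part of any script referenced in this thread.

```python
from pathlib import Path
import subprocess
from typing import Iterable


def sparse_patterns(subprojects: Iterable[str]) -> str:
    """Build the contents of .git/info/sparse-checkout.

    Each entry is a root-anchored directory pattern, one per line.
    The directory names are an assumption about the monorepo layout.
    """
    return "".join(f"/{name}/\n" for name in subprojects)


def enable_sparse_checkout(repo: Path, subprojects: Iterable[str]) -> None:
    """Restrict the working tree of an existing clone to `subprojects`."""
    # Turn on sparse checkout for this clone only.
    subprocess.run(
        ["git", "config", "core.sparseCheckout", "true"], cwd=repo, check=True
    )
    # Write the list of directories that should stay in the working tree.
    info_dir = repo / ".git" / "info"
    info_dir.mkdir(parents=True, exist_ok=True)
    (info_dir / "sparse-checkout").write_text(sparse_patterns(subprojects))
    # Re-read HEAD so that everything outside the patterns disappears.
    subprocess.run(["git", "read-tree", "-mu", "HEAD"], cwd=repo, check=True)
```

Calling `enable_sparse_checkout(Path("llvm-project"), ["llvm", "clang"])` on an existing clone should leave only those two directories in the working tree, while history and commits still cover the whole repository.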
>> 
>> [1] https://github.com/llvm-project/llvm-project
>> [2] https://github.com/jlebar/llvm-repo-tools
>> [3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
>> (which, embarrassingly, has a typo in the commit message)
>> [4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
>> [5] https://github.com/kien/ctrlp.vim
>> [6] https://github.com/jlebar/ctrlp-py-matcher
>> [7] https://github.com/Valloric/YouCompleteMe
>> 
>> On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at google.com> wrote:
>>>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
>>> 
>>> This is an *extremely important* point to understand, so let me try to
>>> be really clear about the current state of the world and the state of
>>> the world under the two "move to git" proposals.
>>> 
>>> Today, all commits ultimately end up in SVN.  Our SVN is effectively
>>> a monorepo, so today, a single commit can touch multiple subprojects.
>>> How you get the commit into SVN is your business.  Maybe you can hack
>>> git-svn somehow to do the atomic commit.  (If this is possible, it's
>>> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
>>> a git user, I wrote a hacky script [1] that cherry-picks commits from
>>> the existing monorepo mirror and commits them via SVN.  It's annoying
>>> to do, but it is possible today to atomically commit to multiple
>>> subprojects, as you observed.
>>> 
>>> Under the monorepo proposal, this becomes much easier.  It's just "git
>>> commit", no magic.
>>> 
>>> Under the multirepo git proposal, this becomes either impossible or
>>> much more complicated.  Under the proposal, we have separate git
>>> repositories for each subproject, and we push directly to these.
>>> There's then an umbrella repository, which includes the subproject
>>> repos as git submodules.  There's a script which periodically checks
>>> the subproject repos for updates.  When it sees an update, it creates
>>> a new commit in the umbrella repository.  The script is the only thing
>>> that can create commits in the umbrella repo.
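
The updater described in the paragraph above can be sketched roughly like this. It is a toy model, not the actual script: `UmbrellaCommit`, `poll_once`, and the dictionary of subproject heads are hypothetical, and a real implementation would query the subproject remotes and write submodule pointers instead of taking the heads as an argument.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class UmbrellaCommit:
    """One umbrella commit: a snapshot of every submodule's pinned hash."""
    pins: Dict[str, str]


def poll_once(history: List[UmbrellaCommit], heads: Dict[str, str]) -> int:
    """Append one umbrella commit per subproject whose head moved.

    `heads` maps subproject name -> current head hash (hypothetical input).
    Returns the number of umbrella commits created in this polling pass.
    """
    pins = dict(history[-1].pins) if history else {}
    created = 0
    for name in sorted(heads):
        if pins.get(name) != heads[name]:
            # Each subproject update becomes its own umbrella commit.
            pins[name] = heads[name]
            history.append(UmbrellaCommit(dict(pins)))
            created += 1
    return created
```

With `history = []`, polling `{"llvm": "a1", "clang": "b1"}` and then `{"llvm": "a2", "clang": "b2"}` produces an intermediate umbrella commit that pins the new clang against the old llvm. In this model, a change that spans two subprojects is never visible as a single step.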
>>> 
>>> In order to get atomic commits in the multirepo world, we would need
>>> some way to inform the script that two otherwise separate commits
>>> should appear in the umbrella repo as a single commit.  We'd probably
>>> need to agree on a protocol communicated via commit messages.  We'd
>>> also probably need client-side scripts to set the commit messages
>>> appropriately.
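
One possible shape for such a protocol, purely as an illustration: commits that belong together carry the same trailer in their commit messages, and the umbrella script folds them into one update. The `Atomic-Group:` trailer and the `group_commits` function are invented for this sketch. Nothing in either proposal defines them.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical trailer name; an agreed protocol would have to define it.
GROUP_TRAILER = re.compile(r"^Atomic-Group:\s*(\S+)\s*$", re.MULTILINE)


def group_commits(pending: List[Tuple[str, str, str]]) -> List[Dict[str, str]]:
    """Turn pending subproject commits into umbrella updates.

    `pending` holds (subproject, commit_hash, commit_message) tuples in
    arrival order. Commits sharing an Atomic-Group id are merged into one
    update; every other commit becomes its own update. Each update maps
    subproject -> hash to pin in the umbrella repo.
    """
    updates: Dict[str, Dict[str, str]] = {}
    for index, (project, commit_hash, message) in enumerate(pending):
        match = GROUP_TRAILER.search(message)
        # Ungrouped commits get a unique key so they stay separate.
        key = f"group:{match.group(1)}" if match else f"single:{index}"
        updates.setdefault(key, {})[project] = commit_hash
    return list(updates.values())
```

The sketch leaves out the hard parts: the script would still need to know when every commit of a group has arrived, and each client would need tooling to write matching trailers in every repository it touches.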
>>> 
>>> I expect this would be so much of a hassle, even if we managed to
>>> implement it on the server side, it would be prohibitively complex for
>>> most users.
>>> 
>>> In addition, under the multirepo, you only get synchronized subproject
>>> commits in your local checkout if you choose to use a git-submodules
>>> based workflow.  If you use the workflow that we currently have, then
>>> on the client side, there is no guarantee that your subprojects will
>>> be sync'ed.  (This is the same as most people's client-side git
>>> workflows today.)  *Even if we manage to atomically commit across
>>> subprojects*, that is of limited utility unless those commits show up
>>> atomically on developers' workstations.  But using a workflow based on
>>> git-submodules is highly complex as compared to the monorepo -- this
>>> was what I was trying to illustrate in my very first email on this
>>> thread.
>>> 
>>> When we say "the monorepo gets you atomic commits," that's an abbreviation for
>>> 
>>> 1) The monorepo makes it far simpler to make atomic commits from git
>>> as compared to the current SVN setup.
>>> 2) Atomic commits are definitely possible in the monorepo.  They are
>>> theoretically possible in the multirepo, with extensive tooling etc.
>>> 3) Under the basic monorepo workflow, your checkouts are always
>>> correct with respect to atomic commits.  Under the basic multirepo
>>> workflow, this is not true -- you have to engage with git submodules
>>> to get this property, and that is a giant pain.
>>> 
>>> Sorry for the wall of text, but this is important.
>>> 
>>> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
>>> made one commit with it so far.  :)
>>> 
>>> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>>>> The only thing a monorepo gets you that strictly isn’t possible without
>>>>> it is the ability to commit to multiple projects in a single commit.
>>>>> Personally I don’t think that is a big enough justification, but that is
>>>>> my opinion, not a fact.
>>>> 
>>>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>>>> clang-tools-extra all have the same SVN revision number.
>>>> I don't know how it happened but it did.  Is this just an artifact of
>>>> how somebody pasted together a bunch of git-svn projects, or is it
>>>> something that a top-level git repo with submodules would allow?
>>>> And if it is, then the "only thing a monorepo gets you" isn't something
>>>> that you need a monorepo to get.
>>>> Your befuddled correspondent,
>>>> --paulr
>>>> 
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev