[llvm-dev] [RFC] One or many git repositories?

Michael Gottesman via llvm-dev llvm-dev at lists.llvm.org
Sun Jul 31 19:22:00 PDT 2016


> On Jul 31, 2016, at 12:56 AM, Michael Gottesman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
>> 
>> On Jul 31, 2016, at 12:36 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> 
>>> This suggests what we really want in this case are several updates (one to each repository) on a branch that is then merged in one instant into the umbrella repository. Then the only thing the bots would see would be the merge commit and thus state is synchronized.
>> 
>> The script that updates the umbrella repo would need to know that the
>> several updates should all go into one branch that is then merged in
>> one instant into the umbrella repo's master.  At the point that the
>> umbrella repo can know this, it might as well just make a single
>> commit to master that updates all N subproject hashes at once (which
>> is what I was suggesting) -- I don't see how having a branch makes the
>> situation any less complicated.
> 
> Ok. Sure. I was thinking out loud.
> 
>> 
>>> The natural way to do this would be via a multiple-repo PR.
>> 
>> That doesn't exist in github, right?  We would have to somehow create
>> this multi-repo PR-management system and link it in with the script
>> that is managing the umbrella repo?  That is what I was describing.
> 
> The script in this case would not be making the change. The change would be made by a special trusted continuous integration bot that does the merging. In Swift land we have been using our @swift-ci system to merge things into master after testing with great success. As for the script, it would see that it couldn't push a change and would then just restart the loop.
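The "couldn't push, restart the loop" behavior described above is an optimistic-concurrency retry. A minimal sketch, assuming an in-memory stand-in (`UmbrellaRemote`, hypothetical) for the umbrella repo's master branch and a hypothetical `merge_with_retry` helper. It does not claim to reflect how @swift-ci is implemented, only the behavior described here: all subproject updates go in as one umbrella commit, and a rejected push is retried against the new head.

```python
from typing import Callable, Dict, List, Optional

Snapshot = Dict[str, str]  # subproject name -> commit hash


class UmbrellaRemote:
    """Hypothetical in-memory stand-in for the umbrella repo's master branch."""

    def __init__(self, initial: Snapshot) -> None:
        self.history: List[Snapshot] = [dict(initial)]

    def head(self) -> int:
        return len(self.history) - 1

    def push(self, expected_head: int, snapshot: Snapshot) -> bool:
        # Like a non-fast-forward rejection: refuse if master moved since we read it.
        if self.head() != expected_head:
            return False
        self.history.append(dict(snapshot))
        return True


def merge_with_retry(remote: UmbrellaRemote, updates: Snapshot,
                     max_attempts: int = 5,
                     before_push: Optional[Callable[[], None]] = None) -> bool:
    """Apply all `updates` as ONE umbrella commit, retrying if the push is rejected."""
    for _ in range(max_attempts):
        base = remote.head()
        # Rebuild the commit on top of the current head each time.
        new_state = {**remote.history[base], **updates}
        if before_push is not None:
            before_push()  # hook for simulating a concurrent writer
        if remote.push(base, new_state):
            return True
    return False


if __name__ == "__main__":
    remote = UmbrellaRemote({"llvm": "a1", "clang": "b1", "lld": "c1"})
    interfered: List[bool] = []

    def concurrent_writer() -> None:
        # Once only: another update lands between our read and our push.
        if not interfered:
            interfered.append(True)
            remote.push(remote.head(), {**remote.history[-1], "lld": "c2"})

    ok = merge_with_retry(remote, {"llvm": "a2", "clang": "b2"},
                          before_push=concurrent_writer)
    print(ok, remote.history[-1])
    # True {'llvm': 'a2', 'clang': 'b2', 'lld': 'c2'}
```

Re-reading the head on every attempt is what keeps the result atomic: the retried commit is rebuilt on top of whatever landed concurrently rather than overwriting it.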
> 
>> 
>> Again, I am not claiming this isn't possible.  And, I don't care a ton
>> about complexity on the server-side.  But I do care about complexity
>> on the client side.  I think it's highly unlikely that there exists a
>> system for creating atomic commits and then checking out code in a way
>> that respects that atomicity that is as simple as "git commit" and "git
>> checkout" (which is what we'd have on the monorepo).
> 
> The key assumption here is that LLVM will not switch to a heavy PR model (which, as I understand it, is not part of this specific discussion and will be considered only after the move to github). In that case, I believe it will be relatively simple to communicate this to the CI and have the CI manage it for you. If we implement such a thing on the Swift side, I would be more than happy to help you set up a similar system that reuses that work.
> 
> Another thing that I do not understand about the mono-repo proposal (*note*: correct me if I am wrong) is that we can only avoid external synchronization if we get /all/ projects that have build dependencies on the mono-repo into the mono-repo. This suggests that, unless we are saying that synchronizing those repositories is not important, we will need to invest in some sort of synchronization regardless of the mono-repo proposal. In that case, the mono-repo proposal is essentially an attempt to ease the workflows of a large subset of the community, rather than a true alternative to the submodule proposal. Am I misunderstanding?
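The argument above can be made concrete as a small set computation. A minimal sketch, assuming a hypothetical dependency map (project -> projects it builds against) and a hypothetical helper `needs_external_sync`; the graph in the usage example is illustrative only, not an authoritative list of LLVM-ecosystem dependencies.

```python
from typing import Dict, Set


def needs_external_sync(deps: Dict[str, Set[str]], monorepo: Set[str]) -> Set[str]:
    """Projects outside the monorepo that build against something inside it.

    Each of these still needs some cross-repository synchronization
    mechanism, no matter how the monorepo itself is organized.
    """
    return {proj for proj, needs in deps.items()
            if proj not in monorepo and needs & monorepo}


if __name__ == "__main__":
    # Illustrative dependency graph (an assumption, not an official list).
    deps = {
        "clang": {"llvm"},
        "lldb": {"llvm", "clang"},
        "swift": {"llvm", "clang"},
    }
    print(needs_external_sync(deps, {"llvm", "clang", "lldb"}))  # {'swift'}
```

The result is empty only when every dependent project has been pulled into the monorepo, which is the condition the paragraph above points out.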

Just an FYI: I talked with jlebar on IRC and we advanced the conversation. I am going to update the document when I get some time later tonight.

Michael

> 
> Michael
> 
>> 
>> On Sun, Jul 31, 2016 at 12:25 AM, Justin Lebar <jlebar at google.com> wrote:
>>> By the way, I've been using the existing read-only monorepo [1] for a
>>> few days now.  The intent is to commit via the script I put together
>>> [2], although I haven't committed anything other than a testing commit
>>> [3].
>>> 
>>> All I can say is, *wow* is it nice.  I hid everything I don't care
>>> about using a sparse checkout [4].  Many of my tools (e.g. ctrl-p [5]
>>> [6], ycm [7]) suddenly work better now that there isn't an artificial
>>> boundary between my clang and llvm repositories.  I can have patch
>>> queues that include LLVM commits and clang commits arbitrarily
>>> interspersed with one another -- something I didn't realize I wanted
>>> until I made the switch and noticed I already had branches I could
>>> merge (and something we can't do with Bogner's suggested multirepo
>>> workflow).
>>> 
>>> [1] https://github.com/llvm-project/llvm-project
>>> [2] https://github.com/jlebar/llvm-repo-tools
>>> [3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
>>> (which, embarrassingly, has a typo in the commit message)
>>> [4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
>>> [5] https://github.com/kien/ctrlp.vim
>>> [6] https://github.com/jlebar/ctrlp-py-matcher
>>> [7] https://github.com/Valloric/YouCompleteMe
>>> 
>>> On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at google.com> wrote:
>>>>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
>>>> 
>>>> This is an *extremely important* point to understand, so let me try to
>>>> be really clear about the current state of the world and the state of
>>>> the world under the two "move to git" proposals.
>>>> 
>>>> Today, all commits ultimately end up in SVN.  Our SVN is effectively
>>>> a monorepo, so today, a single commit can touch multiple subprojects.
>>>> How you get the commit into SVN is your business.  Maybe you can hack
>>>> git-svn somehow to do the atomic commit.  (If this is possible, it's
>>>> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
>>>> a git user, I wrote a hacky script [1] that cherry-picks commits from
>>>> the existing monorepo mirror and commits them via SVN.  It's annoying
>>>> to do, but it is possible today to atomically commit to multiple
>>>> subprojects, as you observed.
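Whether a commit needs this kind of atomic treatment can be read off the paths it changes. A minimal sketch, assuming (hypothetically) that each subproject lives in a top-level directory of the monorepo named after it; `touched_subprojects` and `is_cross_project` are illustrative helpers, not part of the llvm-repo-tools script referenced above.

```python
from typing import Iterable, Set


def touched_subprojects(changed_paths: Iterable[str]) -> Set[str]:
    """Return the top-level subproject directories a commit touches.

    Assumes the layout <subproject>/<path>; files at the repository root
    (no "/") are ignored.
    """
    touched = set()
    for path in changed_paths:
        if "/" in path:
            touched.add(path.split("/", 1)[0])
    return touched


def is_cross_project(changed_paths: Iterable[str]) -> bool:
    """True if the commit spans more than one subproject.

    In a monorepo this is still a single "git commit"; under a multirepo
    layout it would have to be split into one commit per repository.
    """
    return len(touched_subprojects(changed_paths)) > 1


if __name__ == "__main__":
    paths = ["llvm/include/llvm/IR/Function.h", "clang/lib/CodeGen/CGCall.cpp"]
    print(sorted(touched_subprojects(paths)))  # ['clang', 'llvm']
    print(is_cross_project(paths))             # True
```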
>>>> 
>>>> Under the monorepo proposal, this becomes much easier.  It's just "git
>>>> commit", no magic.
>>>> 
>>>> Under the multirepo git proposal, this becomes either impossible or
>>>> much more complicated.  Under the proposal, we have separate git
>>>> repositories for each subproject, and we push directly to these.
>>>> There's then an umbrella repository, which includes the subproject
>>>> repos as git submodules.  There's a script which periodically checks
>>>> the subproject repos for updates.  When it sees an update, it creates
>>>> a new commit in the umbrella repository.  The script is the only thing
>>>> that can create commits in the umbrella repo.
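The update loop described above can be modeled in a few lines. A minimal sketch, assuming an in-memory list of snapshots (subproject -> hash) in place of real git submodule commits; `poll_once` and `fetch_heads` are hypothetical names. The usage example shows the hazard discussed in this thread: when the two halves of a cross-project change land in different polling intervals, the umbrella history records an intermediate state that mixes them.

```python
from typing import Callable, Dict, List

Snapshot = Dict[str, str]  # subproject name -> commit hash


def poll_once(history: List[Snapshot],
              fetch_heads: Callable[[], Snapshot]) -> bool:
    """One iteration of a hypothetical umbrella-updater loop.

    history[-1] is the umbrella repo's latest recorded submodule state;
    fetch_heads() returns the current head of every subproject repo.
    If anything moved, record ONE new umbrella commit containing every
    change seen since the last poll and return True.
    """
    current = dict(fetch_heads())
    if history and current == history[-1]:
        return False
    history.append(current)
    return True


if __name__ == "__main__":
    heads = {"llvm": "a1", "clang": "b1"}
    history: List[Snapshot] = []
    poll_once(history, lambda: heads)  # initial state recorded
    heads["llvm"] = "a2"               # llvm half of a cross-project change
    poll_once(history, lambda: heads)  # umbrella now has llvm@a2, clang@b1
    heads["clang"] = "b2"              # clang half lands after that poll
    poll_once(history, lambda: heads)
    print(len(history))  # 3: includes the mixed intermediate state
```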
>>>> 
>>>> In order to get atomic commits in the multirepo world, we would need
>>>> some way to inform the script that two otherwise separate commits
>>>> should appear in the umbrella repo as a single commit.  We'd probably
>>>> need to agree on a protocol communicated via commit messages.  We'd
>>>> also probably need client-side scripts to set the commit messages
>>>> appropriately.
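One possible shape for such a protocol, purely as a sketch: the `Atomic-Group:` trailer below is a hypothetical convention invented for illustration, not an existing LLVM or git feature, and `parse_group` / `group_updates` are hypothetical helpers. Commits that carry the same group id are held back until all of them have arrived, then recorded as one umbrella update.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical trailer format (not an existing convention):
#   Atomic-Group: <group-id> <number-of-commits-in-group>
TRAILER = re.compile(r"^Atomic-Group:\s*(\S+)\s+(\d+)\s*$", re.MULTILINE)

Commit = Tuple[str, str, str]  # (subproject, hash, commit message)


def parse_group(message: str) -> Optional[Tuple[str, int]]:
    """Return (group id, group size) from the trailer, or None if absent."""
    m = TRAILER.search(message)
    return (m.group(1), int(m.group(2))) if m else None


def group_updates(commits: List[Commit]) -> List[Dict[str, str]]:
    """Turn per-repo commits (in arrival order) into umbrella updates.

    Each returned dict maps subproject -> new hash and stands for one
    umbrella commit. Simplifying assumptions: one commit per repo per
    group, and no other commit lands in a repo while it has a group
    member pending.
    """
    updates: List[Dict[str, str]] = []
    pending: Dict[str, Dict[str, str]] = {}
    for repo, sha, message in commits:
        group = parse_group(message)
        if group is None:
            updates.append({repo: sha})  # ungrouped: publish immediately
            continue
        gid, size = group
        members = pending.setdefault(gid, {})
        members[repo] = sha
        if len(members) == size:  # all members arrived: publish together
            updates.append(pending.pop(gid))
    return updates


if __name__ == "__main__":
    commits = [
        ("llvm", "a2", "Change API\n\nAtomic-Group: g1 2"),
        ("compiler-rt", "c2", "Unrelated fix"),
        ("clang", "b2", "Adapt to API change\n\nAtomic-Group: g1 2"),
    ]
    for update in group_updates(commits):
        print(update)
    # {'compiler-rt': 'c2'}
    # {'llvm': 'a2', 'clang': 'b2'}
```

Even this toy version needs the two simplifying assumptions in its docstring; relaxing them is where much of the real complexity would lie.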
>>>> 
>>>> I expect this would be such a hassle that, even if we managed to
>>>> implement it on the server side, it would be prohibitively complex for
>>>> most users.
>>>> 
>>>> In addition, under the multirepo, you only get synchronized subproject
>>>> commits in your local checkout if you choose to use a git-submodules
>>>> based workflow.  If you use the workflow that we currently have, then
>>>> on the client side, there is no guarantee that your subprojects will
>>>> be sync'ed.  (This is the same as most people's client-side git
>>>> workflows today.)  *Even if we manage to atomically commit across
>>>> subprojects*, that is of limited utility unless those commits show up
>>>> atomically on developers' workstations.  But using a workflow based on
>>>> git-submodules is highly complex as compared to the monorepo -- this
>>>> was what I was trying to illustrate in my very first email on this
>>>> thread.
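The property at stake here can be stated as a check: do the subproject checkouts on a workstation together match some state the umbrella repository actually recorded? A minimal sketch, assuming the umbrella history is available as a list of snapshots (subproject -> hash); `matching_snapshot` is a hypothetical helper. With plain independent clones nothing enforces this; in the multirepo world a git-submodules workflow is what provides it, and in a monorepo it holds trivially.

```python
from typing import Dict, List, Optional

Snapshot = Dict[str, str]  # subproject name -> commit hash


def matching_snapshot(local_heads: Snapshot,
                      umbrella_history: List[Snapshot]) -> Optional[int]:
    """Index of the newest umbrella commit whose recorded hashes match every
    locally checked-out subproject, or None if no such commit exists.

    Only subprojects present locally are compared, so a partial checkout
    (say, just llvm and clang) can still match. None means the local
    combination was never published as a whole, e.g. half of a
    cross-project change.
    """
    for i in range(len(umbrella_history) - 1, -1, -1):
        snap = umbrella_history[i]
        if all(snap.get(name) == sha for name, sha in local_heads.items()):
            return i
    return None


if __name__ == "__main__":
    history = [{"llvm": "a1", "clang": "b1"}, {"llvm": "a2", "clang": "b2"}]
    print(matching_snapshot({"llvm": "a2", "clang": "b2"}, history))  # 1
    print(matching_snapshot({"llvm": "a2", "clang": "b1"}, history))  # None
```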
>>>> 
>>>> When we say "the monorepo gets you atomic commits," that's shorthand for:
>>>> 
>>>> 1) The monorepo makes it far simpler to make atomic commits from git
>>>> as compared to the current SVN setup.
>>>> 2) Atomic commits are definitely possible in the monorepo.  They are
>>>> theoretically possible in the multirepo, with extensive tooling etc.
>>>> 3) Under the basic monorepo workflow, your checkouts are always
>>>> correct with respect to atomic commits.  Under the basic multirepo
>>>> workflow, this is not true -- you have to engage with git submodules
>>>> to get this property, and that is a giant pain.
>>>> 
>>>> Sorry for the wall of text, but this is important.
>>>> 
>>>> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
>>>> made one commit with it so far.  :)
>>>> 
>>>> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>>>>> The only thing a monorepo gets you that strictly isn’t possible without
>>>>>> it is the ability to commit to multiple projects in a single commit.
>>>>>> Personally I don’t think that is a big enough justification, but that is
>>>>>> my opinion, not a fact.
>>>>> 
>>>>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>>>>> clang-tools-extra all have the same SVN revision number.
>>>>> I don't know how it happened but it did.  Is this just an artifact of
>>>>> how somebody pasted together a bunch of git-svn projects, or is it
>>>>> something that a top-level git repo with submodules would allow?
>>>>> And if it is, then the "only thing a monorepo gets you" isn't something
>>>>> that you need a monorepo to get.
>>>>> Your befuddled correspondent,
>>>>> --paulr
>>>>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev