[llvm-dev] [RFC] One or many git repositories?
Michael Gottesman via llvm-dev
llvm-dev at lists.llvm.org
Sun Jul 31 00:56:27 PDT 2016
> On Jul 31, 2016, at 12:36 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>
>> This suggests what we really want in this case are several updates (one to each repository) on a branch that is then merged in one instant into the umbrella repository. Then the only thing the bots would see would be the merge commit and thus state is synchronized.
>
> The script that updates the umbrella repo would need to know that the
> several updates should all go into one branch that is then merged in
> one instant into the umbrella repo's master. At the point that the
> umbrella repo can know this, it might as well just make a single
> commit to master that updates all N subproject hashes at once (which
> is what I was suggesting) -- I don't see how having a branch makes the
> situation any less complicated.
Ok. Sure. I was thinking out loud.
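Here is a minimal Python sketch of the single-commit approach Justin describes. If the updater already knows which subproject updates belong together, it can record all N new hashes in one umbrella commit. The umbrella repository is modeled as an in-memory map from subproject name to pinned hash. `UmbrellaRepo` and `update_umbrella` are hypothetical names, not part of any existing LLVM or GitHub tooling.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class UmbrellaRepo:
    # Subproject name -> pinned commit hash (the submodule gitlink entries).
    pins: dict[str, str]
    # Each umbrella commit is recorded as a full snapshot of the pins.
    history: list[dict[str, str]] = field(default_factory=list)


def update_umbrella(umbrella: UmbrellaRepo, heads: dict[str, str]) -> bool:
    """Record every changed subproject head in ONE umbrella commit.

    Returns True if a commit was created, False if nothing changed.
    """
    changed = {name: sha for name, sha in heads.items()
               if umbrella.pins.get(name) != sha}
    if not changed:
        return False
    # Apply all N updates, then snapshot once: nobody reading the umbrella
    # history can observe a state where only some of them are applied.
    umbrella.pins.update(changed)
    umbrella.history.append(dict(umbrella.pins))
    return True


repo = UmbrellaRepo(pins={"llvm": "a1", "clang": "b1"})
update_umbrella(repo, {"llvm": "a2", "clang": "b2"})
print(repo.history)  # [{'llvm': 'a2', 'clang': 'b2'}]
```

The hard part is not this function. It is knowing which updates belong in the same call, which is the point under discussion.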
>
>> The natural way to do this would be via a multiple-repo PR.
>
> That doesn't exist in github, right? We would have to somehow create
> this multi-repo PR-management system and link it in with the script
> that is managing the umbrella repo? That is what I was describing.
The script in this case would not be making the change. The change would be made by a special trusted continuous integration bot that does the merging. In Swift land we have been using our @swift-ci system, with great success, to merge things into master after testing. As for the script, it would see that it couldn't push a change and would just restart the loop.
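This is a minimal Python sketch of the retry loop described above. It assumes the umbrella branch behaves like a remote that rejects non-fast-forward pushes. Under that assumption, a rejected push just means someone else, such as the CI merge bot, moved the branch first. `Remote`, `update_loop`, and the integer commit IDs are illustrative stand-ins, not an actual @swift-ci or git API.

```python
from __future__ import annotations
from typing import Callable


class Remote:
    """Hypothetical stand-in for a branch that rejects non-fast-forward pushes."""

    def __init__(self, head: int = 0) -> None:
        self.head = head

    def push(self, expected_head: int, new_head: int) -> bool:
        # Like a plain push without --force: accepted only if the branch
        # has not moved since we fetched `expected_head`.
        if self.head != expected_head:
            return False
        self.head = new_head
        return True


def update_loop(remote: Remote, make_commit: Callable[[int], int],
                max_attempts: int = 10) -> int:
    for _ in range(max_attempts):
        base = remote.head              # "fetch" the current branch tip
        candidate = make_commit(base)   # build the umbrella commit on top of it
        if remote.push(base, candidate):
            return candidate
        # Rejected: someone else (e.g. the CI merge bot) moved the branch
        # first, so restart the loop from the new tip.
    raise RuntimeError("gave up after repeated push rejections")
```

The sketch approximates git's non-fast-forward rejection as an exact compare-and-swap on the branch tip. In practice that check is enforced by the server receiving the push.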
>
> Again, I am not claiming this isn't possible. And, I don't care a ton
> about complexity on the server-side. But I do care about complexity
> on the client side. I think it's highly unlikely that there exists a
> system for creating atomic commits and then checking out code in a way
> that respects that atomicity that is as simple as "git commit" and "git
> checkout" (which is what we'd have on the monorepo).
The key assumption here is that LLVM will not switch to a heavy PR model (which, as I understand it, is not part of this specific discussion and will only be considered after the move to GitHub). Given that, I believe it will be relatively simple to communicate this to the CI and have the CI manage it for you. If we implement such a thing on the Swift side, I would be more than happy to help you set up a similar system that reuses that work.
Another point about the mono-repo proposal that I do not understand (*note*: correct me if I am wrong): as far as I can tell, we can only avoid external synchronization if we get /all/ projects that have build dependencies on the mono-repo into the mono-repo. This suggests that (unless we are saying that synchronization of those repositories is not important) we will need to invest in some sort of synchronization regardless of the mono-repo proposal. In that case the mono-repo proposal is essentially a way to ease the workflows of a large subset of the community, rather than truly an alternative to the submodule proposal. Am I misunderstanding?
Michael
>
> On Sun, Jul 31, 2016 at 12:25 AM, Justin Lebar <jlebar at google.com> wrote:
>> By the way, I've been using the existing read-only monorepo [1] for a
>> few days now. The intent is to commit via the script I put together
>> [2], although I haven't committed anything other than a testing commit
>> [3].
>>
>> All I can say is, *wow* is it nice. Using a sparse checkout [4], I hid
>> everything I don't care about. Many of my tools (e.g. ctrl-p [5]
>> [6], ycm [7]) suddenly work better now that there isn't an artificial
>> boundary between my clang and llvm repositories. I can have patch
>> queues that include LLVM commits and clang commits arbitrarily
>> interspersed with one another -- something I didn't realize I wanted
>> until I made the switch and noticed I already had branches I could
>> merge (and something we can't do with Bogner's suggested multirepo
>> workflow).
>>
>> [1] https://github.com/llvm-project/llvm-project
>> [2] https://github.com/jlebar/llvm-repo-tools
>> [3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
>> (which, embarrassingly, has a typo in the commit message)
>> [4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
>> [5] https://github.com/kien/ctrlp.vim
>> [6] https://github.com/jlebar/ctrlp-py-matcher
>> [7] https://github.com/Valloric/YouCompleteMe
>>
>> On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at google.com> wrote:
>>>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
>>>
>>> This is an *extremely important* point to understand, so let me try to
>>> be really clear about the current state of the world and the state of
>>> the world under the two "move to git" proposals.
>>>
>>> Today, all commits ultimately end up in SVN. Our SVN is effectively
>>> a monorepo, so today, a single commit can touch multiple subprojects.
>>> How you get the commit into SVN is your business. Maybe you can hack
>>> git-svn somehow to do the atomic commit. (If this is possible, it's
>>> beyond my ken.) Alternatively you can just commit via SVN. If you're
>>> a git user, I wrote a hacky script [1] that cherry-picks commits from
>>> the existing monorepo mirror and commits them via SVN. It's annoying
>>> to do, but it is possible today to atomically commit to multiple
>>> subprojects, as you observed.
>>>
>>> Under the monorepo proposal, this becomes much easier. It's just "git
>>> commit", no magic.
>>>
>>> Under the multirepo git proposal, this becomes either impossible or
>>> much more complicated. Under the proposal, we have separate git
>>> repositories for each subproject, and we push directly to these.
>>> There's then an umbrella repository, which includes the subproject
>>> repos as git submodules. There's a script which periodically checks
>>> the subproject repos for updates. When it sees an update, it creates
>>> a new commit in the umbrella repository. The script is the only thing
>>> that can create commits in the umbrella repo.
>>>
>>> In order to get atomic commits in the multirepo world, we would need
>>> some way to inform the script that two otherwise separate commits
>>> should appear in the umbrella repo as a single commit. We'd probably
>>> need to agree on a protocol communicated via commit messages. We'd
>>> also probably need client-side scripts to set the commit messages
>>> appropriately.
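Here is a hedged Python sketch of what such a protocol might involve. The `Atomic-Group: <id>/<size>` trailer is an invented convention (nothing like it exists in LLVM), and `SubCommit`/`plan_umbrella_commits` are hypothetical names. The script only snapshots the umbrella state when no group is half-landed. Commits that arrive while a group is open, in any subproject, are folded into the same umbrella update, because a later llvm commit already contains the grouped llvm change.

```python
from __future__ import annotations
import re
from dataclasses import dataclass

# Invented trailer for illustration only: "Atomic-Group: <id>/<size>".
TRAILER = re.compile(r"^Atomic-Group: (\S+)/(\d+)$", re.MULTILINE)


@dataclass(frozen=True)
class SubCommit:
    project: str
    sha: str
    message: str


def plan_umbrella_commits(incoming: list[SubCommit]) -> list[dict[str, str]]:
    """Turn subproject commits (in arrival order) into umbrella updates.

    Each update maps subproject -> new pinned hash. Commits that arrive
    while a group is still open stay buffered (and are not returned)
    until that group closes.
    """
    open_groups: dict[str, int] = {}  # group id -> members still missing
    batch: dict[str, str] = {}        # project -> newest hash not yet pinned
    updates: list[dict[str, str]] = []
    for c in incoming:
        batch[c.project] = c.sha
        m = TRAILER.search(c.message)
        if m is not None:
            gid, size = m.group(1), int(m.group(2))
            remaining = open_groups.get(gid, size) - 1
            if remaining <= 0:
                open_groups.pop(gid, None)
            else:
                open_groups[gid] = remaining
        # Snapshot only when no group is half-landed; otherwise one side of
        # an atomic change could be pinned without the other.
        if not open_groups:
            updates.append(batch)
            batch = {}
    return updates


commits = [
    SubCommit("llvm", "a1", "Fix typo"),
    SubCommit("llvm", "a2", "Change API\n\nAtomic-Group: g1/2"),
    SubCommit("clang", "b1", "Adapt to API\n\nAtomic-Group: g1/2"),
]
print(plan_umbrella_commits(commits))
# [{'llvm': 'a1'}, {'llvm': 'a2', 'clang': 'b1'}]
```

Even this toy version has to handle unrelated commits landing mid-group. It also does not cover the client-side tooling needed to write the trailers correctly.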
>>>
>>> I expect this would be such a hassle that, even if we managed to
>>> implement it on the server side, it would be prohibitively complex for
>>> most users.
>>>
>>> In addition, under the multirepo, you only get synchronized subproject
>>> commits in your local checkout if you choose to use a git-submodules
>>> based workflow. If you use the workflow that we currently have, then
>>> on the client side, there is no guarantee that your subprojects will
>>> be sync'ed. (This is the same as most people's client-side git
>>> workflows today.) *Even if we manage to atomically commit across
>>> subprojects*, that is of limited utility unless those commits show up
>>> atomically on developers' workstations. But using a workflow based on
>>> git-submodules is highly complex as compared to the monorepo -- this
>>> was what I was trying to illustrate in my very first email on this
>>> thread.
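This small Python sketch illustrates the torn-checkout problem. It assumes the umbrella history is available as a list of full snapshots (subproject name to hash); `is_consistent` is a hypothetical helper. A client that pulls each subproject independently can end up with a combination of hashes that never appeared together in any umbrella commit. In a monorepo, by contrast, a single commit hash identifies all subprojects at once.

```python
from __future__ import annotations


def is_consistent(checkout: dict[str, str],
                  umbrella_history: list[dict[str, str]]) -> bool:
    """True if every subproject in `checkout` matches one umbrella snapshot."""
    return any(
        all(snapshot.get(project) == sha for project, sha in checkout.items())
        for snapshot in umbrella_history
    )


# Umbrella history: llvm and clang were updated together (a2 needs b2).
history = [
    {"llvm": "a1", "clang": "b1"},
    {"llvm": "a2", "clang": "b2"},
]

# Client pulled llvm just before the update and clang just after it.
torn = {"llvm": "a1", "clang": "b2"}
print(is_consistent(torn, history))                           # False
print(is_consistent({"llvm": "a2", "clang": "b2"}, history))  # True
```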
>>>
>>> When we say "the monorepo gets you atomic commits," that's shorthand for:
>>>
>>> 1) The monorepo makes it far simpler to make atomic commits from git
>>> as compared to the current SVN setup.
>>> 2) Atomic commits are definitely possible in the monorepo. They are
>>> theoretically possible in the multirepo, with extensive tooling etc.
>>> 3) Under the basic monorepo workflow, your checkouts are always
>>> correct with respect to atomic commits. Under the basic multirepo
>>> workflow, this is not true -- you have to engage with git submodules
>>> to get this property, and that is a giant pain.
>>>
>>> Sorry for the wall of text, but this is important.
>>>
>>> [1] https://github.com/jlebar/llvm-repo-tools. Be careful, I've only
>>> made one commit with it so far. :)
>>>
>>> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>>>> The only thing a monorepo gets you that strictly isn’t possible without
>>>>> it is the ability to commit to multiple projects in a single commit.
>>>>> Personally I don’t think that is a big enough justification, but that is
>>>>> my opinion, not a fact.
>>>>
>>>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>>>> clang-tools-extra all have the same SVN revision number.
>>>> I don't know how it happened but it did. Is this just an artifact of
>>>> how somebody pasted together a bunch of git-svn projects, or is it
>>>> something that a top-level git repo with submodules would allow?
>>>> And if it is, then the "only thing a monorepo gets you" isn't something
>>>> that you need a monorepo to get.
>>>> Your befuddled correspondent,
>>>> --paulr
>>>>