[llvm-dev] [RFC] One or many git repositories?

Michael Gottesman via llvm-dev llvm-dev at lists.llvm.org
Sun Jul 31 00:24:26 PDT 2016

> On Jul 31, 2016, at 12:06 AM, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
> This is an *extremely important* point to understand, so let me try to
> be really clear about the current state of the world and the state of
> the world under the two "move to git" proposals.
> Today, all commits ultimately end up in SVN.  Our SVN is effectively
> a monorepo, so today, a single commit can touch multiple subprojects.
> How you get the commit into SVN is your business.  Maybe you can hack
> git-svn somehow to do the atomic commit.  (If this is possible, it's
> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
> a git user, I wrote a hacky script [1] that cherry-picks commits from
> the existing monorepo mirror and commits them via SVN.  It's annoying
> to do, but it is possible today to atomically commit to multiple
> subprojects, as you observed.
> Under the monorepo proposal, this becomes much easier.  It's just "git
> commit", no magic.
> Under the multirepo git proposal, this becomes either impossible or
> much more complicated.  Under the proposal, we have separate git
> repositories for each subproject, and we push directly to these.
> There's then an umbrella repository, which includes the subproject
> repos as git submodules.  There's a script which periodically checks
> the subproject repos for updates.  When it sees an update, it creates
> a new commit in the umbrella repository.  The script is the only thing
> that can create commits in the umbrella repo.
> In order to get atomic commits in the multirepo world, we would need
> some way to inform the script that two otherwise separate commits
> should appear in the umbrella repo as a single commit.  We'd probably
> need to agree on a protocol communicated via commit messages.  We'd
> also probably need client-side scripts to set the commit messages
> appropriately.
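Here is a minimal Python sketch of what such a commit-message protocol might look like. The `Atomic-Group: <id> <count>` trailer is purely hypothetical (nothing like it exists in LLVM's tooling). The umbrella updater is modeled as a function over a list of incoming subproject commits rather than as real git operations. Commits without the trailer each become one umbrella commit. Tagged commits are held back until all `<count>` members of their group have arrived, and are then emitted together as a single umbrella commit.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical trailer format: "Atomic-Group: <group-id> <total-commits>".
# This is an assumption for illustration, not an existing LLVM convention.
TRAILER = re.compile(r"^Atomic-Group:\s*(\S+)\s+(\d+)\s*$", re.MULTILINE)


@dataclass(frozen=True)
class SubCommit:
    repo: str     # subproject name, e.g. "llvm" or "clang"
    sha: str      # commit hash in that subproject
    message: str  # full commit message


def umbrella_commits(
    incoming: List[SubCommit],
) -> Tuple[List[Dict[str, str]], Dict[str, Tuple[int, List[SubCommit]]]]:
    """Turn a stream of subproject commits into umbrella commits.

    Each umbrella commit is a dict {repo: sha} of submodule pointers to bump.
    Returns the umbrella commits plus any groups still waiting for members.
    """
    pending: Dict[str, Tuple[int, List[SubCommit]]] = {}
    result: List[Dict[str, str]] = []
    for c in incoming:
        m = TRAILER.search(c.message)
        if m is None:
            # Ordinary commit: bump this one submodule on its own.
            result.append({c.repo: c.sha})
            continue
        group, total = m.group(1), int(m.group(2))
        _, members = pending.setdefault(group, (total, []))
        members.append(c)
        if len(members) == total:
            # Every part of the group has landed: publish them as one commit.
            result.append({s.repo: s.sha for s in members})
            del pending[group]
    return result, pending
```

The sketch deliberately ignores at least two hazards that a real script would have to handle. First, a group that never completes stays in `pending` forever. Second, an unrelated commit may land in one of the group's repositories while the group is still pending. Its umbrella commit would then already pin a descendant of the grouped change, exposing that change early.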

I was thinking about this a bit last night.

The natural way to synchronize a multi-commit update in a git repository is via a merge commit. This suggests that what we really want in this case is a set of updates (one to each subproject repository) on a branch that is then merged into the umbrella repository in a single step. The bots would then see only the merge commit, so their view of the subprojects' state is always synchronized.
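Here is a toy Python model of this idea. It assumes each umbrella commit is just a map from subproject name to pinned SHA; the names and SHAs are made up. The side branch contains the half-updated intermediate state. A bot that follows only first parents, as `git log --first-parent` does, steps straight from the old pins to the merged ones. For simplicity, the merge assumes the mainline has not moved since the branch was created.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Commit:
    pins: Dict[str, str]                # subproject name -> pinned SHA
    parents: Tuple["Commit", ...] = ()  # parents[0] is the mainline predecessor


def commit(parent: Commit, updates: Dict[str, str]) -> Commit:
    """Ordinary commit on top of `parent` that bumps some submodule pins."""
    return Commit({**parent.pins, **updates}, (parent,))


def merge(mainline: Commit, branch_tip: Commit) -> Commit:
    """Merge commit whose first parent is the mainline.

    Simplification: assumes the mainline has not moved since the branch was
    created, so the merged pins are exactly the branch tip's pins.
    """
    return Commit(dict(branch_tip.pins), (mainline, branch_tip))


def first_parent_states(tip: Commit) -> List[Dict[str, str]]:
    """Pin sets seen when walking first parents only, oldest first."""
    states = []
    current: Optional[Commit] = tip
    while current is not None:
        states.append(current.pins)
        current = current.parents[0] if current.parents else None
    states.reverse()
    return states


base = Commit({"llvm": "a0", "clang": "b0"})
branch = commit(commit(base, {"llvm": "a1"}), {"clang": "b1"})
print(first_parent_states(merge(base, branch)))
# [{'llvm': 'a0', 'clang': 'b0'}, {'llvm': 'a1', 'clang': 'b1'}]
```

The state where `llvm` is bumped but `clang` is not exists only on the side branch, so a bot tracking first parents never builds it.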

The natural way to do this would be via a multi-repo PR. In that case, CI would handle the merging for you after you have done your testing, and would update the umbrella repo accordingly. In Swift land we use PRs extensively and will most likely adopt multi-repo PRs. Once that is done, I expect to implement what I just described, so that I have a nice repo to drive our performance tracking (which currently relies on a less satisfactory setup).

I do not think it will be that complicated to implement.

> I expect this would be so much of a hassle, even if we managed to
> implement it on the server side, it would be prohibitively complex for
> most users.
> In addition, under the multirepo, you only get synchronized subproject
> commits in your local checkout if you choose to use a git-submodules
> based workflow.  If you use the workflow that we currently have, then
> on the client side, there is no guarantee that your subprojects will
> be sync'ed.  (This is the same as most people's client-side git
> workflows today.)  *Even if we manage to atomically commit across
> subprojects*, that is of limited utility unless those commits show up
> atomically on developers' workstations.  But using a workflow based on
> git-submodules is highly complex as compared to the monorepo -- this
> was what I was trying to illustrate in my very first email on this
> thread.
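In a git-submodules based workflow, drift between the umbrella repo and the subproject checkouts is at least detectable. `git submodule status` prefixes a submodule's line with `+` when its checked-out commit does not match the one recorded in the superproject. It uses `-` when the submodule is not initialized and `U` when it has merge conflicts. Below is a small Python sketch that parses that output and lists the submodules not at their pinned commit. The output is passed in as a string, so the git call itself is left out, and the sketch assumes paths without spaces. With independently pulled subprojects, there is no recorded pin to compare against at all.

```python
from typing import List


def out_of_sync_submodules(status_output: str) -> List[str]:
    """Paths of submodules not checked out at the superproject's pinned commit.

    `status_output` is the text printed by `git submodule status`. Each line
    starts with a one-character flag, followed by the SHA and the path:
      ' '  checked out at the recorded commit
      '+'  checked-out commit differs from the one recorded in the superproject
      '-'  submodule not initialized
      'U'  submodule has merge conflicts
    Assumes submodule paths contain no spaces.
    """
    flagged = []
    for line in status_output.splitlines():
        if not line.strip():
            continue
        flag, fields = line[0], line[1:].split()
        path = fields[1]  # fields[0] is the SHA; fields[2:] is optional describe output
        if flag in ("+", "-", "U"):
            flagged.append(path)
    return flagged
```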
> When we say "the monorepo gets you atomic commits," that's an abbreviation for
> 1) The monorepo makes it far simpler to make atomic commits from git
> as compared to the current SVN setup.
> 2) Atomic commits are definitely possible in the monorepo.  They are
> theoretically possible in the multirepo, with extensive tooling etc.
> 3) Under the basic monorepo workflow, your checkouts are always
> correct with respect to atomic commits.  Under the basic multirepo
> workflow, this is not true -- you have to engage with git submodules
> to get this property, and that is a giant pain.
> Sorry for the wall of text, but this is important.
> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
> made one commit with it so far.  :)
> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>> The only thing a monorepo gets you that strictly isn’t possible without
>>> it is the ability to commit to multiple projects in a single commit.
>>> Personally I don’t think that is a big enough justification, but that is
>>> my opinion, not a fact.
>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>> clang-tools-extra all have the same SVN revision number.
>> I don't know how it happened but it did.  Is this just an artifact of
>> how somebody pasted together a bunch of git-svn projects, or is it
>> something that a top-level git repo with submodules would allow?
>> And if it is, then the "only thing a monorepo gets you" isn't something
>> that you need a monorepo to get.
>> Your befuddled correspondent,
>> --paulr
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
