[llvm-dev] [RFC] One or many git repositories?

Sun Jul 31 00:25:11 PDT 2016

By the way, I've been using the existing read-only monorepo [1] for a
few days now.  The intent is to commit via the script I put together
[2], although I haven't committed anything other than a testing commit
[3].

All I can say is, *wow* is it nice.  I hid everything I don't care
about using a sparse checkout [4].  Many of my tools (e.g. ctrl-p [5]
[6], ycm [7]) suddenly work better now that there isn't an artificial
boundary between my clang and llvm repositories.  I can have patch
queues that include LLVM commits and clang commits arbitrarily
interspersed with one another -- something I didn't realize I wanted
until I made the switch and noticed I already had branches I could
merge (and something we can't do with Bogner's suggested multirepo
workflow).

[1] https://github.com/llvm-project/llvm-project
[2] https://github.com/jlebar/llvm-repo-tools
[3] https://github.com/llvm-project/llvm-project/commit/38a6db646d8f43cd9d7cec6c0533e40946cd162f
(which, embarrassingly, has a typo in the commit message)
[4] http://jasonkarns.com/blog/subdirectory-checkouts-with-git-sparse-checkout/
[5] https://github.com/kien/ctrlp.vim
[6] https://github.com/jlebar/ctrlp-py-matcher
[7] https://github.com/Valloric/YouCompleteMe

On Sun, Jul 31, 2016 at 12:06 AM, Justin Lebar <jlebar at google.com> wrote:
>> And if it is, then the "only thing a monorepo gets you" isn't something that you need a monorepo to get.
>
> This is an *extremely important* point to understand, so let me try to
> be really clear about the current state of the world and the state of
> the world under the two "move to git" proposals.
>
> Today, all commits ultimately end up in SVN.  Our SVN is effectively
> a monorepo, so today, a single commit can touch multiple subprojects.
> How you get the commit into SVN is your business.  Maybe you can hack
> git-svn somehow to do the atomic commit.  (If this is possible, it's
> beyond my ken.)  Alternatively you can just commit via SVN.  If you're
> a git user, I wrote a hacky script [1] that cherry-picks commits from
> the existing monorepo mirror and commits them via SVN.  It's annoying
> to do, but it is possible today to atomically commit to multiple
> subprojects, as you observed.
>
> Under the monorepo proposal, this becomes much easier.  It's just "git
> commit", no magic.
>
> Under the multirepo git proposal, this becomes either impossible or
> much more complicated.  Under the proposal, we have separate git
> repositories for each subproject, and we push directly to these.
> There's then an umbrella repository, which includes the subproject
> repos as git submodules.  There's a script which periodically checks
> the subproject repos for updates.  When it sees an update, it creates
> a new commit in the umbrella repository.  The script is the only thing
> that can create commits in the umbrella repo.
>
> In order to get atomic commits in the multirepo world, we would need
> some way to inform the script that two otherwise separate commits
> should appear in the umbrella repo as a single commit.  We'd probably
> need to agree on a protocol communicated via commit messages.  We'd
> also probably need client-side scripts to set the commit messages
> appropriately.
>
> I expect this would be such a hassle that, even if we managed to
> implement it on the server side, it would be prohibitively complex for
> most users.
>
> In addition, under the multirepo proposal, you only get synchronized
> subproject commits in your local checkout if you choose to use a
> git-submodules-based workflow.  If you use the workflow that we
> currently have, then on the client side, there is no guarantee that
> your subprojects will be sync'ed.  (This is the same as most people's
> client-side git workflows today.)  *Even if we manage to atomically commit across
> subprojects*, that is of limited utility unless those commits show up
> atomically on developers' workstations.  But using a workflow based on
> git-submodules is highly complex as compared to the monorepo -- this
> was what I was trying to illustrate in my very first email on this
> thread.
>
> When we say "the monorepo gets you atomic commits," that's shorthand for:
>
> 1) The monorepo makes it far simpler to make atomic commits from git
> as compared to the current SVN setup.
> 2) Atomic commits are definitely possible in the monorepo.  They are
> theoretically possible in the multirepo, but only with extensive tooling.
> 3) Under the basic monorepo workflow, your checkouts are always
> correct with respect to atomic commits.  Under the basic multirepo
> workflow, this is not true -- you have to engage with git submodules
> to get this property, and that is a giant pain.
>
> Sorry for the wall of text, but this is important.
>
> [1] https://github.com/jlebar/llvm-repo-tools.  Be careful, I've only
> made one commit with it so far.  :)
>
> On Sat, Jul 30, 2016 at 10:38 PM, Robinson, Paul <paul.robinson at sony.com> wrote:
>>> The only thing a monorepo gets you that strictly isn’t possible without
>>> it is the ability to commit to multiple projects in a single commit.
>>> Personally I don’t think that is a big enough justification, but that is
>>> my opinion, not a fact.
>>
>> Okay, I just bumped into r277008, in which commits to llvm, clang, and
>> clang-tools-extra all have the same SVN revision number.
>> I don't know how it happened but it did.  Is this just an artifact of
>> how somebody pasted together a bunch of git-svn projects, or is it
>> something that a top-level git repo with submodules would allow?
>> And if it is, then the "only thing a monorepo gets you" isn't something
>> that you need a monorepo to get.
>> Your befuddled correspondent,
>> --paulr
>>
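The "inform the script" step in the multirepo discussion above can be made concrete with a toy model. This is a hypothetical illustration, not part of either proposal. The `Atomic-Group:` trailer and its syntax are invented here, as are the `SubCommit` and `Umbrella` names. The model also tracks only bookkeeping, meaning which subproject SHA each umbrella commit pins, and never touches real git objects. Commits tagged with the same group id are recorded as a single umbrella commit once every listed subproject has pushed its half.

```python
"""Toy model of the multirepo umbrella updater, extended with a
HYPOTHETICAL commit-message protocol for atomic cross-repo commits.
The "Atomic-Group:" trailer and all names here are illustrative only."""
import re
from collections import deque
from dataclasses import dataclass

# Hypothetical trailer syntax: "Atomic-Group: <id> <repo>,<repo>,..."
TRAILER = re.compile(r"^Atomic-Group:\s*(\S+)\s+(\S+)\s*$", re.MULTILINE)


@dataclass(frozen=True)
class SubCommit:
    repo: str     # subproject repository, e.g. "llvm" or "clang"
    sha: str      # commit id in that repository
    message: str  # full commit message

    def group(self):
        """Return (group_id, repos) if the message has a trailer, else None."""
        m = TRAILER.search(self.message)
        if m is None:
            return None
        return m.group(1), frozenset(m.group(2).split(","))


class Umbrella:
    """The only writer of umbrella commits, as in the multirepo proposal."""

    def __init__(self):
        self.pins = {}      # subproject -> currently pinned sha
        self.history = []   # one snapshot of self.pins per umbrella commit
        self.pending = {}   # subproject -> deque of not-yet-recorded commits

    def observe(self, commit):
        """Called when polling finds a new commit in a subproject repo."""
        self.pending.setdefault(commit.repo, deque()).append(commit)
        self._drain()

    def _record(self, commits):
        # Pin every commit in the batch, then emit ONE umbrella commit.
        for c in commits:
            self.pins[c.repo] = c.sha
        self.history.append(dict(self.pins))

    def _drain(self):
        progress = True
        while progress:
            progress = False
            for repo in sorted(self.pending):
                queue = self.pending[repo]
                if not queue:
                    continue
                grp = queue[0].group()
                if grp is None:
                    # Ordinary commit: record it on its own.
                    self._record([queue.popleft()])
                    progress = True
                    continue
                # Grouped commit: wait until every listed repo's oldest
                # pending commit carries the same group, then record them
                # together. Later commits in this repo stay queued behind it.
                _, repos = grp
                heads = [self.pending.get(r) for r in sorted(repos)]
                if all(q and q[0].group() == grp for q in heads):
                    self._record([q.popleft() for q in heads])
                    progress = True


if __name__ == "__main__":
    u = Umbrella()
    u.observe(SubCommit("llvm", "a1", "Fix typo"))
    u.observe(SubCommit("llvm", "a2", "Rename API\n\nAtomic-Group: g1 llvm,clang"))
    u.observe(SubCommit("llvm", "a3", "Unrelated change"))
    u.observe(SubCommit("clang", "b1", "Adapt to rename\n\nAtomic-Group: g1 llvm,clang"))
    for snapshot in u.history:
        print(snapshot)
```

Note what the bookkeeping forces: to keep each subproject's history in order, a grouped commit blocks every later commit in its repository until the rest of the group lands. A group whose other half is never pushed therefore stalls that repository's umbrella updates indefinitely, and client-side tooling would still be needed to write the trailers correctly.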