[llvm-dev] [RFC] One or many git repositories?

Wed Aug 10 02:05:44 PDT 2016

> On 10 Aug 2016, at 00:32, Chris Bieneman via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> […] I say “might” because nobody has addressed my original concerns about whether or not that workflow would be dropped if we move to a PR based model, or how we would support something similar in a PR model. I also think that the mono-repo might discourage pull requests to the runtimes projects from users that don’t use clang, which concerns me. Either way Mehdi’s information isn’t going to get me to support your idea over the other proposal which offers me actual workflow improvements.

From any source of truth, there are ways to project to an unlimited number of other read-only views of subsets of the repository. Some of those views may be independent repos for each project, some may combine a subset of those independent projects together again using submodules, some may be a subset of projects in monorepo form (clang, llvm, lld for example).

One thought I had is that even if the projections are read-only, it may still be possible to accept pull requests on any of them. A relatively simple bot could scrape the pull requests opened against a read-only projection, rewrite the paths so the change applies to the writable “source of truth” (or split it into multiple PRs if the “source of truth” is multiple repos and the PR targets a combined projection), open the corresponding pull request(s) on the writable repository or repositories, and close the original PR with a reference to the one(s) that can actually be applied.
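To make the path-rewriting and splitting steps a bit more concrete, here is a minimal sketch in Python. It only handles lists of changed file paths, not the diffs themselves, and it assumes a hypothetical layout in which each project lives in a top-level monorepo directory named after it; the `PROJECT_DIRS` table and both function names are made up for illustration. For actual patches, git's existing `git apply --directory=<root>` option already prepends a directory to every filename, so a bot would probably lean on that rather than rewriting diffs by hand.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical projection layout (an assumption for this sketch):
# project name -> top-level directory of that project in the monorepo.
PROJECT_DIRS = {"llvm": "llvm", "clang": "clang", "lld": "lld"}


def to_monorepo_path(project: str, path: str) -> str:
    """Map a path from a single-project projection to its monorepo location."""
    return str(PurePosixPath(PROJECT_DIRS[project]) / path)


def split_by_project(monorepo_paths):
    """Group monorepo paths by project, for a multi-repo source of truth.

    Returns {project: [paths relative to that project's own repo]}, i.e.
    one group per pull request the bot would have to open.
    """
    by_dir = {d: name for name, d in PROJECT_DIRS.items()}
    groups = defaultdict(list)
    for path in monorepo_paths:
        parts = PurePosixPath(path).parts
        # Reject anything that does not live under a known project directory.
        if len(parts) < 2 or parts[0] not in by_dir:
            raise ValueError(f"{path!r} is outside every known project")
        groups[by_dir[parts[0]]].append(str(PurePosixPath(*parts[1:])))
    return dict(groups)
```

A PR against a clang-only projection that touches `lib/Sema/Sema.cpp` would map to `clang/lib/Sema/Sema.cpp` upstream; going the other way, a PR against a combined projection that touches files under both `llvm/` and `clang/` would come back from `split_by_project` as two groups, one per upstream repo.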

I’m not attempting to argue that a complete monorepo with all projects is the ideal source of truth, but it does seem to me that where revlock is important, or where changes touching multiple projects are common, life is simplified by having the upstream be a monorepo containing those projects. With a PR model (as with bisecting, branching, patching, CI integration, etc.) it is much easier to apply a cross-cutting change by accepting a single PR than by specifying that this PR on clang depends on PR xyz on llvm being applied too.

The main debate here is what the ultimate “source of truth” is: which repos are actually writable and have new commits added to them, and what other projections can be defined from them. Given that projections are possible, the next question is how usable they are for downstream developers. Consumption is obviously straightforward; contributing through them is where most of the objections stem from, and the best workflows here are still somewhat unclear (Mehdi’s svn bridge suggestion is certainly a reasonable workflow, though relying on svn for the commit does feel a little weird).

So I guess my main point is that if a workflow for accepting contributions from read-only projections can be made simple, then the ultimate upstream representation doesn’t matter all that much.

All sorts of structures are possible, with “all submodules” and “all monorepo” simply being the two extremes.

Simon