[llvm-dev] [RFC] One or many git repositories?

Wed Jul 20 17:29:13 PDT 2016

> On 21 Jul 2016, at 09:39, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Dear all,
> 
> I would like to (re-)open a discussion on the following specific question:
> 
>  Assuming we are moving the llvm project to git, should we
>  a) use multiple git repositories, linked together as subrepositories
> of an umbrella repo, or
>  b) use a single git repository for most llvm subprojects.
> 
> The current proposal assembled by Renato follows option (a), but I
> think option (b) will be significantly simpler and more effective.
> Moreover, I think the issues raised with option (b) are either
> incorrect or can be reasonably addressed.
> 

+1 to everything Justin points out here (and the rest of the email, which I've snipped for brevity).

Before anything else, I've been through a few of these conversions from SVN to git in other projects. In most of the ones I've seen going to submodules of multiple repo's, a lot of automation is required just to keep things manageable. That's hard to do on a cross-platform basis (do you script in Python, shell script, one per OS, etc.) and is really more trouble than it's worth -- especially when adding new submodules and/or removing them. They're not impossible to do, but they're also much more work than a single repo.

Just to point out some devil's advocate positions:

- Keeping the current structure will be less churn to existing consumers that have "out of tree" builds based on the current structure. Asking them to change their workflow with SVN significantly (since moving to GitHub is mostly swayed by the SVN interface) will probably be non-trivial amounts of work. We probably need to document this well enough or show that the switch won't affect them too badly.

- Some people value keeping the history of the commits in SVN and the Git counterpart once the move happens (for a lot of valid reasons). Making sure we can merge the histories of all the subproject repositories into a single one should be addressed to preserve "provenance".

- Some people like isolation of workflows and concerns. As a git-native convert, I'm not sold on this, but there's some good reasons to be able to do this (maintainers of certain projects will probably enforce different constraints on when/who/how changes can/should/must be made). Making it possible to do so in a monorepo should be explained well (i.e. does this need any special configs on the repo on the server side, on GitHub, etc.).

All in all I think optimising for the case of the everyday developer working on multiple projects (in my case LLVM, Clang, and compiler-rt, and maybe potentially XRay as a subproject too) is a good cause. Whether this translates to every special consumer of the current set-up is less clear at least to me -- so I'd like to know what other stakeholders here think.

Cheers