[llvm-dev] [RFC] One or many git repositories?

Mon Jul 25 15:04:19 PDT 2016

A couple of points that I haven't seen raised yet (I'm mid-vacation so this is pretty much a fly-by; sorry if I missed these earlier in the thread).

I haven't thought about this as deeply as the rest of you.  Maybe these are easily refuted?

1. If there's a move toward a monolithic repository, it's important to remember this is the LLVM project, not the Clang project.  A nested layout that optimizes for Clang developers at the expense of everyone else's workflow would be a disservice to the greater community, even if it's a temporary "while we figure out what really makes sense" kind of state.  For that reason, I'm against a monolithic layout that has "clang" living at "tools/clang"... or, really, having any sub-project live inside of the "llvm" directory.

2. Those working on projects *outside* the monolithic repo will get the downsides of both: a monolithic repo that they are only using parts of, and multiple repos that are somehow version-locked.

3. For many (most?) developers, changing to a monolithic git repo is a *bigger* workflow change than switching to separate git repos.  Many people (and at least some downstream infrastructure) use the git mirrors exclusively, aside from git-svn for committing.

#1 and #2 don't negatively impact Clang developers really -- and we have the loudest voices -- but we should be intentional about any changes here.

I mention #3 specifically to address Richard's claim that a monolithic, nested, git repo is a smaller change than separate git repos.  On the contrary, with separate git repos, I just need to update a couple of remotes and I'm finished.
- If "minimize incremental change" is important, we should start with separate git repos (since only SVN users need to change their workflow).
- If "minimize number of changes" is important, we should figure out a close approximation of the end goal and move directly there.

Two specific replies below.

> On 2016-Jul-25, at 12:54, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> 
> Hi, all.
> 
> I feel like we've strayed pretty far from the question originally
> posed in this thread.
> 
> One of the pieces of feedback I got before I started this thread was
> that many people felt that, the last time the question of multiple
> repos vs. monorepo was discussed, it was interspersed with other
> topics, making it difficult for some people to weigh in appropriately
> (or even to be aware that the discussion was occurring).  I'm afraid
> that the discussion of github workflows we're having here may cause
> the same problem.
> 
> Maybe we can move the discussion about github workflows into a
> different thread?  Again, I don't mean to stop it, just move it.
> 
> To re-focus this thread on its original topic: It sounds to me like,
> broadly speaking, we have consensus on using a single repository.

I'm not convinced.  I'd be interested in hearing via the survey which path (separate repos vs. monolithic) causes the most workflow disruption.

> But
> there are still some outstanding related questions.  Among these are:
> 
> 1) Should the repository have "unified history"?  (Meaning, should I
> be able to check out a single git revision from before the migration
> and have it contain all of the llvm subprojects?)
> 
> 2) Should the monorepo have a "nested" repository layout (e.g. clang
> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
> 
> 3) Assuming we want unified history, should the new canonical
> repository's hashes be based on
> https://github.com/llvm-project/llvm-project, or should it start
> afresh?
> 
> FWIW my answers to these are:
> 
> 1) Yes to unified history.  The main advantage of non-unified history
> is that it's easier for people to import old branches -- it's a matter
> of "git merge" instead of running the git filter-branch script I
> wrote.  But this is a relatively small (~20 minute) one-time cost to
> some of us, whereas our repository history is borne by all of us
> forever.  Moreover unified history also helps people with long-running
> branches, as it lets them check out old versions of their branch and
> get a compatible version of all of the other llvm subprojects.
> 
> 2) Yes to nested layout.  I find Chandler and Richard Smith's
> arguments compelling.

I disagree with having "clang" nested inside "llvm".

> 3) No to basing the new canonical repo on
> https://github.com/llvm-project/llvm-project.  That repo's history is
> missing svn revision numbers, and there are enough emails floating
> around that reference svn revision numbers that I think we need them
> in our canonical repo.  Also llvm-project/llvm-project has a flat
> structure, and if we end up going with a nested layout, it would be
> better to have that layout starting with the first commit.
> 
> -Justin
> 
> On Mon, Jul 25, 2016 at 8:10 AM, Bruce Hoult via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> git-imerge can run an arbitrary script to decide whether a commit is good or
>> bad. Lack of textual merge conflicts is only the most basic test -- you can
>> check that it compiles, run tests .. whatever you want and have time to
>> execute.
>> 
>> On Tue, Jul 26, 2016 at 2:12 AM, Robinson, Paul via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>> 
>>> 
>>> 
>>>> -----Original Message-----
>>>> From: Renato Golin [mailto:renato.golin at linaro.org]
>>>> Sent: Monday, July 25, 2016 7:11 AM
>>>> To: Daniel Sanders
>>>> Cc: Robinson, Paul; llvm-dev at lists.llvm.org
>>>> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>>>> 
>>>> On 25 July 2016 at 14:55, Daniel Sanders <Daniel.Sanders at imgtec.com>
>>>> wrote:
>>>>> I know of a way but it's not very nice. The gist of it is to check out
>>>>> the downstream branch just before the bad merge and then merge the first
>>>>> 100 commits from upstream. If the result is good then merge the next
>>>>> 100, but if it's bad then 'git reset --hard' and merge 10 instead.
>>>>> You'll eventually find the commit that made it bad. Essentially, the
>>>>> idea is to make a throwaway branch that merges more frequently. I do
>>>>> something similar to rebase my work to master since gradually rebasing
>>>>> often causes all the conflicts to go away.
>>>> 
>>>> This is essentially what git-imerge does, you only need to define
>>>> "good merge" in the form of a script or CI job.
>>>> 
>>>> cheers,
>>>> -renato
>>> 
>>> Except I understood git-imerge to be looking for physical conflicts,
>>> not "when did this test start failing."  If it does the latter also,
>>> that would be awesome.
>>> --paulr
>>> 
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 
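
For anyone skimming the quoted sub-thread: the manual procedure Daniel describes (merge 100 upstream commits at a time, and on a bad result reset and retry with 10) can be sketched roughly as below. This is only an illustration, not git-imerge's actual algorithm. The function name, the is_good_after_merging callback (standing in for "merge, build, run tests"), and the final step size of 1 are my own assumptions, and it assumes that once a merge result goes bad it stays bad.

```python
from typing import Callable, Optional, Sequence


def find_first_bad_commit(
    commits: Sequence[str],
    is_good_after_merging: Callable[[int], bool],
    steps: Sequence[int] = (100, 10, 1),
) -> Optional[str]:
    """Return the first upstream commit whose merge makes the result bad.

    commits: upstream commits, oldest first.
    is_good_after_merging(n): hypothetical callback that returns True if the
        downstream branch is still good after merging the first n upstream
        commits (e.g. it merges, compiles, and passes tests).
    steps: chunk sizes to try, largest first; must end with 1 so the search
        can always narrow down to a single commit.
    """
    if not steps or steps[-1] != 1:
        raise ValueError("steps must end with 1")

    merged = 0  # upstream commits merged so far, known to be good
    level = 0   # index into steps; grows as we narrow the search
    while merged < len(commits):
        candidate = min(merged + steps[level], len(commits))
        if is_good_after_merging(candidate):
            merged = candidate  # keep this merge and move on
        elif candidate - merged == 1:
            return commits[merged]  # this single commit broke things
        else:
            # Equivalent of 'git reset --hard': drop the bad merge and
            # retry from the same point with a smaller chunk.
            level += 1
    return None  # everything merged cleanly


if __name__ == "__main__":
    upstream = [f"c{i}" for i in range(250)]
    # Pretend commit c137 is the one that breaks the downstream branch.
    print(find_first_bad_commit(upstream, lambda n: n <= 137))
```

In practice the callback would be the expensive part (a real merge plus a build and test run), which is why starting with large chunks and only shrinking after a failure keeps the number of checks down.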