[llvm-dev] [RFC] One or many git repositories?
Duncan P. N. Exon Smith via llvm-dev
llvm-dev at lists.llvm.org
Mon Jul 25 15:04:19 PDT 2016
A couple of points that I haven't seen raised yet (I'm mid-vacation so this is pretty much a fly-by; sorry if I missed these earlier in the thread).
I haven't thought about this as deeply as the rest of you. Maybe these are easily refuted?
1. If there's a move toward a monolithic repository, it's important to remember this is the LLVM project, not the Clang project. A nested layout that optimizes for Clang developers at the expense of everyone else's workflow would be a disservice to the greater community, even if it's a temporary "while we figure out what really makes sense" kind of state. For that reason, I'm against a monolithic layout that has "clang" living at "tools/clang"... or, really, having any sub-project live inside of the "llvm" directory.
2. Those working on projects *outside* the monolithic repo will get the downsides of both: a monolithic repo that they are only using parts of, and multiple repos that are somehow version-locked.
3. For many (most?) developers, changing to a monolithic git repo is a *bigger* workflow change than switching to separate git repos. Many people (and at least some downstream infrastructure) use the git mirrors exclusively, aside from git-svn for committing.
#1 and #2 don't negatively impact Clang developers really -- and we have the loudest voices -- but we should be intentional about any changes here.
I mention #3 specifically to address Richard's claim that a monolithic, nested, git repo is a smaller change than separate git repos. On the contrary, with separate git repos, I just need to update a couple of remotes and I'm finished.
- If "minimize incremental change" is important, we should start with separate git repos (since only SVN users need to change their workflow).
- If "minimize number of changes" is important, we should figure out a close approximation of the end goal and move directly there.
Two specific replies below.
> On 2016-Jul-25, at 12:54, Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> wrote:
> Hi, all.
> I feel like we've strayed pretty far from the question originally
> posed in this thread.
> One of the pieces of feedback I got before I started this thread was
> that many people felt that, the last time the question of multiple
> repos vs. monorepo was discussed, it was interspersed with other
> topics, making it difficult for some people to weigh in appropriately
> (or even to be aware that the discussion was occurring). I'm afraid
> that the discussion of github workflows we're having here may cause
> the same problem.
> Maybe we can move the discussion about github workflows into a
> different thread? Again, I don't mean to stop it, just move it.
> To re-focus this thread on its original topic: It sounds to me like,
> broadly speaking, we have consensus on using a single repository.
I'm not convinced. I'd be interested in hearing via the survey which path (separate repos vs. monolithic) causes the most workflow disruption.
> there are still some outstanding related questions. Among these are:
> 1) Should the repository have "unified history"? (Meaning, should I
> be able to check out a single git revision from before the migration
> and have it contain all of the llvm subprojects?)
> 2) Should the monorepo have a "nested" repository layout (e.g. clang
> goes in /tools/clang) or a "flat" layout (clang goes in /clang)?
> 3) Assuming we want unified history, should the new canonical
> repository's hashes be based on
> https://github.com/llvm-project/llvm-project, or should it start
> FWIW my answers to these are:
> 1) Yes to unified history. The main advantage of non-unified history
> is that it's easier for people to import old branches -- it's a matter
> of "git merge" instead of running the git filter-branch script I
> wrote. But this is a relatively small (~20 minute) one-time cost to
> some of us, whereas our repository history is borne by all of us
> forever. Moreover unified history also helps people with long-running
> branches, as it lets them check out old versions of their branch and
> get a compatible version of all of the other llvm subprojects.
> 2) Yes to nested layout. I find Chandler and Richard Smith's
> arguments compelling.
I disagree with having "clang" nested inside "llvm".
> 3) No to basing the new canonical repo on
> https://github.com/llvm-project/llvm-project. That repo's history is
> missing svn revision numbers, and there are enough emails floating
> around that reference svn revision numbers that I think we need them
> in our canonical repo. Also llvm-project/llvm-project has a flat
> structure, and if we end up going with a nested layout, it would be
> better to have that layout starting with the first commit.
> On Mon, Jul 25, 2016 at 8:10 AM, Bruce Hoult via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> git-imerge can run an arbitrary script to decide whether a commit is good or
>> bad. Lack of textual merge conflicts is only the most basic test -- you can
>> check that it compiles, run tests .. whatever you want and have time to
>> On Tue, Jul 26, 2016 at 2:12 AM, Robinson, Paul via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>>> -----Original Message-----
>>>> From: Renato Golin [mailto:renato.golin at linaro.org]
>>>> Sent: Monday, July 25, 2016 7:11 AM
>>>> To: Daniel Sanders
>>>> Cc: Robinson, Paul; llvm-dev at lists.llvm.org
>>>> Subject: Re: [llvm-dev] [RFC] One or many git repositories?
>>>> On 25 July 2016 at 14:55, Daniel Sanders <Daniel.Sanders at imgtec.com>
>>>>> I know of a way but it's not very nice. The gist of it is to check out the
>>>>> downstream branch just before the bad merge and then merge the first
>>>>> 100 commits from upstream. If the result is good then merge the next
>>>>> 100, but if it's bad then 'git reset --hard' and merge 10 instead. You
>>>>> eventually find the commit that made it bad. Essentially, the idea is to
>>>>> make a throwaway branch that merges more frequently. I do something
>>>>> similar to rebase my work to master since gradually rebasing often
>>>>> causes all the conflicts to go away.
>>>> This is essentially what git-imerge does, you only need to define
>>>> "good merge" in the form of a script or CI job.
>>> Except I understood git-imerge to be looking for physical conflicts,
>>> not "when did this test start failing." If it does the latter also,
>>> that would be awesome.
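The batched search Daniel describes (merge 100 upstream commits at a time, and on a bad result reset and retry with 10) can be sketched roughly as follows. This is only an illustration under stated assumptions, not how git-imerge is implemented: `find_first_bad_merge` and the `is_good_after` callback are hypothetical names, the final fallback to single commits is an added assumption, and the sketch assumes that once the merged result goes bad it stays bad. In real use the callback would do the merge on a throwaway branch and then build or run tests.

```python
from typing import Callable, Optional, Sequence


def find_first_bad_merge(
    upstream: Sequence[str],
    is_good_after: Callable[[int], bool],
    batch_sizes: Sequence[int] = (100, 10),
) -> Optional[str]:
    """Return the first upstream commit whose merge makes the result bad.

    is_good_after(n) is a hypothetical callback: it should merge the first
    n upstream commits into a throwaway branch and report whether the
    result is good (compiles, passes tests, whatever "good" means to you).
    Returns None if every upstream commit merges to a good result.
    """
    # Always finish with single-commit steps so the search can pin down
    # one commit (an assumption added here; the thread stops at 10).
    sizes = list(batch_sizes) + [1]
    level = 0
    merged = 0  # number of upstream commits known to merge to a good state

    while merged < len(upstream):
        step = min(sizes[level], len(upstream) - merged)
        if is_good_after(merged + step):
            merged += step  # keep the merge and move on to the next batch
        elif step == 1:
            return upstream[merged]  # this single commit broke the merge
        else:
            # The equivalent of 'git reset --hard': throw the bad batch
            # away and retry from the same point with a smaller batch.
            level += 1
    return None


if __name__ == "__main__":
    commits = [f"c{i}" for i in range(250)]
    # Pretend merging becomes bad once commit c137 is included.
    print(find_first_bad_merge(commits, lambda n: n <= 137))
```

The same loop works for either definition of "good" raised in this subthread, textual conflicts or failing tests, because that decision lives entirely in the callback.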
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org