[llvm-dev] [RFC] One or many git repositories?

Robinson, Paul via llvm-dev <llvm-dev at lists.llvm.org>
Mon Jul 25 07:20:59 PDT 2016



> -----Original Message-----
> From: Daniel Sanders [mailto:Daniel.Sanders at imgtec.com]
> Sent: Monday, July 25, 2016 6:56 AM
> To: Robinson, Paul; Renato Golin
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC] One or many git repositories?
> 
> 
> > -----Original Message-----
> > From: Robinson, Paul [mailto:paul.robinson at sony.com]
> > Sent: 22 July 2016 18:50
> > To: Renato Golin; Daniel Sanders
> > Cc: llvm-dev at lists.llvm.org
> > Subject: RE: [llvm-dev] [RFC] One or many git repositories?
> >
> > > >> * public and downstream forks that *rely* on linear history
> > > >
> > > > Do you have an example in mind? I'd expect them to rely on each
> > > > 'master' being an improvement on 'master^'. I wouldn't expect them
> > > > to be interested in how 'master^' became 'master'.
> > >
> > > Paul Robinson was outlining some of the issues he had with git
> > > history. I don't know their setup, so I'll let him describe the issues
> > > (or he may have done so already in some thread, but I haven't read it
> > > all).
> >
> > Since you asked...
> >
> > The key point is that a (basically) linear upstream history makes it
> > feasible to do bisection on a downstream branch that mixes in a pile
> > of local changes, because the (basically) linear upstream history can
> > be merged into the downstream branch commit-by-commit which retains
> > the crucial linearity property.
> >
> > We have learned through experience that a bulk merge from upstream is
> > a Bad Idea(tm).  Suppose we have a test that fails; it does not repro
> > with an upstream compiler; we try to bisect it; we discover that it
> > started after a bulk merge of 1000 commits from upstream.  But we can't
> > bisect down the second-parent line of history, because that turns back
> > into a straight upstream compiler and the problem fails to repro.
> >
> > If instead we had rolled the 1000 commits into our repo individually,
> > we'd have a linear history mixing upstream with our stuff and we would
> > be able to bisect naturally.  But that relies on the *upstream* history
> > being basically linear, because we can't pick apart an upstream commit
> > that is itself a big merge of lots of commits.  At least I don't know
> > how.
> 
> I know of a way but it's not very nice. The gist of it is to checkout the
> downstream branch just before the bad merge and then merge the first
> 100 commits from upstream. If the result is good then merge the next
> 100, but if it's bad then 'git reset --hard' and merge 10 instead. You'll
> eventually find the commit that made it bad. Essentially, the idea is to
> make a throwaway branch that merges more frequently. I do something
> similar when rebasing my work onto master, since rebasing gradually
> often makes the conflicts go away.

A very manual sort of bisection, but yeah that would get the job done.
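
[The chunked-merge search Daniel describes could be scripted roughly as
below. This is a sketch against a toy repository: the branch names, the
chunk size of 5, and the marker-file check "is_bad" are all hypothetical
stand-ins; a real setup would run the actual failing test instead.]

```shell
# Sketch of the chunked-merge search: merge upstream into downstream in
# chunks; when a chunk turns bad, undo it and replay that chunk one
# commit at a time to find the culprit. Toy repo, hypothetical names.
dir=$(mktemp -d); cd "$dir"
git init -q repo; cd repo
git config user.email you@example.com
git config user.name "A User"
echo base > file; git add file; git commit -qm base
up=$(git symbolic-ref --short HEAD)     # the "upstream" branch
git branch downstream
# 20 upstream commits; commit 13 plants the bug (marker file "bug")
for i in $(seq 1 20); do
  echo "up $i" >> file
  if [ "$i" -eq 13 ]; then echo BAD > bug; git add bug; fi
  git commit -qam "upstream $i"
done
git checkout -q downstream
echo local > local; git add local; git commit -qm "local work"
is_bad() { test -f bug; }               # stands in for the real test
found=; merged=0
while [ -z "$found" ] && [ "$merged" -lt 20 ]; do
  next=$((merged + 5))
  if [ "$next" -gt 20 ]; then next=20; fi
  git merge -q --no-edit "$up~$((20 - next))" >/dev/null
  if is_bad; then
    git reset -q --hard "HEAD@{1}"      # throw the bad chunk away
    for j in $(seq $((merged + 1)) "$next"); do
      git merge -q --no-edit "$up~$((20 - j))" >/dev/null
      if is_bad; then found=$j; break; fi
    done
  fi
  merged=$next
done
echo "first bad upstream commit: upstream $found"
```

[The throwaway branch dies with the temp directory; nothing here touches
the real downstream history.]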

> 
> > Now, I do say "basically" linear because the important thing is to have
> > small increments of change each time.  It doesn't mean we have to have
> > everything be ff-only, and we can surely tolerate the merge commits that
> > wrap individual commits in a pull-request kind of workflow.  But merges
> > that bring in long chains of commits are not what we want.
> > --paulr
> 
> I agree that we should probably keep the history as close to linear as
> possible (mostly because I find the Linux kernel's history difficult to
> follow), but it sounds like the issue is more about the content of the
> merge than the linearity of the history. A long-lived branch with a
> complex history sounds like it would be ok in your scenario if the
> eventual merge was a small change to master.

I think I'd rather see such things squashed before they reach master,
because a normal bisection might still be tempted down the garden path
of the second-parent history.
--paulr
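
[The squashing Paul suggests can be done with `git merge --squash`, which
collapses a long-lived branch into a single staged change so that
master's history keeps exactly one parent per commit. A minimal
illustration on a toy repo; the branch and file names are hypothetical:]

```shell
# Toy illustration of squashing a long-lived branch before it lands on
# master: the branch's three commits arrive as one, leaving no second
# parent for a bisection to wander into. Names are hypothetical.
dir=$(mktemp -d); cd "$dir"
git init -q repo; cd repo
git config user.email you@example.com
git config user.name "A User"
echo base > file; git add file; git commit -qm base
main=$(git symbolic-ref --short HEAD)
git checkout -qb feature
for i in 1 2 3; do
  echo "step $i" >> file
  git commit -qam "feature step $i"
done
git checkout -q "$main"
git merge -q --squash feature      # stages the net change; no merge commit
git commit -qm "feature, squashed"
parents=$(git rev-list --parents -n 1 HEAD | wc -w)  # commit hash + parents
count=$(git rev-list --count HEAD)
echo "hash+parents: $parents, commits on $main: $count"
```

[With only one parent per commit, a normal bisection on master never
sees the branch's internal history; the trade-off is that the individual
branch commits are no longer reachable from master.]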


