[llvm-dev] [RFC] One or many git repositories?

Mon Jul 25 06:55:58 PDT 2016

> -----Original Message-----
> From: Robinson, Paul [mailto:paul.robinson at sony.com]
> Sent: 22 July 2016 18:50
> To: Renato Golin; Daniel Sanders
> Cc: llvm-dev at lists.llvm.org
> Subject: RE: [llvm-dev] [RFC] One or many git repositories?
> 
> > >> * public and downstream forks that *rely* on linear history
> > >
> > > Do you have an example in mind? I'd expect them to rely on each 'master'
> > > being an improvement on 'master^'. I wouldn't expect them to be interested
> > > in how 'master^' became 'master'.
> >
> > Paul Robinson was outlining some of the issues he had with git
> > history. I don't know their setup, so I'll let him describe the issues
> > (or he may have done so already in some thread, but I haven't read it
> > all).
> 
> Since you asked...
> 
> The key point is that a (basically) linear upstream history makes it
> feasible to do bisection on a downstream branch that mixes in a pile
> of local changes, because the (basically) linear upstream history can
> be merged into the downstream branch commit-by-commit which retains
> the crucial linearity property.
> 
> We have learned through experience that a bulk merge from upstream is
> a Bad Idea(tm).  Suppose we have a test that fails; it does not repro
> with an upstream compiler; we try to bisect it; we discover that it
> started after a bulk merge of 1000 commits from upstream.  But we can't
> bisect down the second-parent line of history, because that turns back
> into a straight upstream compiler and the problem fails to repro.
> 
> If instead we had rolled the 1000 commits into our repo individually,
> we'd have a linear history mixing upstream with our stuff and we would
> be able to bisect naturally.  But that relies on the *upstream* history
> being basically linear, because we can't pick apart an upstream commit
> that is itself a big merge of lots of commits. At least I don't know how.

I know of a way but it's not very nice. The gist of it is to check out the
downstream branch just before the bad merge and then merge the first
100 commits from upstream. If the result is good then merge the next
100, but if it's bad then 'git reset --hard' and merge 10 instead. You'll
eventually find the commit that made it bad. Essentially, the idea is to
make a throwaway branch that merges more frequently. I do something
similar to rebase my work onto master, since rebasing gradually often
causes all the conflicts to go away.
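
To make the search order concrete, here is a rough Python sketch of the
procedure. It is only an illustration, not a tool anyone uses: the function
name, the `is_good` callback and the 100/10/1 step sizes are my own
assumptions. `is_good(n)` stands in for "merge the first n upstream commits
into the throwaway branch, build, and run the failing test", and it assumes
that once the test goes bad it stays bad.

```python
from typing import Callable, Optional, Sequence


def find_first_bad(
    total: int,
    is_good: Callable[[int], bool],
    steps: Sequence[int] = (100, 10, 1),
) -> Optional[int]:
    """Return the 1-based index of the first upstream commit whose merge
    makes the test fail, or None if all `total` commits merge cleanly.

    `is_good(n)` is a hypothetical callback: it should report whether the
    downstream branch still passes with the first n upstream commits merged.
    """
    if not steps or steps[-1] != 1:
        raise ValueError("steps must end with 1 so the search can finish")

    merged = 0  # upstream commits merged so far with a good result
    level = 0   # index into `steps`; only ever moves to smaller steps

    while merged < total:
        step = min(steps[level], total - merged)
        if is_good(merged + step):
            # Keep this merge and carry on from here.
            merged += step
        elif step == 1:
            # A single commit turned the result bad: that is the culprit.
            return merged + 1
        else:
            # The equivalent of 'git reset --hard': drop the bad merge
            # and retry from the same point with a smaller step.
            level = min(level + 1, len(steps) - 1)
    return None
```

In real use `is_good` would wrap the actual merge, build and test run, so
each call is expensive; the point of the coarse steps is to keep the number
of calls well below one per upstream commit.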

> Now, I do say "basically" linear because the important thing is to have
> small increments of change each time.  It doesn't mean we have to have
> everything be ff-only, and we can surely tolerate the merge commits that
> wrap individual commits in a pull-request kind of workflow.  But merges
> that bring in long chains of commits are not what we want.
> --paulr

I agree that we should probably keep the history as close to linear as possible
(mostly because I find the Linux kernel's history difficult to follow), but it
sounds like the issue is more about the content of the merge than the linearity
of the history. A long-lived branch with a complex history sounds like it would
be OK in your scenario if the eventual merge was a small change to master.