[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Chandler Carruth via llvm-dev llvm-dev at lists.llvm.org
Tue Nov 6 11:26:13 PST 2018


On Tue, Nov 6, 2018 at 7:29 AM Joseph Tremoulet via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Would it help to include the llvm-mirror commit hashes in the new commit
> messages?  Currently the conversion injects “llvm-svn:<revision>” into the
> commit messages; it seems like with a bit of work it could also inject
> “llvm-mirror/<subproject>:<hash>”, couldn’t it?  I realize that’s still
> just providing a way to manually look up new hashes from old hashes, and
> Justin has explained why that’s still painful, but I’m wondering if it
> would be more palatable with the mapping a bit more discoverable that way,
> and consistent across different projects in this boat, and with no more
> tooling required than `git log --grep <old_hash>` to look up a new hash…
>

FWIW, I think adding the "official" git mirror hashes alongside the SVN
revision to the commit messages would be great if it can be implemented
reasonably easily.
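
A minimal sketch of how such trailers could be consumed in bulk. It assumes the proposed `llvm-mirror/<subproject>:<hash>` line (hypothetical: it is only a proposal in this thread) appears next to the existing `llvm-svn: <revision>` line, and that the log is produced with `git log -z --format=%H%n%B`. For a one-off lookup, `git log --grep <old_hash>` is enough. A full map is handier when many old references have to be rewritten at once, e.g. while rebasing a fork. The helper names `parse_git_log` and `build_maps` are illustrative.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

# Trailer the conversion already writes, e.g. "llvm-svn: 346230".
SVN_TRAILER = re.compile(r"^llvm-svn:[ \t]*(\d+)[ \t]*$", re.MULTILINE)
# Hypothetical trailer following the proposal quoted below, e.g.
# "llvm-mirror/clang:<old mirror hash>".
MIRROR_TRAILER = re.compile(
    r"^llvm-mirror/([\w.-]+):[ \t]*([0-9a-f]{7,40})[ \t]*$", re.MULTILINE
)


def parse_git_log(raw: str) -> Iterator[Tuple[str, str]]:
    """Split `git log -z --format=%H%n%B` output into (hash, message) pairs."""
    for record in raw.split("\0"):
        record = record.strip("\n")
        if not record:
            continue
        commit_hash, _, message = record.partition("\n")
        yield commit_hash, message


def build_maps(
    commits: Iterable[Tuple[str, str]],
) -> Tuple[Dict[int, str], Dict[Tuple[str, str], str]]:
    """Return (SVN revision -> new hash, (subproject, old hash) -> new hash)."""
    by_svn: Dict[int, str] = {}
    by_mirror: Dict[Tuple[str, str], str] = {}
    for new_hash, message in commits:
        for rev in SVN_TRAILER.findall(message):
            by_svn[int(rev)] = new_hash
        for subproject, old_hash in MIRROR_TRAILER.findall(message):
            by_mirror[(subproject, old_hash)] = new_hash
    return by_svn, by_mirror
```

Keying the map on the hashes exactly as written in the trailers means an abbreviated old hash needs a prefix scan over the keys rather than a direct lookup.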


>
>
> *From:* llvm-dev <llvm-dev-bounces at lists.llvm.org> *On Behalf Of* James Y
> Knight via llvm-dev
> *Sent:* Friday, November 2, 2018 5:56 PM
> *To:* Justin Bogner <mail at justinbogner.com>
> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>
> *Subject:* Re: [llvm-dev] RFC: Dealing with out of tree changes and the
> LLVM git monorepo
>
>
>
> On Fri, Nov 2, 2018 at 2:11 PM Justin Bogner <mail at justinbogner.com>
> wrote:
>
> James Y Knight <jyknight at google.com> writes:
> > Thanks for writing this up. I think it's a really important point which
> > deserves discussion.
> >
> > Ultimately, I think it is a question as to whether to prioritize the easy
> > switchover for existing out of tree forks, or to prioritize having the best
> > conversion we can make. I feel very strongly that the latter should be the
> > priority for the official repository conversion, and that, therefore, we
> > should not use the zipper method for the official repository going forward.
>
> How do you define "best conversion" here? I may be missing something,
> but I really don't see any actual advantage to re-writing the git
> history from scratch rather than leveraging the existing git mirrors to
> build a monorepo.
>
> The re-generated history approach gives us an artificial alternate
> history where we developed in a git monorepo from the beginning of time.
>
>
>
> I note that "we", where "we" = llvm upstream developers, *have* been
> developing in a monorepo -- an SVN monorepo, with a linear history. The
> llvm-git-prototype repository better matches this actual development
> history.
>
>
>
> It throws away a bunch of information for the sake of making a
> "pristine" conversion with fewer branches, even though those branches
> have almost zero cost.
>
>
>
> You mean "merge commits" here, not "branches", I believe, since your
> repository has a single branch, "master".
>
>
>
> The zipper approach gives us the best of both worlds - it provides a
> monorepo view for all time for anyone who wants it, but also preserves
> the history that people have been using and relying on for a number of
> years.
>
>
>
> I'd like to hear what you think are the actual disadvantages of the
> zipper approach. I've spoken to quite a few people about it in the last
> few days and I haven't really found any yet.
>
>
>
> The downside, generally, is that it makes the history _very_ complex.
> That is not necessarily bad in and of itself, but it's not really
> representative of the development history of llvm, and it turns out that it
> causes problems.
>
>
>
> Two concrete disadvantages have been mentioned on this thread, already:
>
> 1. gitk cannot be used -- it just falls over when given a history with
> 300,000 merges.
>
> 2. git bisect becomes somewhat trickier.
>
>
>
> I'd add a couple more to that:
>
>
>
> 3. "git log -u llvm/" no longer works (for any file/path), because the
> commits which *actually* changed the files don't occur at that path, and
> the default is to omit diffs arising in a merge commit. (The actual content
> change happened at a different path -- the root of the tree, not under
> "llvm/", and is just moved under llvm/ in the merge commit.)
>
>
>
> You can work around this, via "git log -u -m --first-parent llvm/" to get
> the diffs from the merge commit itself. But this is a large annoyance --
> looking at path histories is a very common task. Making matters worse right
> now is that the zipper merge-commit doesn't have the full commit message,
> only the first line. That, at least, can be fixed.
>
>
>
> 4. Other commands like "git log -u -S CFGStackify" become trickier -- it
> returns the individual project commit, not the merge commit, and gives the
> "wrong" pathnames (without the subproject prefix). So every time you look
> at one of these, you need to map it back yourself.
>
>
>
> Those are just the first things I tried -- I think there will be *more*
> variants of these sorts of issues which will show up with further attempts
> to use a repository built in this style. Certainly none of these are FATAL
> problems, but they will be a constant irritation.
>
>
>
> Consider a world where I convert all of my branches as-if they were
> based on the monorepo. Now, something comes up and I need to hot fix
> last year's branch. I probably can't actually submit this fix from the
> monorepo, since it would be too disruptive to also hot fix the
> configuration changes to submit from a new layout of repositories. Now I
> need to maintain two copies of my code and manage merging between them.
>
>
>
> I can't speak to your exact problem, not knowing anything about your
> infrastructure or workflows. But, if you were to want to keep using your
> old separated repositories for your old branches, and to switch only master
> and future branches over to the new monorepo, "git format-patch" and "git
> apply" do make copying commits or stacks of commits between completely
> different repositories (even between split and mono repositories)
> relatively straightforward.
>
>
>
> [...]
>
>
>
> So, we publish two competing versions of the git history and let people
> choose? This sounds like a split-the-baby type of solution to me ;)
>
>
>
> The point here is not to offer "competing" versions -- the
> llvm-git-prototype monorepo (with the linear history) would be the official
> repository, recommended by default for everyone.
>
>
>
> The technique outlined above simply offers a way for those who want their
> historical commits to appear in the zipper-merge fashion to keep them that
> way up to the point in history where they last pulled from the split git
> repositories into their private forks, and then to switch to the official
> monorepo afterward.
>
>
>
> I'd still recommend avoiding that, generally, because of the issues caused
> by the more complex structure of the zipper-repository. But if that's the
> path that is best for your repository and infrastructure, I believe it's
> feasible to do so without needing to impact others.
>
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>
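
Moving commits between split and mono repositories with `git format-patch` and `git apply`, as suggested in the quoted message, mostly comes down to a path prefix. Git can handle that itself: `git am --directory=llvm` (or `git apply --directory=llvm`) prepends the subproject directory when applying a split-repository patch to the monorepo, and `-p2` strips it again in the other direction. When the diffs are consumed by something other than git, for example when mapping the split-repository paths that `git log -u -S` reports (point 4 above) back to monorepo paths, the same rewrite can be done by hand. A minimal sketch, with `prefix_patch_paths` as a hypothetical helper that only understands plain, unquoted paths:

```python
import re

# Header lines that carry paths in `git diff` / `git format-patch` output.
_DIFF_GIT = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
_OLD_NEW = re.compile(r"^(---|\+\+\+) ([ab])/(\S+)$")
_RENAME_COPY = re.compile(r"^(rename|copy) (from|to) (\S+)$")


def prefix_patch_paths(patch: str, subproject: str) -> str:
    """Rewrite a split-repository patch so every path lives under `subproject/`.

    Only file headers are touched: the `diff --git`, `---`/`+++`, and
    rename/copy lines between a `diff --git` line and the first `@@` hunk.
    Hunk bodies are left alone, so a removed line that happens to start with
    "-- a/" is not mistaken for a header. Quoted paths or paths containing
    spaces are not handled, and `/dev/null` is left as-is.
    """
    out = []
    in_header = False
    for line in patch.split("\n"):
        if line.startswith("diff --git "):
            in_header = True
            m = _DIFF_GIT.match(line)
            if m:
                line = f"diff --git a/{subproject}/{m.group(1)} b/{subproject}/{m.group(2)}"
        elif line.startswith("@@"):
            in_header = False  # hunk content follows; stop rewriting
        elif in_header:
            m = _OLD_NEW.match(line)
            if m:
                line = f"{m.group(1)} {m.group(2)}/{subproject}/{m.group(3)}"
            else:
                m = _RENAME_COPY.match(line)
                if m:
                    line = f"{m.group(1)} {m.group(2)} {subproject}/{m.group(3)}"
        out.append(line)
    return "\n".join(out)
```

Running the result through `git apply --check` in the monorepo is a quick way to confirm that the rewritten paths resolve before applying anything.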