<div dir="ltr">I'm going to try to stay out of the question of whether or not we should do it this way. (We'll see if I succeed. :)<div><br></div><div>But if we do decide to do it this way, it would be nice if we'd do an N-way merge when there's a single SVN commit that affects multiple git repos.</div></div><br><div class="gmail_quote"><div dir="ltr">On Wed, Oct 31, 2018 at 9:22 AM Justin Bogner via llvm-dev <<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>
<br>
I've spent some time in the last couple of days trying to figure out how<br>
to adopt the [LLVM git monorepo prototype] for an out of tree backend.<br>
TLDR: I'm not convinced that this prototype is the right approach to<br>
converting to the monorepo, and I have a possible alternative.<br>
<br>
The main problems I'm running into stem from the fact that this<br>
prototype rewrites all of history from scratch rather than leveraging the<br>
existing [official git mirrors]. This makes migrating out-of-tree work<br>
from the official git mirrors to this repo very difficult, since there<br>
is no shared history. Some efforts have gone into [documenting how to<br>
port in-progress patches], but this doesn't cover how to<br>
handle more substantial out-of-tree work.<br>
<br>
Issues with integrating the prototype<br>
-------------------------------------<br>
<br>
As far as I can tell, my options for trying to integrate with this<br>
monorepo are fairly limited.<br>
<br>
If I merge my trees directly into the monorepo prototype at head, I end<br>
up with two copies of every commit: one monorepo-style commit and one<br>
from the single-repo history. The two copies are completely unrelated<br>
and live in separate parallel histories, making it difficult to<br>
correlate one with the other or even to tell which is which.<br>
<br>
An arguably cleaner solution would be to try to recreate all of my trees'<br>
history artificially as if they were based on the monorepo prototype<br>
history all along, but this has two problems. First, it's a very<br>
significant tooling effort to do this - I'd need to match up several<br>
years of merge points to their corresponding spots in the monorepo<br>
prototype and somehow redo all of the merges in the same ways. Tools<br>
like "rebase --preserve-merges" don't really help here, since they abort<br>
on merge conflicts and ask a human to resolve them again. Even if I were<br>
to come up with tooling that managed this, I'm still left with a<br>
completely new set of hashes for commits and no easy way to map them to<br>
existing references in emails, bug trackers, and release notes.<br>
<br>
Finally, there's the option of throwing away all of my history and<br>
applying my out-of-tree work in a single patch. This makes git-log and<br>
git-blame useless for investigating issues in my codebase for a few<br>
years. It also means that when fixes go into older branches they can't<br>
be merged forward and need to be redone by hand.<br>
<br>
All of these have very significant drawbacks, and none of them really<br>
sounds like a good option at all.<br>
<br>
An alternative approach<br>
-----------------------<br>
<br>
All of these problems could be mitigated if we could preserve the<br>
history of the existing git mirrors when generating the monorepo. There<br>
are two ways to do this.<br>
<br>
1. Start the monorepo by subtree-merging the various repos together at<br>
an arbitrary point in time.<br>
<br>
2. "Zip" together the commits in each official git mirror repo by<br>
merging them into a combined view after each commit.<br>
<br>
While I personally don't see a problem with (1), I've heard people claim<br>
that they want to use the monorepo to bisect arbitrarily far back into<br>
history. If this is the case, we'd prefer an approach like (2).<br>
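<br>
As a rough illustration of approach (2), the ordering step of a zipper<br>
can be sketched in Python. This is only a sketch under stated<br>
assumptions, not the scripts behind any existing prototype: it assumes<br>
each mirror commit can be tagged with the SVN revision it came from (for<br>
example via the git-svn-id line in the mirror's commit message), and the<br>
name zip_histories and the data layout are hypothetical. It only plans<br>
the merges and never runs git. Commits that share a revision are grouped<br>
into one combined step.<br>
<br>
<pre>
```python
from itertools import groupby
from operator import itemgetter


def zip_histories(mirrors):
    """Plan the combined ("zipped") history for several mirrors.

    mirrors maps a repo name to a list of (svn_revision, commit_hash)
    pairs, oldest first. This layout is an assumption for illustration.

    Returns one (svn_revision, heads, updated_repos) tuple per combined
    step, where heads is the newest known commit of every repo so far.
    """
    # Flatten all mirrors into one list ordered by SVN revision.
    events = sorted(
        (rev, repo, commit)
        for repo, commits in mirrors.items()
        for rev, commit in commits
    )

    heads = {}
    plan = []
    # Commits sharing a revision become a single N-way step.
    for rev, group in groupby(events, key=itemgetter(0)):
        updated = {repo: commit for _, repo, commit in group}
        heads.update(updated)
        plan.append((rev, dict(heads), sorted(updated)))
    return plan


if __name__ == "__main__":
    demo = {
        "llvm": [(1, "a1"), (3, "a3")],
        "clang": [(2, "c2"), (3, "c3")],
    }
    for step in zip_histories(demo):
        print(step)
```
</pre>
Each planned step would correspond to one merge commit whose parents are<br>
the previous combined commit plus the mirror commits listed as updated.<br>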
<br>
A zippered repository gives us a lot of the benefits of the prototype,<br>
without a lot of the issues that are caused by rewriting history:<br>
<br>
- The commits from the official git mirrors exist as they are now, and<br>
we don't need to deal with changing hashes.<br>
<br>
- Out-of-tree branches have all of their history whether they opt in to<br>
creating a monorepo-style history or not.<br>
<br>
- All of the repo's history is visible as a monorepo by looking only at<br>
the merge commits. Bisect scripts can easily filter to these.<br>
<br>
- The monorepo commits and individual repo commits are easily<br>
discernible and have a direct link between them in git's DAG, making<br>
it easy to find one from the other.<br>
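<br>
A minimal sketch of how a bisect script might pick out the monorepo<br>
commits, assuming (as in the list above) that they are exactly the merge<br>
commits, i.e. those with two or more parents. The helper names are<br>
hypothetical. It relies on "git rev-list --parents", which prints each<br>
commit hash followed by the hashes of its parents.<br>
<br>
<pre>
```python
import subprocess


def merge_commits(rev_list_output):
    """Return the hashes of merge commits from `git rev-list --parents` text.

    Each input line is "commit parent1 parent2 ..."; a merge has at
    least two parents, so at least three fields on its line.
    """
    merges = []
    for line in rev_list_output.splitlines():
        fields = line.split()
        if len(fields) > 2:
            merges.append(fields[0])
    return merges


def list_monorepo_commits(repo_dir, ref="HEAD"):
    """Run git in repo_dir and keep only the merge commits reachable from ref."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--parents", ref],
        check=True,
        capture_output=True,
        text=True,
    )
    return merge_commits(result.stdout)


if __name__ == "__main__":
    sample = "m2 m1 a3\na3 a1\nm1 a1 c2\na1\n"
    print(merge_commits(sample))
```
</pre>
Passing --merges to git rev-list applies the same filter directly; the<br>
parsing is spelled out here only to show what is being selected.<br>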
<br>
To demonstrate this approach, I've put up a snapshot of what LLVM might<br>
look like if we did this, using some scripts that Duncan wrote a while<br>
back to experiment with the idea:<br>
<br>
<a href="https://github.com/bogner/llvm-zipper-prototype" rel="noreferrer" target="_blank">https://github.com/bogner/llvm-zipper-prototype</a><br>
<br>
Note that this is just a demo/prototype. It has some minor issues, isn't<br>
being automatically updated, and I may regenerate it at some point.<br>
<br>
Thoughts?<br>
<br>
Thanks,<br>
-- Justin Bogner<br>
<br>
[LLVM git monorepo prototype]: <a href="https://github.com/llvm-git-prototype/llvm" rel="noreferrer" target="_blank">https://github.com/llvm-git-prototype/llvm</a><br>
[official git mirrors]: <a href="https://git.llvm.org/git/llvm.git" rel="noreferrer" target="_blank">https://git.llvm.org/git/llvm.git</a><br>
[documenting how to port in-progress patches]: <a href="https://reviews.llvm.org/D53414" rel="noreferrer" target="_blank">https://reviews.llvm.org/D53414</a><br>
</blockquote></div>