[llvm-dev] RFC: Dealing with out of tree changes and the LLVM git monorepo

Justin Bogner via llvm-dev llvm-dev at lists.llvm.org
Wed Oct 31 10:31:02 PDT 2018


Justin Lebar <jlebar at google.com> writes:
> I'm going to try to stay out of the question of whether or not we should do
> it this way.  (We'll see if I succeed.  :)
>
> But if we do decide to do it this way, it would be nice if we'd do an N-way
> merge when there's a single SVN commit that affects multiple git repos.

The prototype I linked to does this. See for example:

  https://github.com/bogner/llvm-zipper-prototype/commit/6258012a126e2c9eecc6ae70eabec71bd8f6a8f5
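
As a rough illustration of what an N-way merge means here, the following is a minimal Python sketch of the zipping idea, not the prototype's actual scripts. It assumes each mirror commit can be tagged with the SVN revision it was imported from; the names `MirrorCommit`, `ZipMerge`, `zip_mirrors`, and the `zip@rN` ids are hypothetical and exist only for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorCommit:
    repo: str     # hypothetical project name, e.g. "llvm" or "clang"
    sha: str      # hash in the per-project mirror (left unchanged by zipping)
    svn_rev: int  # assumption: the SVN revision is known for every commit


@dataclass(frozen=True)
class ZipMerge:
    svn_rev: int
    parents: tuple  # previous zip merge id first (if any), then mirror hashes


def zip_mirrors(commits):
    """Create one merge per SVN revision, with one parent per affected repo."""
    by_rev = defaultdict(list)
    for commit in commits:
        by_rev[commit.svn_rev].append(commit)

    merges = []
    prev = None  # id of the previous zip merge; None before the first one
    for rev in sorted(by_rev):
        group = sorted(by_rev[rev], key=lambda c: c.repo)
        parents = ([prev] if prev else []) + [c.sha for c in group]
        merges.append(ZipMerge(rev, tuple(parents)))
        prev = f"zip@r{rev}"  # made-up id standing in for a real merge hash
    return merges


history = [
    MirrorCommit("llvm", "a1", 100),
    MirrorCommit("clang", "b1", 100),  # same SVN revision as a1
    MirrorCommit("llvm", "a2", 101),
]
merges = zip_mirrors(history)
for merge in merges:
    print(merge.svn_rev, merge.parents)
```

In this toy history, revision 100 touches both repos, so it becomes a single merge with one parent per mirror commit rather than two separate merges. Revision 101 touches only one repo and gets the previous merge plus that one mirror commit as parents.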

> On Wed, Oct 31, 2018 at 9:22 AM Justin Bogner via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>> Hi all,
>>
>> I've spent some time in the last couple of days trying to figure out how
>> to adopt the [LLVM git monorepo prototype] for an out of tree backend.
>> TLDR: I'm not convinced that this prototype is the right approach to
>> converting to the monorepo, and I have a possible alternative.
>>
>> The main problems I'm running into stem from the fact that this
>> prototype rewrites all of history from scratch rather than leveraging
>> the existing [official git mirrors]. This makes migrating out-of-tree
>> work from the official git mirrors to this repo very difficult, since
>> there is no shared history. Some effort has gone into [documenting how
>> to port in-progress patches], but that document doesn't discuss how to
>> handle more substantial out-of-tree work.
>>
>> Issues with integrating the prototype
>> -------------------------------------
>>
>> As far as I can tell, my options for trying to integrate with this
>> monorepo are fairly limited.
>>
>> If I merge my trees directly into the monorepo prototype at head, I end
>> up with two copies of every commit, one of which is a monorepo style
>> commit and one with the singular repo history. These commits are
>> completely unrelated to each other, and exist in two separate parallel
>> histories, making it difficult to correlate one to the other or even to
>> tell which is which.
>>
>> An arguably cleaner solution would be to try to recreate all of my
>> trees' history artificially, as if they had been based on the monorepo
>> prototype history all along, but this has two problems. First, it's a very
>> significant tooling effort to do this - I'd need to match up several
>> years of merge points to their corresponding spots in the monorepo
>> prototype and somehow redo all of the merges in the same ways. Tools
>> like "rebase --preserve-merges" don't really help here, since they abort
>> on merge conflicts and ask a human to resolve them again. Even if I were
>> to come up with tooling that managed this, I'm still left with a
>> completely new set of hashes for commits and no easy way to map them to
>> existing references in emails, bug trackers, and release notes.
>>
>> Finally, there's the option of throwing away all of my history and
>> applying my out of tree work in a single patch. This makes git-log and
>> git-blame useless for investigating issues in my codebase for a few
>> years. It also means that when fixes go into older branches they can't
>> be merged forward and need to be redone by hand.
>>
>> All of these have very significant drawbacks, and none of them really
>> sounds like a good option at all.
>>
>> An alternative approach
>> -----------------------
>>
>> All of these problems could be mitigated if we could preserve the
>> history of the existing git mirrors when generating the monorepo. There
>> are two ways to do this.
>>
>> 1. Start the monorepo by subtree-merging the various repos together at
>>    an arbitrary point in time.
>>
>> 2. "Zip" together the commits in each official git mirror repo by
>>    merging them into a combined view after each commit.
>>
>> While I personally don't see a problem with (1), I've heard people claim
>> that they want to use the monorepo to bisect arbitrarily far back into
>> history. If this is the case, we'd prefer an approach like (2).
>>
>> A zippered repository gives us a lot of the benefits of the prototype,
>> without a lot of the issues that are caused by rewriting history:
>>
>> - The commits from the official git mirrors exist as they are now, and
>>   we don't need to deal with changing hashes.
>>
>> - Out-of-tree branches have all of their history whether they opt in to
>>   creating a monorepo style history or not.
>>
>> - All of the repo's history is visible as a monorepo by looking only at
>>   the merge commits. Bisect scripts can easily filter to these.
>>
>> - The monorepo commits and individual repo commits are easily
>>   discernible and have a direct link between them in git's DAG, making
>>   it easy to find one from the other.
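
The point above about bisect scripts filtering to merge commits can be sketched in Python over a toy commit graph. This is only an illustration under stated assumptions: the graph, the commit names, and the helpers `monorepo_view` and `first_bad` are hypothetical, each zip merge is assumed to list the previous zip merge as its first parent, and the per-project mirror histories are assumed to be linear.

```python
def monorepo_view(parents_of, head):
    """Follow first parents from head and keep only the merge commits."""
    view = []
    commit = head
    while commit is not None:
        parents = parents_of.get(commit, [])
        if len(parents) > 1:  # merges are the monorepo-style commits
            view.append(commit)
        commit = parents[0] if parents else None
    return view[::-1]  # oldest first


def first_bad(view, is_bad):
    """Binary search for the first bad commit; assumes view[-1] is bad
    and that every commit after the first bad one is also bad."""
    lo, hi = 0, len(view) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(view[mid]):
            hi = mid
        else:
            lo = mid + 1
    return view[lo]


# Toy graph: a* and b* are mirror commits from two repos, m* are zip merges.
parents_of = {
    "a1": [], "a2": ["a1"],
    "b1": [], "b2": ["b1"],
    "m1": ["a1", "b1"],
    "m2": ["m1", "a2"],
    "m3": ["m2", "b2"],
}
view = monorepo_view(parents_of, "m3")
culprit = first_bad(view, lambda c: c in {"m2", "m3"})
print(view, culprit)
```

With real git, a similar list of candidates can be produced with `git rev-list --first-parent --merges`, which restricts the walk to first parents and to commits that have more than one parent.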
>>
>> To demonstrate this approach, I've put up a snapshot of what LLVM might
>> look like if we did this, using some scripts that Duncan wrote a while
>> back to experiment with the idea:
>>
>>   https://github.com/bogner/llvm-zipper-prototype
>>
>> Note that this is just a demo/prototype. It has some minor issues, isn't
>> being automatically updated, and I may regenerate it at some point.
>>
>> Thoughts?
>>
>> Thanks,
>> -- Justin Bogner
>>
>> [LLVM git monorepo prototype]: https://github.com/llvm-git-prototype/llvm
>> [official git mirrors]: https://git.llvm.org/git/llvm.git
>> [documenting how to port in-progress patches]:
>> https://reviews.llvm.org/D53414

