[llvm-dev] [RFC] One or many git repositories?

Mehdi Amini via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 8 11:37:49 PDT 2016

> On Sep 8, 2016, at 11:08 AM, dag at cray.com wrote:
> Mehdi Amini via llvm-dev <llvm-dev at lists.llvm.org> writes:
>> First, have you read this document: https://reviews.llvm.org/D24167 ?
>> TLDR: The answer is no: you have to see it as it is today, i.e. a
>> single SVN repo containing all the sub-projects, and “exports” in
>> individual repositories.
>> The same thing after: a single git repo containing all the subprojects
>> side-by-side and the *same* “exports” in individual repositories.
> Sorry, I sent my earlier reply today before I intended to.
> After going back and reading the proposal again, I think I understand
> the plan.  I haven't used the SVN repository for years so I was thinking
> in terms of git, that you'd take the existing git mirrors and combine
> them (via submodule or some other mechanism).  I understand now the
> proposal is to take the SVN root and export all of that as one giant git
> repository.  Is that correct?


> If so, that raises a number of questions for me that aren't directly
> addressed in the document as far as I can see:
> 1. How are the individual component git mirrors going to be maintained?

Just exactly as they are today.

> If a commit goes to the monorepository, what is going to extract the
> relevant bits and commit them to the individual mirrors?  The document
> notes that with a monorepository a single commit can touch multiple
> projects (that's good!) but something has to extract the parts of that
> commit that are relevant to each subproject and then send those parts to
> the subproject repository.

Right, but note that this is already the case today: some people already use SVN to commit to clang and LLVM at the same time, and that single SVN commit results in one commit in the llvm git repo and another commit in the clang repo.
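
To make the idea concrete, here is a minimal sketch of how one commit that touches several sub-projects could be partitioned into per-project change sets by top-level directory. It is only an illustration, not the mechanism the mirrors actually use; the function name and the example paths are hypothetical.

```python
from collections import defaultdict


def split_commit_by_project(changed_paths):
    """Group the paths touched by one commit by their top-level directory.

    Hypothetical helper: each top-level directory is assumed to be one
    sub-project, and paths are returned relative to that directory.
    """
    per_project = defaultdict(list)
    for path in changed_paths:
        project, _, rest = path.partition("/")
        if rest:  # skip files that sit directly at the repository root
            per_project[project].append(rest)
    return dict(per_project)


# Hypothetical commit touching both llvm and clang.
commit = [
    "llvm/lib/IR/Type.cpp",
    "llvm/include/llvm/IR/Type.h",
    "clang/lib/CodeGen/CodeGenTypes.cpp",
]
for project, paths in sorted(split_commit_by_project(commit).items()):
    print(project, paths)
```

Each resulting group would correspond to one commit in the matching individual repository.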

>  There are tools to do this and I think
> git-subtree is a good candidate [disclosure: I am the git-subtree
> maintainer] but I'm just curious what's being considered as a solution.

Well, we haven't decided on anything for the official mirrors. It looks like you're in a good position to help design how subtree could help here :)
(I have a fairly good understanding of git, but very limited knowledge of subtree.)
Anyway, I hope we will be able to put scripts in the repo so that anyone downstream can split the repo independently of the official mirrors.

> 2. Is there any consideration for restructuring the directory layout?
> The document has this to say about checking out multiple components:
>> **Monorepo Proposal**
>> The repository contains natively the source for every sub-projects at the right
>> revision, which makes this straightforward::
>>  git clone https://github.com/llvm/llvm-projects.git llvm
>>  cd llvm
>>  git checkout $REVISION
>> As before, at this point clang, llvm, and libcxx are stored in directories
>> alongside each other.
> The problem here is that for the build, clang wants to be in llvm/tools
> and other components want to be in other places.  

Not exactly: cmake has magic discovery when clang is in tools, but that is not a requirement. You have been able to do this for years: cmake -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=path
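
As a rough sketch of what that looks like with a side-by-side checkout, the helper below builds the cmake arguments for sub-projects stored next to llvm. The helper name and the checkout path are hypothetical, and treating sub-projects other than clang with the same variable pattern is an assumption on my part.

```python
from pathlib import PurePosixPath


def external_project_args(checkout, projects):
    """Build -DLLVM_EXTERNAL_<NAME>_SOURCE_DIR=... arguments.

    Hypothetical helper. Assumes every sub-project lives in a directory
    named after it at the top of `checkout`; only the clang variable is
    taken from the discussion above.
    """
    args = []
    for name in projects:
        variable = name.upper().replace("-", "_")
        source_dir = PurePosixPath(checkout) / name
        args.append(f"-DLLVM_EXTERNAL_{variable}_SOURCE_DIR={source_dir}")
    return args


# Hypothetical checkout location.
checkout = "/src/llvm-project"
command = ["cmake", f"{checkout}/llvm"] + external_project_args(checkout, ["clang"])
print(" ".join(command))
```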

> Should the
> monorepository just be structured to have everything in its correct
> place for building?  My inclination is to say "no" because it reduces
> the visibility of the subprojects, but what are the alternatives?  There
> are two that come to mind off the top of my head, 1) include symlinks in
> the repository or 2) change the build so all components can live at the
> top level.

I'd expect a cmake shortcut such as: cmake -DLLVM_ENABLE_PROJECTS=clang,libcxx,compiler-rt

> I think it's important to think about these kinds of questions because
> once a repository layout has been settled on, it's hard to change.  Yes,
> it is relatively easy to move entire directories to new places in git,
> but that not only would require changes to whatever entity updates the
> subproject repositories, it's potentially a huge social issue, which are
> typically the most difficult problems to address.  :)
> 3. How are the subproject repositories going to be created/migrated?
> The individual subproject repositories will have to be created from
> scratch after the monorepository is created, right?  We can't just
> transition the existing git mirrors to the new setup, correct?  

It depends: there are tradeoffs for each option, and I think we need to gather community input to settle on one.

> A
> subproject repository reboot would involve some not insignificant pain
> for downstream users because their git histories are suddenly invalid.
> They would have to fetch a completely different repository and integrate
> it into whatever they have.

If we "reboot" the official git mirrors, I expect we'd provide scripts for integrating from the new monorepo on top of the existing history.

Ultimately these mirrors are "facilities", but it shouldn't be significantly harder for downstream to integrate directly from the monorepo with a bit of scripting, and I suspect this scripting is likely to be shareable and committed upstream.

> If there is some way to maintain the existing git mirrors and layer new
> monorepository commits on top of the existing history that would be
> fantastic.  I believe it is technically possible (I might need to add
> some enhancements to git-subtree :)) but I don't know if anyone has
> explored this.  I would love to be told you all have the answers
> already.  :)
> Bisecting
> For the multirepository proposal, the document talks about having the
> git-bisect run script update each submodule during bisection.  I suppose
> that will work but the bisection would only report that the failure
> exists at a particular commit in the umbrella repository, implying a
> bunch of different commits, one for each subproject.  It wouldn't really
> point to a particular subproject as being the culprit, correct?

Yes, it depends on the frequency of the update of the umbrella.
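
A small sketch of the granularity difference, under the assumption that a failure persists once it has been introduced. The commit names and the two histories are made up; the binary search stands in for what git-bisect does and is not git's own implementation.

```python
def first_bad(count, is_bad):
    """Return the index of the first bad entry among `count` entries.

    Assumes entries are ordered oldest to newest, that every entry after
    the first bad one is also bad, and that the last entry is bad.
    """
    lo, hi = 0, count - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Hypothetical umbrella history: each umbrella commit pulls in every
# sub-project commit made since the previous umbrella update.
umbrella = [
    ["llvm@l1", "clang@c1"],
    ["llvm@l2", "llvm@l3", "clang@c2"],
    ["llvm@l4", "clang@c3"],
]
culprit = "llvm@l3"  # hypothetical commit that introduced the failure


def umbrella_bad(i):
    # An umbrella commit is bad once the culprit has been pulled in.
    return any(culprit in umbrella[j] for j in range(i + 1))


# The same sub-project commits as one linear history.
linear = [commit for step in umbrella for commit in step]


def linear_bad(i):
    return culprit in linear[: i + 1]


# Umbrella bisection stops at a group of candidate commits.
print(umbrella[first_bad(len(umbrella), umbrella_bad)])
# Bisection over the linear history stops at a single commit.
print(linear[first_bad(len(linear), linear_bad)])
```

The less often the umbrella is updated, the larger the candidate group that the first result contains.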

>  The
> document even hints at this: "it is possible that one commit in the
> umbrella repository includes multiple commits in the sub-projects"
> That's what I was getting at with my submodule bisect question.  It can
> only bisect to a granularity of "one of these subprojects at their
> respective commits caused the problem."  With a true monorepository
> bisect can drill down to the exact commit within a subproject or across
> multiple subprojects if the commit touched multiple subprojects.  To me
> this is a giant advantage of a non-submodule-based monorepository, which
> I think is what the monorepository proposal is.
> If everything I've written here is generally correct, I think the
> monorepository will work for us, as long as each subproject repository
> is maintained at a granularity of one subproject commit per commit to
> the corresponding directory in the monorepository (i.e. full history is
> maintained).
> Thanks for your work on this.  This kind of work is crucially important
> but often unrecognized and underappreciated.

Thanks :)

If you have any input on parts of the document that could be made clearer, feel free to chime in on the review.
