[llvm-dev] [RFC] One or many git repositories?

via llvm-dev llvm-dev at lists.llvm.org
Thu Sep 8 11:58:50 PDT 2016


Mehdi Amini <mehdi.amini at apple.com> writes:

>     After going back and reading the proposal again, I think I
>     understand the plan. I haven't used the SVN repository for years
>     so I was thinking in terms of git, that you'd take the existing
>     git mirrors and combine them (via submodule or some other
>     mechanism). I understand now the proposal is to take the SVN root
>     and export all of that as one giant git repository. Is that
>     correct?
>
> Yes

Hooray!  I got it!

>     If a commit goes to the monorepository, what is going to extract
>     the relevant bits and commit them to the individual mirrors? The
>     document notes that with a monorepository a single commit can
>     touch multiple projects (that's good!) but something has to
>     extract the parts of that commit that are relevant to each
>     subproject and then send those parts to the subproject repository.
>
> Right, but note that this is already the case today: some people are
> already using SVN to commit to clang and LLVM at the same time.

That...is an abomination.  :)

>     There are tools to do this and I think
>     git-subtree is a good candidate [disclosure: I am the git-subtree
>     maintainer] but I'm just curious what's being considered as a
>     solution.
>     
>
> Well, we haven't decided on anything for the official mirrors. It looks
> like you're in a good position to help design how subtree could
> help here :)
> (I have a fairly good understanding of git, but very limited knowledge
> of subtree)

For the subtree split process, git-subtree currently uses an arcane (and
SLOW!) algorithm that I presume was written before filter-branch was
available.  I inherited the code so I don't know the full backstory.  In
any event, it's buggy in some corner cases, so my plan is to transition
it to filter-branch; for the most common splits git-subtree would then
simply be a more user-friendly wrapper around filter-branch.  I'm guessing that's
all the LLVM ecosystem would need.  There are some more intricate cases
but those mostly relate to some enhancements I've made that aren't even
public yet.
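For readers who have not looked at what a split actually produces, here is a toy model in Python. It is not git-subtree's algorithm and does not touch a real repository; it assumes a linear history in which each commit is a full snapshot (a dict of path to content). It keeps only the files under a prefix, strips the prefix, and drops commits that leave that subtree unchanged, which is roughly the effect of `git filter-branch --subdirectory-filter <dir>` on a linear history.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[str, str]       # path -> file content
Commit = Tuple[str, Snapshot]   # (commit message, full tree snapshot)


def split_subdirectory(history: List[Commit], prefix: str) -> List[Commit]:
    """Re-root a linear history at `prefix`, keeping only files under it.

    Commits that leave the subtree unchanged (including those made before
    the subtree existed) are dropped.
    """
    root = prefix.rstrip("/") + "/"
    result: List[Commit] = []
    previous: Snapshot = {}
    for message, snapshot in history:
        # Keep files under the prefix and strip the prefix from their paths.
        subtree = {
            path[len(root):]: content
            for path, content in snapshot.items()
            if path.startswith(root)
        }
        if subtree == previous:
            continue  # this commit did not touch the subtree
        result.append((message, subtree))
        previous = subtree
    return result


if __name__ == "__main__":
    history = [
        ("add llvm", {"llvm/README": "v1"}),
        ("add clang", {"llvm/README": "v1", "clang/README": "c1"}),
        ("touch both", {"llvm/README": "v2", "clang/README": "c2"}),
        ("llvm only", {"llvm/README": "v3", "clang/README": "c2"}),
    ]
    for message, tree in split_subdirectory(history, "clang"):
        print(message, tree)
```

Run as a script, this prints `add clang {'README': 'c1'}` and `touch both {'README': 'c2'}`: the cross-project "touch both" commit survives with only its clang half, and the llvm-only commit disappears from the clang history.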

> Anyway, I hope we will be able to put scripts in the repo so that anyone
> downstream can split the repo independently of official mirrors.

That would be excellent.
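A downstream split script could start out as small as the Python sketch below. It assumes a monorepo whose top-level directories are the subprojects (the list is illustrative, not an agreed layout) and calls `git subtree split --prefix=<dir> -b <branch>`, which writes the split history to a new local branch. The `<dir>-split` branch naming is hypothetical. By default it only prints the commands.

```python
import shlex
import subprocess
import sys
from typing import List

# Illustrative top-level monorepo directories; not an agreed layout.
SUBPROJECTS = ["llvm", "clang", "compiler-rt", "libcxx"]


def split_command(prefix: str) -> List[str]:
    """git-subtree invocation that writes prefix's history to a local branch.

    The branch name "<prefix>-split" is a hypothetical convention.
    """
    return ["git", "subtree", "split", f"--prefix={prefix}",
            "-b", f"{prefix}-split"]


def main(run: bool) -> None:
    for project in SUBPROJECTS:
        cmd = split_command(project)
        print(" ".join(shlex.quote(part) for part in cmd))
        if run:  # only touch the repository when explicitly asked to
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main(run="--run" in sys.argv[1:])
```

Without `--run` the script just lists the commands so they can be reviewed first; each resulting branch could then be pushed wherever a downstream mirror lives.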

>     The problem here is that for the build, clang wants to be in
>     llvm/tools and other components want to be in other places.
>
> Not exactly: cmake has magic discovery when clang is in tools, but it
> is not a requirement. You have been able to do this for years:
> cmake -DLLVM_EXTERNAL_CLANG_SOURCE_DIR=path

Oh!  I didn't know that.  That makes certain things I do easier.  :)

Probably the clang build documents need to be updated.  :)
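As a concrete illustration, the configure step for a checkout where clang sits next to llvm rather than inside llvm/tools could be assembled like this. The side-by-side layout is an assumption; `LLVM_EXTERNAL_CLANG_SOURCE_DIR` is the variable named above.

```python
from pathlib import Path
from typing import List


def configure_command(llvm_src: Path, clang_src: Path) -> List[str]:
    """Build a cmake command line that uses a clang checkout outside llvm/tools.

    Relative paths are resolved against the current working directory, so
    the command can then be run from any (empty) build directory.
    """
    return [
        "cmake",
        f"-DLLVM_EXTERNAL_CLANG_SOURCE_DIR={clang_src.resolve()}",
        str(llvm_src.resolve()),
    ]


if __name__ == "__main__":
    # Hypothetical layout: ./llvm and ./clang checked out side by side.
    print(" ".join(configure_command(Path("llvm"), Path("clang"))))
```

Passing absolute paths avoids any ambiguity about which directory a relative path would be interpreted against.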

>     Should the monorepository just be structured to have everything in
>     its correct place for building? My inclination is to say "no"
>     because it reduces the visibility of the subprojects, but what are
>     the alternatives?  There are two that come to mind off the top of
>     my head, 1) include symlinks in the repository or 2) change the
>     build so all components can live at the top level.
>
> I'd expect a cmake shortcut, something like
> cmake -DLLVM_ENABLE_PROJECTS=clang,libcxx,compiler-rt

Makes total sense.
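One way such a shortcut might be layered on the existing mechanism is to expand the project list into one source-directory variable per project. The sketch assumes a monorepo with each subproject at the top level and, hypothetically, that every listed project honours a variable of the same `LLVM_EXTERNAL_<NAME>_SOURCE_DIR` form that clang does; that is an assumption for illustration, not a description of how libcxx or compiler-rt are actually wired into the build.

```python
from pathlib import PurePosixPath
from typing import Dict


def expand_enable_projects(spec: str, monorepo_root: str) -> Dict[str, str]:
    """Map "clang,libcxx,compiler-rt" to per-project cache variables.

    Names follow the LLVM_EXTERNAL_<NAME>_SOURCE_DIR pattern (assumed for
    all projects here), upper-cased with '-' replaced by '_'.
    """
    root = PurePosixPath(monorepo_root)
    variables: Dict[str, str] = {}
    for project in (p.strip() for p in spec.split(",")):
        if not project:
            continue  # tolerate trailing or doubled commas
        name = project.upper().replace("-", "_")
        variables[f"LLVM_EXTERNAL_{name}_SOURCE_DIR"] = str(root / project)
    return variables


if __name__ == "__main__":
    # "/src/llvm-project" is a hypothetical monorepo checkout location.
    for key, value in expand_enable_projects(
            "clang,libcxx,compiler-rt", "/src/llvm-project").items():
        print(f"-D{key}={value}")
```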

>     The individual subproject repositories will have to be created
>     from scratch after the monorepository is created, right? We can't
>     just transition the existing git mirrors to the new setup,
>     correct?
>
> It depends: there are tradeoffs for each option, and I think we need to
> gather community input to settle on one.

Yes.  Lots of discussion is needed here.
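The underlying reason a freshly split mirror cannot simply continue an existing one is that a git commit ID is a hash over the commit's tree, metadata and parent IDs, so a difference anywhere in the ancestry changes the ID of every descendant. The toy model below imitates that chaining with SHA-1 over a simplified text record; it is not git's actual object format.

```python
import hashlib
from typing import List, Optional, Tuple


def commit_id(tree: str, message: str, parent: Optional[str]) -> str:
    """Hash a simplified commit record: tree, parent (if any) and message."""
    header = f"tree {tree}\n"
    if parent is not None:
        header += f"parent {parent}\n"
    return hashlib.sha1((header + "\n" + message).encode()).hexdigest()


def chain_ids(commits: List[Tuple[str, str]]) -> List[str]:
    """Compute IDs for a linear history of (tree, message) pairs."""
    ids: List[str] = []
    parent: Optional[str] = None
    for tree, message in commits:
        parent = commit_id(tree, message, parent)
        ids.append(parent)
    return ids


if __name__ == "__main__":
    mirror = [("t1", "initial"), ("t2", "fix"), ("t3", "feature")]
    # Same later changes, but the root commit is recorded differently.
    resplit = [("t1", "initial (from monorepo)"), ("t2", "fix"), ("t3", "feature")]
    print([a == b for a, b in zip(chain_ids(mirror), chain_ids(resplit))])
```

Only the root differs between the two histories, yet every ID in the chain comes out different, which is why downstream clones of the old mirror would no longer share any commits with a rebuilt one.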

>     A subproject repository reboot would involve some not
>     insignificant pain for downstream users because their git
>     histories are suddenly invalid.  They would have to fetch a
>     completely different repository and integrate it into whatever
>     they have.
>
> If we "reboot" the official git mirrors, I expect we'd provide scripts
> for integrating from the new monorepo on top of the existing history.

Interesting.  If the existing history can be maintained and built upon,
that would relieve a lot of burden on users.
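One possible shape for such a script, offered only as an assumption about what it might do, is to keep the old mirror's commits untouched and append just the split commits that are newer than the old tip. The sketch assumes both histories record a shared upstream revision number per commit (for example the SVN revision each was converted from); in real git the appended commits would have to be re-created on top of the old tip, e.g. with `git cherry-pick` or `git rebase --onto`.

```python
from typing import List, NamedTuple, Optional


class Commit(NamedTuple):
    revision: int   # shared upstream revision number (assumed to exist)
    message: str


def continue_history(old_mirror: List[Commit],
                     new_split: List[Commit]) -> Optional[List[Commit]]:
    """Extend old_mirror with the new_split commits made after its tip.

    Existing commits are kept as-is, so downstream clones stay valid; only
    commits newer than the old tip's revision are appended. Returns None if
    the old tip's revision does not appear in new_split.
    """
    if not old_mirror:
        return list(new_split)
    tip = old_mirror[-1].revision
    for index, commit in enumerate(new_split):
        if commit.revision == tip:
            return old_mirror + new_split[index + 1:]
    return None
```

Returning `None` rather than guessing a join point keeps the script from silently grafting unrelated histories together.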

> Ultimately these mirrors are "facilities", but it shouldn't be
> significantly harder for downstream to integrate directly from the
> monorepo with a bit of scripting, and I suspect this scripting could
> be shared and committed upstream.

I suspect you are right.

>     Bisecting
>     
>     For the multirepository proposal, the document talks about having
>     the git-bisect run script update each submodule during
>     bisection. I suppose that will work but the bisection would only
>     report that the failure exists at a particular commit in the
>     umbrella repository, implying a bunch of different commits, one
>     for each subproject. It wouldn't really point to a particular
>     subproject as being the culprit, correct?
>
> Yes, it depends on how frequently the umbrella is updated.

I see what you mean.  Yes, you are correct.
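To make the granularity point concrete: if each umbrella commit pins every subproject at some revision, bisecting over umbrella commits can only narrow a failure down to one umbrella update, and every subproject whose pin moved in that update remains a suspect. The toy model below uses made-up integer pins per subproject; a second, finer bisection inside each suspect range would still be needed.

```python
from typing import Callable, Dict, List, Tuple

Pins = Dict[str, int]  # subproject name -> commit index the umbrella pins


def first_bad(umbrella: List[Pins], is_good: Callable[[Pins], bool]) -> int:
    """Binary-search the umbrella history for the first failing update.

    Like git bisect, assumes umbrella[0] is good and umbrella[-1] is bad.
    """
    low, high = 0, len(umbrella) - 1
    while high - low > 1:
        mid = (low + high) // 2
        if is_good(umbrella[mid]):
            low = mid
        else:
            high = mid
    return high


def suspects(umbrella: List[Pins], bad: int) -> Dict[str, Tuple[int, int]]:
    """Subprojects whose pin moved in the bad update, as (old, new) pins."""
    before, after = umbrella[bad - 1], umbrella[bad]
    return {name: (before[name], pin)
            for name, pin in after.items() if pin != before[name]}


if __name__ == "__main__":
    umbrella = [
        {"llvm": 10, "clang": 5},
        {"llvm": 14, "clang": 5},
        {"llvm": 20, "clang": 9},  # one update moves both subprojects
        {"llvm": 22, "clang": 9},
    ]
    # Pretend the regression was introduced by clang commit 7.
    bad = first_bad(umbrella, lambda pins: pins["clang"] < 7)
    print(bad, suspects(umbrella, bad))
```

This prints `2 {'llvm': (14, 20), 'clang': (5, 9)}`: the bisection stops at the update that moved both pins, so llvm stays a suspect even though only clang is at fault. The more often the umbrella is updated, the fewer subprojects move per update.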

>     Thanks for your work on this. This kind of work is crucially
>     important but often unrecognized and underappreciated.
>
> Thanks :)
>
> If you have any input on parts of the document that can be made more
> clear, feel free to chime in in the review.

Will do!

                               -David

