[llvm-dev] [RFC] One or many git repositories?
llvm-dev at lists.llvm.org
Thu Sep 8 11:58:50 PDT 2016
Mehdi Amini <mehdi.amini at apple.com> writes:
> After going back and reading the proposal again, I think I
> understand the plan. I haven't used the SVN repository for years
> so I was thinking in terms of git, that you'd take the existing
> git mirrors and combine them (via submodule or some other
> mechanism). I understand now the proposal is to take the SVN root
> and export all of that as one giant git repository. Is that
Hooray! I got it!
> If a commit goes to the monorepository, what is going to extract
> the relevant bits and commit them to the individual mirrors? The
> document notes that with a monorepository a single commit can
> touch multiple projects (that's good!) but something has to
> extract the parts of that commit that are relevant to each
> subproject and then send those parts to the subproject repository.
> Right, but note that it is already the case today, some people are
> already using SVN to commit to clang and LLVM at the same time
That...is an abomination. :)
> There are tools to do this and I think
> git-subtree is a good candidate [disclosure: I am the git-subtree
> maintainer] but I'm just curious what's being considered as a
> Well we haven't decided on anything for the official mirrors. It looks
> like you're in a good position to help design how subtree could
> help here :)
> (I have a fairly good understanding of git, but very limited knowledge
> of subtree)
For the subtree split process, git-subtree currently uses an arcane (and
SLOW!) algorithm that I presume was written before filter-branch was
available. I inherited the code so I don't know the full backstory. In
any event, it's buggy in some corner cases, so my plan is to transition
it to filter-branch. For the most common splits it would then simply be
a more user-friendly wrapper around filter-branch. I'm guessing that's
all the LLVM ecosystem would need. There are some more intricate cases
but those mostly relate to some enhancements I've made that aren't even
> Anyway I hope we will be able to put scripts in the repo so that anyone
> downstream can split the repo independently of official mirrors.
That would be excellent.
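
To make the splitting step discussed above concrete, here is a minimal sketch of the idea only: it groups the paths touched by one monorepository commit by subproject, which is the first thing any split tool has to do. It is not git-subtree or filter-branch code. The directory names in SUBPROJECTS and the function name split_commit are assumptions for illustration, since the actual monorepository layout has not been decided.

```python
from collections import defaultdict

# Hypothetical top-level directories of the monorepository; the real
# layout had not been settled in this thread.
SUBPROJECTS = ("llvm", "clang", "compiler-rt")


def split_commit(changed_paths):
    """Group the paths touched by one monorepo commit by subproject.

    Paths outside a known subproject directory are ignored.
    """
    per_project = defaultdict(list)
    for path in changed_paths:
        top, _, rest = path.partition("/")
        if top in SUBPROJECTS and rest:
            # Drop the subproject prefix, as a split repository would see it.
            per_project[top].append(rest)
    return dict(per_project)


# One commit that touches two subprojects at once.
commit = [
    "llvm/lib/IR/Value.cpp",
    "clang/lib/Sema/Sema.cpp",
    "llvm/include/llvm/IR/Value.h",
]
parts = split_commit(commit)
for project, paths in parts.items():
    print(project, paths)
```

A real split also has to carry over authorship, commit messages and parent links, which is the part a tool built on filter-branch would take care of.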
> The problem here is that for the build, clang wants to be in
> llvm/tools and other components want to be in other places.
> Not exactly: cmake has magic discovery when clang is in tools, but it
> is not a requirement. You can do (for years): cmake -
Oh! I didn't know that. That makes certain things I do easier. :)
Probably the clang build documents need to be updated. :)
> Should the monorepository just be structured to have everything in
> its correct place for building? My inclination is to say "no"
> because it reduces the visibility of the subprojects, but what are
> the alternatives? There are two that come to mind off the top of
> my head, 1) include symlinks in the repository or 2) change the
> build so all components can live at the top level.
> I'd expect a cmake shortcut cmake -
Makes total sense.
> The individual subproject repositories will have to be created
> from scratch after the monorepository is created, right? We can't
> just transition the existing git mirrors to the new setup,
> It depends: there are tradeoffs for each option and I think we need to
> gather community input to settle on one.
Yes. Lots of discussion is needed here.
> A subproject repository reboot would involve some not
> insignificant pain for downstream users because their git
> histories are suddenly invalid. They would have to fetch a
> completely different repository and integrate it into whatever
> they have.
> If we "reboot" the official git mirrors, I expect
> We'd provide scripts for integrating from the new monorepo on top of
> the existing history.
Interesting. If the existing history can be maintained and built upon
that would relieve a lot of burden on users.
> Ultimately these mirrors are "facilities" but it shouldn't be
> significantly harder for downstream to integrate directly from the
> monorepo with a bit of scripting, and I suspect this scripting is
> likely to be shareable and committed upstream.
I suspect you are right.
> For the multirepository proposal, the document talks about having
> the git-bisect run script update each submodule during
> bisection. I suppose that will work but the bisection would only
> report that the failure exists at a particular commit in the
> umbrella repository, implying a bunch of different commits, one
> for each subproject. It wouldn't really point to a particular
> subproject as being the culprit, correct?
> Yes, it depends on the frequency of the update of the umbrella.
I see what you mean. Yes, you are correct.
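
A small sketch of why the update frequency matters, under made-up assumptions: each umbrella commit is modeled as a dict pinning one revision per subproject, and the revision numbers and the is_bad predicate are invented for illustration. Bisecting the umbrella history only narrows the failure down to a range of revisions in every subproject, not to one subproject.

```python
def first_bad(commits, is_bad):
    """Binary search for the last good and first bad commit.

    Assumes commits[0] is good and commits[-1] is bad.
    Returns the pair of indices (last_good, first_bad).
    """
    lo, hi = 0, len(commits) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return lo, hi


# Each umbrella commit pins one revision per subproject (made-up numbers).
umbrella = [
    {"llvm": 100, "clang": 50},
    {"llvm": 104, "clang": 53},
    {"llvm": 109, "clang": 57},
    {"llvm": 115, "clang": 60},
]


def is_bad(pins):
    # Pretend the regression arrived with clang revision 55.
    return pins["clang"] >= 55


good, bad = first_bad(umbrella, is_bad)

# Every subproject revision between the two umbrella commits is a suspect.
suspects = {
    project: (umbrella[good][project] + 1, umbrella[bad][project])
    for project in umbrella[bad]
}
print(suspects)
```

The less often the umbrella is updated, the wider these ranges get, and a second bisection inside each subproject is needed to find the actual culprit.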
> Thanks for your work on this. This kind of work is crucially
> important but often unrecognized and underappreciated.
> Thanks :)
> If you have any input on parts of the document that can be made more
> clear, feel free to chime in in the review.