[llvm-dev] [RFC] One or many git repositories?

Wed Jul 27 10:21:09 PDT 2016

Thanks for your thoughts, Chris.

> As supporting evidence of this, I was discussing this thread around the office yesterday and had quite a few people responding something along the lines of “they’re proposing what?”.

I hope they'll join us in this thread.

Ultimately a survey is going to be strongly biased in favor of "don't
change anything".  There is a strong psychological bias to weight
losses more than gains, so if one doesn't engage with the issue, it's
only natural to conclude "keep it as similar as possible to what it is
today -- that is safe."  But that line of thinking does not
necessarily lead us to the best outcome.

We've heard in this thread from a lot of developers about how a monorepo
would improve their workflow.  I would love to hear from some
developers who are actually affected in the way you describe, rather
than just considering the hypothetical.

My expectation is that the effect of the monorepo on said developers
would be relatively small -- we're talking about 1 GB of disk space.  I
understand that there's a "yuck" factor to this, but inasmuch as there
aren't other concrete effects, this is just change aversion.  And
essentially all of the other effects of the monorepo can be hidden via
sparse checkouts, as we've discussed.

Maybe I am wrong.  But I don't think we're going to get to the bottom
of it without engaging with people who are actually affected in the
way you posit.

> While admittedly you do get a linear history with using the mono-repository, that isn’t the only way to solve the problem, and I don’t really think that the benefit (not needing to write some tooling) justifies the increased burden applied to contributors that don’t use the full LLVM family of projects.

I think the trade-off you're considering here (cost to developers who
use llvm plus a version-locked subrepo vs. cost to developers who
don't want an llvm clone) is the right one.  But as someone who has
extensively used git submodules and repo (a wrapper script), I
strongly disagree with the judgement that a monorepo would not be a
significant improvement.

Our primary disagreement, I think, is over how much cost there is to
"writing some tooling".  To me, this is a significant barrier standing
in the way of developer productivity.  Here at Google I did a quick
survey, and more than half of us don't have scripts of the sort that
Justin Bogner described.  We are all just floundering around rebasing
clang and llvm until it compiles.  It *sucks*.

I suggest that saying that all of these developers are "doing it
wrong" is not helpful.  Not everyone has the git and python/bash chops
to write the necessary scripts.  Not everyone has the personality to
obsessively script around stuff, or the desire to maintain said
scripts.  Not everyone works on llvm/clang so much that it's worth
adopting a special-snowflake workflow.  And some of us -- myself
included -- have extensive git scripts which work with the standard
git workflow but would be completely broken by adding a custom level
of indirection around git.

When put this way, maybe it's clear that it's actually a niche set of
people for whom "script around the brokenness" is a good solution.

As I've said a bunch of times above, we have to weigh a cost paid by
all of us every time we type a command that starts with "git" --
something we do tens or hundreds of times a day -- versus the one-time
cost of asking people to download 1 GB of data.

On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
> I’m just now catching up on this massive thread after being on vacation last
> week, and I have a few thoughts I’d like to share.
>
> First and foremost please don’t consider lack of dissent on the thread as
> presence of consensus. The various git-related threads on LLVM-dev lately
> have been so active and contentious that I think a lot of people are zoning
> out on the conversations. As supporting evidence of this, I was discussing
> this thread around the office yesterday and had quite a few people
> responding something along the lines of “they’re proposing what?”.
>
> I think it would be great for us to have several different proposals for how
> the git-transition could work, and have a survey to get people’s opinions. I
> know this has been discussed repeatedly, and I want to put in my vote in
> favor of having a survey that takes into account multiple different
> approaches.
>
> WRT the actual proposal in this thread, I’m strongly opposed to a
> mono-repository. While I understand the argument that the full clone’s cost
> on disk space is minimal compared to an LLVM object directory, what about
> for contributors that contribute to the smaller runtimes projects but *not*
> to LLVM or Clang. A contributor that only contributes to libcxx or
> compiler-rt being forced to do a full clone of all the LLVM projects in
> order to push a patch kinda sucks.
>
> I want to point out a few workflows people may not be considering.
>
> Clang can be built against an installed LLVM. I know this workflow is used
> by some people because I’ve broken it in the past and had to fix it. With a
> mono-repo this workflow gets a bit more complicated because you’d need to do
> sparse checkouts, and it probably means we should just nuke the workflow
> entirely because there is no real value added by having it.
>
> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
> common use case maintaining sparse repository mirrors would limit impact of
> this on users, should any GCC user want to contribute to Compiler-RT, you’re
> forcing them to clone a much larger repository than necessary.
>
> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
> libcxxabi, libunwind, and potentially any other runtime library projects
> that we may create in the future.
>
> Beyond all that I want to point out that the git multi-repository story is
> basically the same thing we have today with SVN except for the absence of a
> monotonically increasing number that corresponds across repositories. While
> admittedly you do get a linear history with using the mono-repository, that
> isn’t the only way to solve the problem, and I don’t really think that the
> benefit (not needing to write some tooling) justifies the increased burden
> applied to contributors that don’t use the full LLVM family of projects.
>
> I think we have some pretty strong evidence in the form of the github fork
> counts (https://github.com/llvm-mirror/) that most people aren’t using all
> of the LLVM projects. In fact, by that evidence Clang (the second most
> popular project) is forked less than 2/3 as many times as LLVM.
>
> -Chris
>
>
> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Even if it were possible, I would still keep my upstream checkout
> separate just as a safety measure, to keep from sending private stuff
> upstream by accident.
>
>
> Just FYI, this is our (Azul's) workflow as well, and for similar
> reasons.
>
>
> Same here.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev