[llvm-dev] [RFC] One or many git repositories?

Wed Jul 27 11:30:00 PDT 2016

Justin Lebar via llvm-dev <llvm-dev at lists.llvm.org> writes:
> Thanks for your thoughts, Chris.
>
>> As supporting evidence of this, I was discussing this thread
>> around the office yesterday and had quite a few people
>> responding something along the lines of “they’re proposing what?”.
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't
> change anything".  There is a strong psychological bias to weight
> losses more than gains, so if one doesn't engage with the issue, it's
> only natural to conclude "keep it as similar as possible to what it is
> today -- that is safe."  But that line of thinking does not
> necessarily lead us to the best outcome.
>
> We've heard in thread from a lot of developers about how a monorepo
> would improve their workflow.  I would love to hear from some
> developers who are actually affected in the way you describe, rather
> than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers
> would be relatively small -- we're talking about 1gb of disk space.  I
> understand that there's a "yuck" factor to this, but inasmuch as there
> aren't other concrete effects, this is just change aversion.  And
> essentially all of the other effects of the monorepo can be hidden via
> sparse checkouts, as we've discussed.
>
> Maybe I am wrong.  But I don't think we're going to get to the bottom
> of it without actually engaging with people who are actually affected
> in the way you posit.

Well, I'm one of those people. I'm still convinced that, for me,
switching to a monorepo is a few weeks or maybe a couple of months of
disruption to my life[1] and we end up in a state that isn't any better,
just arbitrarily different.

[1]: re-cloning tens of repos across a few machines and migrating
     branches from them, adjusting my workflow to deal with the new
     layout, blowing away all of my existing build trees, arguing about
     how to handle legacy branches that now need to merge between a
     multi-repo layout and a monorepo layout, asking people to update
     bot configs, figuring out how the downstream clones of
     clang-without-llvm that I have to deal with will work, etc.

>> While admittedly you do get a linear history with using the
>> mono-repository, that isn’t the only way to solve the problem, and I
>> don’t really think that the benefit (not needing to write some
>> tooling) justifies the increased burden applied to contributors that
>> don’t use the full LLVM family of projects.
>
> I think the trade-off you're considering here (cost to developers who
> use llvm plus a version-locked subrepo vs. cost to developers who
> don't want an llvm clone) is the right one.  But as someone who has
> extensively used git submodules and repo (a wrapper script), I
> strongly disagree with the judgement that a monorepo would not be a
> significant improvement.
>
> Our primary disagreement, I think, is over how much cost there is to
> "writing some tooling".  To me, this is a significant barrier standing
> in the way of developer productivity.  Here at Google I did a quick
> survey, and more than half of us don't have scripts of the sort that
> Justin Bogner described.  We are all just floundering around rebasing
> clang and llvm until it compiles.  It *sucks*.

Note that the only tooling I have is a single script that just loops
over all of the git directories and runs a single git command in each.
Here's what I actually use, but it started as literally a single loop
over the results of $(find . -name .git), which is good enough.
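
For illustration, that initial loop might look something like the
sketch below. This is not the scrubbed git-llvm attachment: the
function name git_each and its arguments are hypothetical, and it
assumes a git new enough to support the -C option.

```shell
#!/bin/sh
# Hypothetical helper (not the attached git-llvm script): run one git
# command in every repository found under a root directory.
# Usage: git_each <root> <git arguments...>
git_each() {
    root=$1
    shift
    # -prune stops find from descending into each .git it reports.
    find "$root" -name .git -prune | while IFS= read -r gitdir; do
        repo=$(dirname "$gitdir")
        echo "== $repo"
        # Redirect stdin so git cannot consume the list of repositories.
        git -C "$repo" "$@" < /dev/null ||
            echo "git $1 failed in $repo" >&2
    done
}
```

With llvm, clang and friends checked out under the current directory,
something like `git_each . status --short` or `git_each . fetch`
would then cover all of them in one go.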

-------------- next part --------------
A non-text attachment was scrubbed...
Name: git-llvm
Type: text/x-shell
Size: 494 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160727/fc9dc51c/attachment.bin>

> I suggest that saying that all of these developers are "doing it
> wrong" is not helpful.  Not everyone has the git and python/bash chops
> to write the necessary scripts.  Not everyone has the personality to
> obsessively script around stuff, or the desire to maintain said
> scripts.  Not everyone works on llvm/clang so much that it's worth
> adopting a special-snowflake workflow.  And some of us -- myself
> included -- have extensive git scripts which work with the standard
> git workflow but would be completely broken by adding a custom level
> of indirection around git.
>
> When put this way, maybe it's clear that it's actually a niche set of
> people for whom "script around the brokenness" is a good solution.
>
> As I've said a bunch of times above, we have to weigh a cost paid by
> all of us every time we type a command that starts with "git" --
> something we do tens or hundreds of times a day -- versus the one-time
> cost of asking people to download 1gb of data.

It's important to take into account that the cost of migrating to a
radically different layout has really been glossed over in this
thread. It's certainly a one-time cost, but it's *a lot* more than
"downloading 1gb of data". Every downstream project will need to change
workflows, and every downstream developer will need to adjust how they
do things. I expect everyone to lose at least a day of work from this.

Maybe that's worth it for you to up your productivity on daily tasks, I
don't know, but please take this into consideration.

Anyways, I need to drop out of this thread again. I've decided I can
live with whatever we do, and I don't want to spend any more time on
this.