[llvm-dev] [RFC] One or many git repositories?

Tue Aug 9 14:12:21 PDT 2016

> Sorry, I was specifically replying to 'I think we'd be thrilled with a
> "meh" from your corner.'  I didn’t feel like that was helping the
> conversation along.

Sorry if I offended anyone with this or sent the wrong message.  I was
trying to say that beanz was originally a strong, categorical opponent to
the monorepo.  After some discussion, he became not strongly opposed
to a monorepo, so long as it didn't contain the runtime libraries.
Now Mehdi had a proposal that I was hoping would take him to
"not-strongly-opposed" to a monorepo that did contain the runtime
libraries.  Given where we came from, I would be very happy with that
outcome.

On Tue, Aug 9, 2016 at 1:58 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
> On Aug 9, 2016, at 1:57 PM, Pete Cooper <peter_cooper at apple.com> wrote:
>
>
> On Aug 9, 2016, at 1:55 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> On Aug 9, 2016, at 1:38 PM, Pete Cooper via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
> On Aug 9, 2016, at 11:27 AM, Justin Lebar via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
> changing anything.
>
>
> No real point *for you specifically*.
>
> But the vast majority of people would not be stuck using git-svn.  And
> in addition the LLVM project would not be stuck using svn, with all
> the baggage, hosting issues, workflow issues (for people other than
> you), etc.
>
> The bar by which this proposal should be measured is not "is it a net
> gain for beanz?"  :)  I think we'd be thrilled with a "meh" from your
> corner.
>
> Justin, I don’t think this conversation is really going anywhere.
>
>
> I’m not sure what you’re referring to exactly, but in the context of “this
> thread isn’t getting anywhere”, I strongly disagree.
>
> Sorry, I was specifically replying to 'I think we'd be thrilled with a "meh"
> from your corner.'  I didn’t feel like that was helping the conversation
> along.
>
>
> OK, I agree with you then :)
>
>
> I agree with everything else you say about actually talking about the
> different proposals.  I hope my point is well received that we really do
> need to eventually describe the impact to daily workflow, once the proposals
> are far enough along to do so.
>
>
> I agree with you also on this. I voiced in the past (on IRC toward
> Justin/David probably) that the proposal should include examples of workflow
> and how they translate to whatever the proposal will be.
>
> Cheers,
>
> —
> Mehdi
>
>
>
> Pete
>
>
> I believe that the recent workflow tests I performed (see my last emails in
> this thread) are proof that this thread has been productive, and I believe
> discussing here and hearing concerns from people (Chris and others) are
> necessary before getting a proposal fleshed out and having a survey.
>
> Having a survey without getting to the end of *what* we want to survey about
> is nonsense to me.
>
> (That may miss your point, but your point wasn’t clear either…).
>
> —
> Mehdi
>
>
>
>
> Renato already mentioned talking about this at the conference, and there has
> also been talk of a survey.  I think we need those to see how the community
> actually feels about the proposals here.
>
> Chris may be the only vocal advocate of an alternative to your proposal, but
> then there are people like me who are quiet because we are waiting for the
> survey to appear.
>
> I would have been much more vocal if I thought we were actually going to
> adopt the monorepo, but for now I believe it is still only a proposal.
>
> Full disclosure, I don’t want a monorepo.  I think it optimizes for the use
> case where people want to bisect, and I don’t think it’s reasonable to push
> a monorepo on everyone for the sake of those who want to bisect.  The submodules
> repo has already been demonstrated as one potential solution to this which
> would allow those who want to bisect to do so, while everyone else can
> continue to work more or less as they do today.
>
> In terms of the proposals, I think you, Mehdi, Chris, and a number of others
> have proven that there is almost no technical solution beyond our reach.
> What we do have are proposals which optimize for different use cases.  Given
> this, I think the most useful thing from my point of view (and hopefully to
> others) would be for those advocating each different solution to actually give
> short examples of each of the different use cases and how to support them.
>
> For example:
>
> Monorepo, pushing a change to compiler-rt:
> 1: git commit …
> 2: git pull --rebase
> 3: test
> 4a: git push succeeds /* no commits to any other project */. Goto 5
> 4b: git push is rejected /* someone committed to some other project in the
> monorepo */. Goto 2
> 5: Done
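The monorepo push loop above can be sketched as a small POSIX shell function. This is a minimal sketch, not part of any proposal: it assumes a remote named `origin`, a branch named `master`, that step 1 (the commit) is already done, and a hypothetical `TEST_CMD` variable standing in for step 3.

```shell
# Sketch of the monorepo push loop: rebase onto upstream, test, push,
# and go back to the rebase if someone else pushed in the meantime.
# Assumptions: remote "origin", branch "master"; TEST_CMD is a
# hypothetical hook for "3: test" (defaults to "true").
push_with_retry() {
  max_tries=${1:-5}
  test_cmd=${TEST_CMD:-true}
  try=1
  while [ "$try" -le "$max_tries" ]; do
    # Step 2: replay local commits on top of the latest upstream.
    git pull --rebase -q origin master || return 1
    # Step 3: run the (hypothetical) test command.
    $test_cmd || return 1
    # Step 4: push. A non-fast-forward rejection means a commit landed
    # upstream (in any sub-project), so loop back to step 2.
    if git push -q origin HEAD:master; then
      return 0
    fi
    try=$((try + 1))
  done
  echo "push_with_retry: giving up after $max_tries attempts" >&2
  return 1
}
```

Usage sketch: after committing, run `push_with_retry` from the checkout, optionally with `TEST_CMD` set to your usual test command. Every rejected push costs one more rebase and test cycle, which is the extra round trip the example is meant to highlight.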
>
> I know that this example appears negative in the case where someone else
> committed to another project and a rebase is required, but that’s exactly the
> point.  This is showing that this particular scenario is potentially a
> problem compared to today and/or other proposals.  A similar workflow could
> (should) be written for the sparse checkout monorepo, GitHub monorepo with
> svn, and submodules cases.  The submodules case will likely show that
> bisecting is more complex than on the monorepo, while pushing is simpler.
>
> Similarly, the submodules workflow probably isn’t capable of a single commit
> to llvm and clang in the revlock case while the monorepo is, but we as a
> community need to decide whether we want to optimize for that or not.  I
> don’t have any data to suggest that revlock commits are frequent/infrequent
> or even a problem in general, and I don’t think we should optimize for that
> case unless it’s worth doing so.
>
> Only by actually showing the use cases we care about can the community make
> an educated decision about what these proposals actually mean to our daily
> workflow.  We can then choose what we are optimizing for.  I personally want
> to have a very simple list of repos to clone from (or just one!) and for
> pushing to be easy, because those are the actions I perform the most often.
> Others will have different use cases they care about and they can choose the
> proposal which suits them best.
>
> Cheers,
> Pete
>
>
> On Tue, Aug 9, 2016 at 11:22 AM, Chris Bieneman <beanz at apple.com> wrote:
>
>
> On Aug 9, 2016, at 10:08 AM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> On Aug 8, 2016, at 6:02 PM, Chris Bieneman <beanz at apple.com> wrote:
>
>
>
> On Aug 8, 2016, at 5:09 PM, Mehdi Amini <mehdi.amini at apple.com> wrote:
>
>
> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
>
> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>
> Thanks for your thoughts, Chris.
>
> As supporting evidence of this, I was discussing this thread around the
> office yesterday and had quite a few people responding something
> along the lines of “they’re proposing what?”.
>
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't
> change anything".  There is a strong psychological bias to weight
> losses more than gains, so if one doesn't engage with the issue, it's
> only natural to conclude "keep it as similar as possible to what it is
> today -- that is safe."  But that line of thinking does not
> necessarily lead us to the best outcome.
>
>
> I don’t agree with this assertion. I believe that if you put forth multiple
> proposals, and have an articulate discussion of the merits and costs of each
> solution you can create a survey that can help inform decision making. I
> suppose we can agree to disagree.
>
>
> We've heard in thread from a lot of developers about how a monorepo
> would improve their workflow.  I would love to hear from some
> developers who are actually affected in the way you describe, rather
> than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers
> would be relatively small -- we're talking about 1gb of disk space.  I
> understand that there's a "yuck" factor to this, but inasmuch as there
> aren't other concrete effects, this is just change aversion.  And
> essentially all of the other effects of the monorepo can be hidden via
> sparse checkouts, as we've discussed.
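For reference, restricting a monorepo working tree to one sub-project can be sketched with git's classic `core.sparseCheckout` setting (newer git releases also provide a `git sparse-checkout` command). This is a minimal sketch, assuming the sub-projects live in top-level directories such as `compiler-rt/`; the repository URL and branch are whatever you pass in. Note that it only trims the working tree: the full history of the branch is still fetched.

```shell
# POSIX sh sketch: clone a monorepo but populate only the listed
# top-level sub-project directories (classic core.sparseCheckout).
# Usage: sparse_clone <repo-url> <dest-dir> <branch> <subdir>...
# Each <subdir> becomes an anchored pattern like "/compiler-rt/".
# The final checkout records every path in the index but writes only
# the matching paths to the working tree.
sparse_clone() {
  repo_url=$1
  dest=$2
  branch=$3
  shift 3
  git init -q "$dest" || return 1
  (
    cd "$dest" || exit 1
    git remote add origin "$repo_url" &&
    git config core.sparseCheckout true &&
    mkdir -p .git/info &&
    printf '/%s/\n' "$@" > .git/info/sparse-checkout &&
    git fetch -q origin "$branch" &&
    git checkout -q -B "$branch" FETCH_HEAD
  )
}
```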
>
> Maybe I am wrong.  But I don't think we're going to get to the bottom
> of it without actually engaging with people who are actually affected
> in the way you posit.
>
>
> Ok, let me describe a few workflows I’ve used in the last year that are (in
> my mind) adversely impacted by a mono-repo.
>
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a period
> of a day. Once my Compiler-RT changes are done I rebase the compiler-rt
> repo, rebuild compiler-rt then commit.
>
> With a mono-repo, rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake-fu, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That kinda
> sucks.
>
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a tight
> coupling that isn’t strictly required today.
>
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
> in a while (or every other day as it were) I would make a change that would
> break a configuration that I wasn’t directly developing on. My workflow for
> handling those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Checkout Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing to
> checkout a full LLVM. Now, you might say I could check out the sub-project
> mirror, but then I can’t commit from the VM, which kinda sucks.
>
>
> So for the “I spin a VM and want to make a commit but don’t want to download
> a few hundred MBs with a git clone” story, it turns out that the github
> bridge with SVN helps to optimize with a “lean” checkout:
>
> I fork the unified repo here:
> https://github.com/joker-eph/llvm-project/commits/master and then:
>
> svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>
> So that’s a net “no regression” compared to the current state :)
>
>
> Is the github SVN interface's "co" magically as fast as a git clone?
>
>
> $ time svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> ….
> real 0m8.539s user 0m0.919s sys 0m1.917s
> $ time git clone https://github.com/joker-eph/compiler-rt.git
> real 0m5.487s user 0m1.208s sys 0m0.825s
>
>
> That’s actually not terrible! Color me impressed.
>
>
>
> If not, it is a performance regression because today I use git clone and
> git-svn on my VMs just like on my physical machines, and either way it adds
> some crazy complexity.
>
>
> No problem, I get it, exactly the same workflow as today:
>
>
> Yep. Which isn’t bad. I do however have two concerns.
>
> (1) What happens if we move to pull request-based workflows? Do we still
> support this workflow?
> (2) If I’m stuck using git-svn I kinda feel like there is no real point in
> changing anything. I dislike this workflow less than the earlier proposals,
> but I see no reason to move to this instead of staying on SVN (other than
> the hosting issues which could be solved in other ways).
>
> -Chris
>
>
> # Clone from the single read-only git repo
> $ git clone https://github.com/joker-eph/compiler-rt.git
> …
> # Configure the SVN remote and initialize the svn metadata
> $ cd compiler-rt
> $ git svn init https://github.com/joker-eph/llvm-project/trunk/compiler-rt --username=
> $ git config svn-remote.svn.fetch :refs/remotes/origin/master
> $ git svn rebase -l
> ...
> # Remove an empty file and commit with git
> $ git rm empty
> $ git commit -m "remove empty file"
> # commit/push with svn to the unified git repo
> $ git svn dcommit
> Committing to https://github.com/joker-eph/llvm-project/trunk/compiler-rt
> ...
> D empty
> Committed r354148
>
>
> Here is the commit:
> https://github.com/joker-eph/llvm-project/commit/5f7e977c8cf3c33153d91be9b556143b49911ebe
>
>
> —
> Mehdi
>
> While admittedly you do get a linear history with using the mono-repository,
> that isn’t the only way to solve the problem, and I don’t really think that
> the benefit (not needing to write some tooling) justifies the increased
> burden applied to contributors that don’t use the full LLVM family of
> projects.
>
>
> I think the trade-off you're considering here (cost to developers who
> use llvm plus a version-locked subrepo vs. cost to developers who
> don't want an llvm clone) is the right one.
>
>
> I actually think there are *a lot* more considerations we need to be making
> for an infrastructure change like this. While it is true that our SCM
> hosting strategy primarily impacts developers, it also impacts our users. We
> should be conscious of the impact to downstream users in making
> infrastructure changes like this. That is part of why the idea of a survey
> holds appeal to me; it would give us the opportunity to get feedback from a
> much wider audience than the current “people on llvm-dev who haven’t been
> scared away”.
>
> But as someone who has
> extensively used git submodules and repo (a wrapper script), I
> strongly disagree with the judgement that a monorepo would not be a
> significant improvement.
>
> Our primary disagreement, I think, is over how much cost there is to
> "writing some tooling".  To me, this is a significant barrier standing
> in the way of developer productivity.  Here at Google I did a quick
> survey, and more than half of us don't have scripts of the sort that
> Justin Bogner described.  We are all just floundering around rebasing
> clang and llvm until it compiles.  It *sucks*.
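As an illustration of how small such a helper can be, and of why it is only a heuristic, here is a hypothetical sketch (not the scripts referred to above) that, given an llvm commit, picks the newest clang commit committed no later than it. It assumes separate llvm and clang git clones and treats commit dates as a proxy for compatibility, which rev-locked changes can break.

```shell
# Hypothetical helper: print the newest commit in another sub-project
# clone whose committer date is not later than a given llvm commit.
# Assumption: commit dates approximate "builds together"; neither
# repository guarantees this.
# Usage: matching_commit <llvm-dir> <llvm-rev> <other-dir> [<other-ref>]
matching_commit() {
  llvm_dir=$1
  llvm_rev=$2
  other_dir=$3
  other_ref=${4:-HEAD}
  # Committer date of the llvm commit, e.g. "2016-08-09 14:12:21 -0700".
  when=$(git -C "$llvm_dir" log -1 --format=%ci "$llvm_rev") || return 1
  # Newest first-parent commit committed at or before that date.
  git -C "$other_dir" rev-list -1 --first-parent --before="$when" "$other_ref"
}
```

Usage sketch: `git -C clang checkout "$(matching_commit llvm <llvm-rev> clang)"`.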
>
>
> I actually think we’re both talking about solutions that require tooling,
> and while we *could* be disagreeing over how much effort each tooling
> initiative would require (I think they’re pretty close, so I don’t care to
> have that argument), my actual disagreement with your proposal is that it is
> a change that impacts developers and users universally and I don’t think
> that it is justified. Simply put, I don’t feel that the benefits are
> substantial enough to warrant the kind of disruptive change you’re
> proposing.
>
>
> I suggest that saying that all of these developers are "doing it
> wrong" is not helpful.
>
>
> Maybe I’m missing something, but I don’t think I said anyone was “doing it
> wrong”. Bisecting across multiple git repositories isn’t a great experience.
> But neither is bisecting across a half dozen separate folders in an SVN
> repository. Both the submodule solution and the mono-repo solution solve
> this problem equivalently well.
>
> Not everyone has the git and python/bash chops
> to write the necessary scripts.  Not everyone has the personality to
> obsessively script around stuff, or the desire to maintain said
> scripts.  Not everyone works on llvm/clang so much that it's worth
> adopting a special-snowflake workflow.  And some of us -- myself
> included -- have extensive git scripts which work with the standard
> git workflow but would be completely broken by adding a custom level
> of indirection around git.
>
> When put this way, maybe it's clear that it's actually a niche set of
> people for whom "script around the brokenness" is a good solution.
>
>
> I’m not sure what “brokenness” you’re referring to. We have a collection of
> loosely connected projects by design. As a result of that intentional design
> certain workflows will be impacted. I don’t think that is brokenness. I
> think our loose coupling is a feature even if it makes some workflows
> harder.
>
> -Chris
>
>
> As I've said a bunch of times above, we have to weigh a cost paid by
> all of us every time we type a command that starts with "git" --
> something we do tens or hundreds of times a day -- versus the one-time
> cost of asking people to download 1gb of data.
>
> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> I’m just now catching up on this massive thread after being on vacation last
> week, and I have a few thoughts I’d like to share.
>
> First and foremost please don’t consider lack of dissent on the thread as
> presence of consensus. The various git-related threads on LLVM-dev lately
> have been so active and contentious that I think a lot of people are zoning
> out on the conversations. As supporting evidence of this, I was discussing
> this thread around the office yesterday and had quite a few people
> responding something along the lines of “they’re proposing what?”.
>
> I think it would be great for us to have several different proposals for how
> the git-transition could work, and have a survey to get people’s opinions. I
> know this has been discussed repeatedly, and I want to put in my vote in
> favor of having a survey that takes into account multiple different
> approaches.
>
> WRT the actual proposal in this thread, I’m strongly opposed to a
> mono-repository. While I understand the argument that the full clone’s cost
> on disk space is minimal compared to an LLVM object directory, what about
> contributors who contribute to the smaller runtime projects but *not*
> to LLVM or Clang? A contributor that only contributes to libcxx or
> compiler-rt being forced to do a full clone of all the LLVM projects in
> order to push a patch kinda sucks.
>
> I want to point out a few workflows people may not be considering.
>
> Clang can be built against an installed LLVM. I know this workflow is used
> by some people because I’ve broken it in the past and had to fix it. With a
> mono-repo this workflow gets a bit more complicated because you’d need to do
> sparse checkouts, and it probably means we should just nuke the workflow
> entirely because there is no real value added by having it.
>
> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
> common use case maintaining sparse repository mirrors would limit impact of
> this on users, should any GCC user want to contribute to Compiler-RT, you’re
> forcing them to clone a much larger repository than necessary.
>
> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
> libcxxabi, libunwind, and potentially any other runtime library projects
> that we may create in the future.
>
> Beyond all that I want to point out that the git multi-repository story is
> basically the same thing we have today with SVN except for the absence of a
> monotonically increasing number that corresponds across repositories. While
> admittedly you do get a linear history with using the mono-repository, that
> isn’t the only way to solve the problem, and I don’t really think that the
> benefit (not needing to write some tooling) justifies the increased burden
> applied to contributors that don’t use the full LLVM family of projects.
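For comparison, a single repository with an append-only, linear main branch can provide an SVN-like increasing number directly from git by counting first-parent commits. This is a minimal sketch, assuming history on the branch is never rewritten (otherwise numbers would shift); separate repositories would each get their own, unrelated sequence, which is the gap described above.

```shell
# Sketch: SVN-like revision number for a commit, computed as the number
# of first-parent commits reachable from it (the commit itself included).
# Assumption: the branch is append-only; rewriting history renumbers.
# Usage: rev_number <repo-dir> [<rev>]
rev_number() {
  repo_dir=$1
  rev=${2:-HEAD}
  git -C "$repo_dir" rev-list --count --first-parent "$rev"
}
```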
>
> I think we have some pretty strong evidence in the form of the github fork
> counts (https://github.com/llvm-mirror/) that most people aren’t using all
> of the LLVM projects. In fact, by that evidence Clang (the second most
> popular project) is forked less than 2/3 as many times as LLVM.
>
> -Chris
>
>
> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Even if it were possible, I would still keep my upstream checkout
> separate just as a safety measure, to keep from sending private stuff
> upstream by accident.
>
>
> Just FYI, this is our (Azul's) workflow as well, and for similar
> reasons.
>
>
> Same here.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev