[llvm-dev] [RFC] One or many git repositories?

Sean Silva via llvm-dev llvm-dev at lists.llvm.org
Mon Aug 8 18:33:11 PDT 2016


On Mon, Aug 8, 2016 at 5:09 PM, Mehdi Amini via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

>
> On Jul 27, 2016, at 12:50 PM, Chris Bieneman via llvm-dev <
> llvm-dev at lists.llvm.org> wrote:
>
>
> On Jul 27, 2016, at 10:21 AM, Justin Lebar <jlebar at google.com> wrote:
>
> Thanks for your thoughts, Chris.
>
> As supporting evidence of this, I was discussing this thread yesterday
> around the office and had quite a few people responding something
> along the lines of “they’re proposing what?”.
>
>
> I hope they'll join us in this thread.
>
> Ultimately a survey is going to be strongly biased in favor of "don't
> change anything".  There is a strong psychological bias to weight
> losses more than gains, so if one doesn't engage with the issue, it's
> only natural to conclude "keep it as similar as possible to what it is
> today -- that is safe."  But that line of thinking does not
> necessarily lead us to the best outcome.
>
>
> I don’t agree with this assertion. I believe that if you put forth
> multiple proposals, and have an articulate discussion of the merits and
> costs of each solution you can create a survey that can help inform
> decision making. I suppose we can agree to disagree.
>
>
> We've heard in this thread from a lot of developers about how a monorepo
> would improve their workflow.  I would love to hear from some
> developers who are actually affected in the way you describe, rather
> than just considering the hypothetical.
>
> My expectation is that the effect of the monorepo on said developers
> would be relatively small -- we're talking about 1 GB of disk space.  I
> understand that there's a "yuck" factor to this, but inasmuch as there
> aren't other concrete effects, this is just change aversion.  And
> essentially all of the other effects of the monorepo can be hidden via
> sparse checkouts, as we've discussed.
>
> Maybe I am wrong.  But I don't think we're going to get to the bottom
> of it without actually engaging with people who are actually affected
> in the way you posit.
>
>
> Ok, let me describe a few workflows I’ve used in the last year that are
> (in my mind) adversely impacted by a mono-repo.
>
> Case Study 1 - Simple development on a sub-project
>
> I build LLVM + Clang + Compiler-RT using the just-built Clang to build
> Compiler-RT. I iterate on some complicated Compiler-RT changes over a
> period of a day. Once my Compiler-RT changes are done I rebase the
> compiler-rt repo, rebuild compiler-rt then commit.
>
> With a mono-repo, rebasing the checkout means rebasing the whole tree. So,
> either I have to wrangle some crazy git or CMake fu, or when I run “ninja
> compiler-rt” after the rebase it will rebuild LLVM and Clang too. That
> kinda sucks.
>
> What this example illustrates to me is that today we have loosely coupled
> projects with an occasional rev lock. Moving to a mono-repo enforces a
> tight coupling that isn’t strictly required today.
>
> Case Study 2 - Working on a sub-project in isolation across many platforms
>
> I did a lot of work on Compiler-RT last year that had no direct dependency
> on any other LLVM project. During the development I was working with a
> Compiler-RT checkout and a build directory of just Compiler-RT. Every once
> in a while (or every other day as it were) I would make a change that would
> break a configuration that I wasn’t directly developing on. My workflow for
> handling those cases was:
>
> (1) Spin up a VM on a VPS that closely matched the configuration I broke
> (2) Check out Compiler-RT
> (3) Reproduce, debug, fix the failure
> (4) Commit the patch from the VM
>
> In a mono-repository doing this would require checking out *all*
> sub-projects, not just Compiler-RT. I imagine this probably isn’t a common
> workflow, but it is one I use that would be adversely impacted by needing
> to check out a full LLVM. Now, you might say I could check out the
> sub-project mirror, but then I can’t commit from the VM, which kinda sucks.
>
>
> So for the “I spin up a VM and want to make a commit but don’t want to
> download a few hundred MBs with a git clone” story, it turns out that
> GitHub's SVN bridge makes a “lean” checkout possible:
>
> I fork the unified repo here:
> https://github.com/joker-eph/llvm-project/commits/master and then:
>
>     svn co https://github.com/joker-eph/llvm-project/trunk/compiler-rt
>

Wow, I didn't know GitHub could do this. This blows my mind!

-- Sean Silva


>
> So that’s a net “no regression” compared to the current state :)
>
>
>> Mehdi
>
> While admittedly you do get a linear history by using the
> mono-repository, that isn’t the only way to solve the problem, and I don’t
> really think that the benefit (not needing to write some tooling) justifies
> the increased burden applied to contributors that don’t use the full LLVM
> family of projects.
>
>
> I think the trade-off you're considering here (cost to developers who
> use llvm plus a version-locked subrepo vs. cost to developers who
> don't want an llvm clone) is the right one.
>
>
> I actually think there are *a lot* more considerations we need to be
> making for an infrastructure change like this. While it is true that our
> SCM hosting strategy primarily impacts developers, it also impacts our
> users. We should be conscious of the impact to downstream users in making
> infrastructure changes like this. That is part of why the idea of a survey
> holds appeal to me; it would give us the opportunity to get feedback from a
> much wider audience than the current “people on llvm-dev who haven’t been
> scared away”.
>
> But as someone who has
> extensively used git submodules and repo (a wrapper script), I
> strongly disagree with the judgement that a monorepo would not be a
> significant improvement.
>
> Our primary disagreement, I think, is over how much cost there is to
> "writing some tooling".  To me, this is a significant barrier standing
> in the way of developer productivity.  Here at Google I did a quick
> survey, and more than half of us don't have scripts of the sort that
> Justin Bogner described.  We are all just floundering around rebasing
> clang and llvm until it compiles.  It *sucks*.
>
>
> I actually think we’re both talking about solutions that require tooling,
> and while we *could* be disagreeing over how much effort each tooling
> initiative would require (I think they’re pretty close, so I don’t care to
> have that argument), my actual disagreement with your proposal is that it
> is a change that impacts developers and users universally and I don’t think
> that it is justified. Simply put, I don’t feel that the benefits are
> substantial enough to warrant the kind of disruptive change you’re
> proposing.
>
>
> I suggest that saying that all of these developers are "doing it
> wrong" is not helpful.
>
>
> Maybe I’m missing something, but I don’t think I said anyone was “doing it
> wrong”. Bisecting across multiple git repositories isn’t a great
> experience. But neither is bisecting across a half dozen separate folders
> in an SVN repository. Both the submodule solution and the mono-repo
> solution solve this problem equivalently well.
>
> Not everyone has the git and python/bash chops
> to write the necessary scripts.  Not everyone has the personality to
> obsessively script around stuff, or the desire to maintain said
> scripts.  Not everyone works on llvm/clang so much that it's worth
> adopting a special-snowflake workflow.  And some of us -- myself
> included -- have extensive git scripts which work with the standard
> git workflow but would be completely broken by adding a custom level
> of indirection around git.
>
> When put this way, maybe it's clear that it's actually a niche set of
> people for whom "script around the brokenness" is a good solution.
>
>
> I’m not sure what “brokenness” you’re referring to. We have a collection
> of loosely connected projects by design. As a result of that intentional
> design certain workflows will be impacted. I don’t think that is
> brokenness. I think our loose coupling is a feature even if it makes some
> workflows harder.
>
> -Chris
>
>
> As I've said a bunch of times above, we have to weigh a cost paid by
> all of us every time we type a command that starts with "git" --
> something we do tens or hundreds of times a day -- versus the one-time
> cost of asking people to download 1 GB of data.
>
> On Wed, Jul 27, 2016 at 9:47 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> I’m just now catching up on this massive thread after being on vacation
> last week, and I have a few thoughts I’d like to share.
>
> First and foremost please don’t consider lack of dissent on the thread as
> presence of consensus. The various git-related threads on LLVM-dev lately
> have been so active and contentious that I think a lot of people are zoning
> out on the conversations. As supporting evidence of this, I was discussing
> this thread around the office yesterday and had quite a few people
> responding something along the lines of “they’re proposing what?”.
>
> I think it would be great for us to have several different proposals for
> how the git-transition could work, and have a survey to get people’s
> opinions. I know this has been discussed repeatedly, and I want to put in
> my vote in favor of having a survey that takes into account multiple
> different approaches.
>
> WRT the actual proposal in this thread, I’m strongly opposed to a
> mono-repository. While I understand the argument that the full clone’s cost
> on disk space is minimal compared to an LLVM object directory, what about
> contributors who contribute to the smaller runtime projects but *not*
> to LLVM or Clang? A contributor who only contributes to libcxx or
> compiler-rt being forced to do a full clone of all the LLVM projects in
> order to push a patch kinda sucks.
>
> I want to point out a few workflows people may not be considering.
>
> Clang can be built against an installed LLVM. I know this workflow is used
> by some people because I’ve broken it in the past and had to fix it. With a
> mono-repo this workflow gets a bit more complicated because you’d need to
> do sparse checkouts, and it probably means we should just nuke the workflow
> entirely because there is no real value added by having it.
>
> Compiler-RT’s sanitizers are used with GCC; no LLVM required. While for the
> common use case maintaining sparse repository mirrors would limit impact of
> this on users, should any GCC user want to contribute to Compiler-RT,
> you’re forcing them to clone a much larger repository than necessary.
>
> The same problem with Compiler-RT’s sanitizers also applies to libcxx,
> libcxxabi, libunwind, and potentially any other runtime library projects
> that we may create in the future.
>
> Beyond all that I want to point out that the git multi-repository story is
> basically the same thing we have today with SVN except for the absence of a
> monotonically increasing number that corresponds across repositories. While
> admittedly you do get a linear history by using the mono-repository, that
> isn’t the only way to solve the problem, and I don’t really think that the
> benefit (not needing to write some tooling) justifies the increased burden
> applied to contributors that don’t use the full LLVM family of projects.
>
> I think we have some pretty strong evidence in the form of the github fork
> counts (https://github.com/llvm-mirror/) that most people aren’t using all
> of the LLVM projects. In fact, by that evidence Clang (the second most
> popular project) is forked less than 2/3 as many times as LLVM.
>
> -Chris
>
>
> On Jul 26, 2016, at 11:31 AM, Renato Golin via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> On 26 July 2016 at 19:28, Sanjoy Das via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>
> Even if it were possible, I would still keep my upstream checkout
> separate just as a safety measure, to keep from sending private stuff
> upstream by accident.
>
>
> Just FYI, this is our (Azul's) workflow as well, and for similar
> reasons.
>
>
> Same here.
>
> cheers,
> --renato
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev