[llvm-dev] [RFC] One or many git repositories?

Thu Jul 28 12:05:37 PDT 2016

>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>
> Maybe there are few, but the impact is not insignificant. Also, I think the opinions of the read-write consumers of the sub-projects being included should count for a lot

I agree.

> as a read-write consumer I don’t like this proposal if it includes the runtime libraries.

Point well-taken.

> The existence of subproject mirrors requires someone to write and maintain the tooling to keep those mirrors updated,

I think you will find on this thread no shortage of people willing to
maintain said mirrors in exchange for getting a monorepo as the
canonical source of truth.

> and those mirrors will have all the technical hurdles and drawbacks that a submodule repository would have.

I don't understand this.  The point of the mirrors is to allow people
to use a read-only multirepo workflow.  I agree that anyone who chose
to do so would incur all of the drawbacks of a multirepo workflow,
but...that's the point?  Maybe I'm missing something.

> The question here is: Do you make downstream single project users work off potentially unreliable mirrors, or do you make the people who need a mono-repo experience work off a potentially unreliable submodule repo?

I agree with the gist of this question, but I want to refine the
trade-off a bit.

With a monorepo, downstream single-project users actually have two
options.  They can work off the mirrors, or they can just download the
whole thing.  So with the monorepo, downstream single-project users
are not forced to work off noncanonical mirrors.  They are only
"forced" to do so if they are unable or unwilling to download a 500 MB
repo and throw away most of it.  Which I think may actually be
relatively few people.  But what do I know?

Anyway, my answer to this question has been, and still is, that a
monorepo is strictly more powerful than a multirepo.

For one thing, we can atomically commit across subprojects using a
monorepo.  On IRC I've had a bunch of people just begging me for this.

Putative scripts that let monorepo users commit to the multirepo
could not translate a cross-cutting commit into a single commit in the
umbrella repository (that's the one that contains all the multirepos
as git submodules) without cooperation from the script that translates
multirepo commits into umbrella-repository commits.  It's possible --
anything is, with Turing-complete tooling -- but it would be very
complicated.

Still more complicated would be writing a script that would allow
monorepo users to push to putative try bots that are based off the
multirepo.  Again, anything is possible, but I have written and
maintained similar software in the past (for a significantly simpler
setup) and it was fragile as heck.  This, too, would require
extensive cooperation between us and the multirepo --> umbrella repo
script.

In contrast, as discussed earlier, if people want a multirepo-like
setup based on the monorepo, we can reduce this to a single command
run once when the repository is cloned.  It ends up being far less
fragile, and requiring far fewer (actually, zero) tricks on the server
side.
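
To make the "single command run once" idea concrete, here is a rough
sketch.  It is an illustration only, not the script discussed on this
thread: the function name and the repository URL are made up, and it
assumes a modern git (2.25 or later) that provides "git clone --sparse"
and "git sparse-checkout set", which is newer than this discussion.
The code only builds and prints the commands; it does not run them.

```python
import shlex


def sparse_checkout_commands(repo_url, dest, subprojects):
    """Build git commands for a monorepo clone that only materializes
    the given subproject directories (hypothetical helper)."""
    if not subprojects:
        raise ValueError("need at least one subproject directory")
    return [
        # Clone full history, but check out only top-level files at first.
        ["git", "clone", "--sparse", repo_url, dest],
        # Then restrict the working tree to the chosen subprojects.
        ["git", "-C", dest, "sparse-checkout", "set", *subprojects],
    ]


if __name__ == "__main__":
    # The URL below is a placeholder, not a real repository location.
    for cmd in sparse_checkout_commands(
        "https://example.invalid/llvm-project.git",
        "llvm-project",
        ["libcxx", "libcxxabi"],
    ):
        print(shlex.join(cmd))
```

Note that this only limits what appears in the working tree.  The clone
still downloads the full history, which is the disk-space cost debated
elsewhere in this thread.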

> Instead, let’s talk about DragonEgg.

+1.

> The DragonEgg project is, as far as I can tell, abandoned, but it is still an LLVM project that is tightly coupled to LLVM versions. So it meets criterion #1. I think it fails to meet criterion #2 because DragonEgg is basically abandoned and provides no real value to the community. Even though the burden of a dead project on the mono-repo is minuscule, I think there is no good reason to include DragonEgg.

If DragonEgg is abandoned, I think we should keep the history in our
repository and just delete it from head.

My argument for keeping it in our history is: Suppose we go with a
monorepo, and suppose at some point in the future, some other LLVM
project -- say, lld -- became abandoned.  Would we rewrite our
monorepo history to erase all trace of lld, because it no longer
provides value to us?

No, right?  lld's history is part of our history.  We'd just delete it
from head and move on with our lives.

> My arguments are from the perspective of someone working on the runtime library projects: the burden of being included in the LLVM mono-repo is significant. While the full history of LLVM is around 500MB, the full history of *all* the runtime projects is less than 100MB.  Developers working on libcxx or compiler-rt should not need to clone LLVM and run commands to do sparse checkouts. That is more burden than we should incur. Further, the setup cost of doing multiple sparse checkouts in order to approximate the workflows we have today with decoupled projects is, IMO, unnecessary and unreasonable.

OK, just to make sure I understand your point here, because this is
important, you are saying that you object to including libcxx and
compiler-rt in the llvm monorepo because:

* It would consume an additional ~400 MB of disk space, and
* It's unnecessary and unreasonable to ask libcxx etc. developers to
run a script when they check out the monorepo if they want a sparse
checkout and/or a setup that mirrors the multirepo.

I'm not trying to put words in your mouth or subtly change what you're
saying, so please let me know if I didn't get that right.

Thanks again for all your time here.

-Justin

On Thu, Jul 28, 2016 at 11:28 AM, Chris Bieneman <beanz at apple.com> wrote:
>
>> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
>>
>> Thanks again for your thoughts, Chris.
>>
>>> As a straw man I would suggest the following criteria for inclusion into the mono-repo:
>>>
>>> (1) Projects in the mono-repo must be tightly coupled to specific versions or commits of other projects in the mono-repo
>>
>> I'm fine with that, fwiw.  That was in fact the original proposal.
>
> That is the wording of the original proposal, but I disagree that it is the content of the original proposal. I don’t believe that Compiler-RT is tightly coupled to LLVM at all, which is a big source of my disagreement here.
>
>> I'm also fine if we decide to put everything inside the monorepo.  I
>> think Richard Smith had some good arguments for why they belong
>> together.
>>
>> But I am really surprised that you think this is such a big deal that
>> you would object to the whole monorepo if this decision doesn't go
>> your way.
>
> I really hate your phrasing on this. I’m not objecting to this proposal just because some minor decision doesn’t go my way. I think this is a very crucial point of whether or not the monorepo solution’s benefit outweighs its cost.
>
>> The decision of whether or not to include these projects
>> affects only read-write consumers of these projects -- of which there
>> are relatively few people.
>
> Maybe there are few, but the impact is not insignificant. Also, I think the opinions of the read-write consumers of the sub-projects being included should count for a lot, and as a read-write consumer I don’t like this proposal if it includes the runtime libraries.
>
>>  Read-only consumers *are entirely
>> unaffected by the decision*, as they can continue to use the read-only
>> subproject mirrors exactly as today.
>
> The existence of subproject mirrors requires someone to write and maintain the tooling to keep those mirrors updated, and those mirrors will have all the technical hurdles and drawbacks that a submodule repository would have.
>
> The question here is: Do you make downstream single project users work off potentially unreliable mirrors, or do you make the people who need a mono-repo experience work off a potentially unreliable submodule repo?
>
> I think the only answer anyone can reasonably give to this is that we don’t have enough information to make a reasonable decision that maximizes the benefits to most users while minimizing the adverse impacts. Hence why I keep saying we need a survey to understand how *people* interact with the project and what kinds of workflows are important. I emphasize the word “people” in that last sentence because this decision impacts the contributors to the community, and downstream users. We need to take all perspectives into account when making this kind of infrastructure decision.
>
>>
>>> (2) The projects in the mono-repo must provide wide benefit to the community such that the overall community benefit outweighs the impacts of the project being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of standards. LLVM’s coding standards might be a bit much, but something along those lines.
>>
>> Would you mind explaining why you think the criteria for inclusion in
>> the monorepo should be different than the criteria for inclusion as an
>> LLVM subproject?
>
> For starters, including something as an LLVM subproject doesn’t require that it meet criterion #1 in my proposal. Simply put, subprojects don’t need to be tightly coupled to LLVM. We have many examples of that.
>
>>
>> I think these are fine criteria -- for inclusion of code as an LLVM
>> subproject.  But it seems to me -- and maybe I'm wrong -- that the
>> reason you're proposing them is that there exist today LLVM
>> subprojects that are version-locked to other projects but you think do
>> not meet these criteria, and therefore you want to exclude them from
>> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
>> your list above.
>>
>> I understand that lldb is persona non grata in some circles.  But.
>> It's not right to use the source code migration as a tool to revisit
>> an old decision like this.  That is procedurally unjust.  The relevant
>> decision should be, "is LLDB an LLVM subproject that is version-locked
>> to other subprojects, or not?"
>
> I really don’t want to debate LLDB. It is a hot issue for a lot of people, and I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.
>
> Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I can tell, abandoned, but it is still an LLVM project that is tightly coupled to LLVM versions. So it meets criterion #1. I think it fails to meet criterion #2 because DragonEgg is basically abandoned and provides no real value to the community. Even though the burden of a dead project on the mono-repo is minuscule, I think there is no good reason to include DragonEgg.
>
> Do you disagree?
>
>>
>> If you feel strongly that we should reevaluate every project on the
>> basis of these last two criteria before including them in the
>> monorepo, would you mind elaborating on what exactly are the harms of
>> including a project that isn't up to snuff?
>
> Every project that is added to the mono-repo will incur a small cost to developers in terms of the size it adds to the repository, and the tooling or workflow adjustments to handle the change. In most cases this will be minimal, even negligible. However I think the burden on runtime developers is significant.
>
>>  If you are aesthetically
>> displeased by a project, you can hide it using sparse checkouts.  And
>> nobody is going to make you build it.  At that point, the only cost I
>> can think of from including a project is the bytes on disk.  But since
>> the full history of all LLVM subprojects (excluding test-suite) is
>> 500 MB (*), surely you're not going to argue for the exclusion of (say)
>> lldb on the grounds of saving 25 MB (or whatever)?
>
> I won’t argue over lldb at all. My arguments are from the perspective of someone working on the runtime library projects: the burden of being included in the LLVM mono-repo is significant. While the full history of LLVM is around 500MB, the full history of *all* the runtime projects is less than 100MB. Developers working on libcxx or compiler-rt should not need to clone LLVM and run commands to do sparse checkouts. That is more burden than we should incur. Further, the setup cost of doing multiple sparse checkouts in order to approximate the workflows we have today with decoupled projects is, IMO, unnecessary and unreasonable.
>
> Those arguments go away if you follow criteria that exclude runtime projects from the mono-repo.
>
> -Chris
>
>>
>> -Justin
>>
>> (*) I'd called it 1.2 GB before, but Bruce Hoult set me straight.
>>
>> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>>> <llvm-dev at lists.llvm.org> wrote:
>>>
>>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> This does not apply to libc++.  We support building the entire LLVM suite
>>>> with other C++ standard library implementations (at least libstdc++, and I
>>>> think also with Visual Studio’s implementation), so there is no dependency
>>>> of anything on libc++.  Similarly, we support building libc++ with other
>>>> compilers (in FreeBSD, we currently build it with gcc 6.1 for RISC-V, for
>>>> example, where the LLVM toolchain is not quite useable).
>>>>
>>>> The same applies to libunwind, to an even greater degree (where libc++
>>>> implements a standard API, libunwind implements a standard ABI).
>>>
>>> I think the dependencies of lib* in LLVM are more conceptual than version
>>> lock, but they're still there.
>>>
>>> I agree with you in all other points, mind you, but RT needs an unwind
>>> library as much as it needs clang. Without them, RT "can" (and indeed does)
>>> work, but we're not providing a complete solution.
>>>
>>> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx) on
>>> those merits alone, but my opinion is that we should. I can't see much use
>>> in RT without them. That's why we're still defaulting to libgcc on Linux.
>>>
>>> Renato, I just want to point out that the Compiler-RT story is *WAY* more
>>> complicated than it might seem from your comments here. Compiler-RT is
>>> really two or three conceptually different things that happen to be in the
>>> same project, and parts of it are very useful without libunwind, libcxxabi,
>>> and libcxx.
>>>
>>> For example, the Compiler-RT sanitizers are used with GCC and libgcc. They
>>> can be built to be used with libstdc++ as well as libc++ (although I do
>>> think that loses some features).
>>>
>>> I would not object to a mono-repo that included LLVM, Clang, LLD, and
>>> Clang-Tools-Extra. I strongly object to any mono-repo that includes any of
>>> the runtime library projects. I also think that once you move away from the
>>> “mono-repo including all” you need to identify criteria for how you
>>> determine which projects get included, and potentially how you evaluate
>>> adding projects to the mono-repo.
>>>
>>> As a straw man I would suggest the following criteria for inclusion into the
>>> mono-repo:
>>>
>>> (1) Projects in the mono-repo must be tightly coupled to specific versions
>>> or commits of other projects in the mono-repo
>>> (2) The projects in the mono-repo must provide wide benefit to the community
>>> such that the overall community benefit outweighs the impacts of the project
>>> being in the repo
>>> (3) Projects in the mono-repo must conform to some defined set of standards.
>>> LLVM’s coding standards might be a bit much, but something along those
>>> lines.
>>>
>>> Thoughts?
>>>
>>> -Chris
>>>
>>> My tuppence.
>>>
>>> Cheers,
>>> Renato
>>>
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>