[llvm-dev] [RFC] One or many git repositories?

Thu Jul 28 11:28:13 PDT 2016

> On Jul 28, 2016, at 10:53 AM, Justin Lebar <jlebar at google.com> wrote:
> 
> Thanks again for your thoughts, Chris.
> 
>> As a straw man I would suggest the following criteria for inclusion into the mono-repo:
>> 
>> (1) Projects in the mono-repo must be tightly coupled to specific versions or commits of other projects in the mono-repo
> 
> I'm fine with that, fwiw.  That was in fact the original proposal.

That is the wording of the original proposal, but I disagree that it is the content of the original proposal. I don’t believe that Compiler-RT is tightly coupled to LLVM at all, which is a big source of my disagreement here.

> I'm also fine if we decide to put everything inside the monorepo.  I
> think Richard Smith had some good arguments for why they belong
> together.
> 
> But I am really surprised that you think this is such a big deal that
> you would object to the whole monorepo if this decision doesn't go
> your way.  

I really hate your phrasing on this. I’m not objecting to this proposal just because some minor decision doesn’t go my way. I think this point is crucial to deciding whether the monorepo solution’s benefits outweigh its costs.

> The decision of whether or not to include these projects
> affects only read-write consumers of these projects -- of which there
> are relatively few people.

Maybe there are few, but the impact on them is not insignificant. Also, I think the opinions of the read-write consumers of the sub-projects being included should count for a lot, and as a read-write consumer, I don’t like this proposal if it includes the runtime libraries.

>  Read-only consumers *are entirely
> unaffected by the decision*, as they can continue to use the read-only
> subproject mirrors exactly as today.

The existence of subproject mirrors requires someone to write and maintain the tooling to keep those mirrors updated, and those mirrors will have all the technical hurdles and drawbacks that a submodule repository would have.

The question here is: do you make downstream single-project users work off potentially unreliable mirrors, or do you make the people who need a mono-repo experience work off a potentially unreliable submodule repo?

I think the only answer anyone can reasonably give to this is that we don’t have enough information to make a decision that maximizes the benefits to most users while minimizing the adverse impacts. That is why I keep saying we need a survey to understand how *people* interact with the project and which workflows are important. I emphasize the word “people” because this decision affects both the community’s contributors and its downstream users. We need to take all perspectives into account when making this kind of infrastructure decision.

> 
>> (2) The projects in the mono-repo must provide wide benefit to the community such that the overall community benefit outweighs the impacts of the project being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of standards. LLVM’s coding standards might be a bit much, but something along those lines.
> 
> Would you mind explaining why you think the criteria for inclusion in
> the monorepo should be different than the criteria for inclusion as an
> LLVM subproject?

For starters, including something as an LLVM subproject doesn’t require that it meet criterion #1 in my proposal. Simply put, LLVM subprojects don’t need to be tightly coupled to LLVM. We have many examples of that.

> 
> I think these are fine criteria -- for inclusion of code as an LLVM
> subproject.  But it seems to me -- and maybe I'm wrong -- that the
> reason you're proposing them is that there exist today LLVM
> subprojects that are version-locked to other projects but you think do
> not meet these criteria, and therefore you want to exclude them from
> the monorepo.  Is that right?  lldb comes to mind, as it wasn't in
> your list above.
> 
> I understand that lldb is persona non grata in some circles.  But.
> It's not right to use the source code migration as a tool to revisit
> an old decision like this.  That is procedurally unjust.  The relevant
> decision should be, "is LLDB an LLVM subproject that is version-locked
> to other subprojects, or not?”

I really don’t want to debate LLDB. It is a hot issue for a lot of people, and I’d really prefer if we didn’t start a “let’s all rag on lldb” thread.

Instead, let’s talk about DragonEgg. The DragonEgg project is, as far as I can tell, abandoned, but it is still an LLVM project that is tightly coupled to LLVM versions, so it meets criterion #1. I think it fails criterion #2 because, being abandoned, it provides no real value to the community. Even though the burden of a dead project on the mono-repo is minuscule, I see no good reason to include DragonEgg.

Do you disagree?

> 
> If you feel strongly that we should reevaluate every project on the
> basis of these last two criteria before including them in the
> monorepo, would you mind elaborating on what exactly are the harms of
> including a project that isn't up to snuff?

Every project that is added to the mono-repo will impose a small cost on developers in terms of the size it adds to the repository and the tooling or workflow adjustments needed to handle the change. In most cases this cost will be minimal, even negligible. However, I think the burden on runtime developers is significant.

>  If you are aesthetically
> displeased by a project, you can hide it using sparse checkouts.  And
> nobody is going to make you build it.  At that point, the only cost I
> can think of from including a project is the bytes on disk.  But since
> the full history of all LLVM subprojects (excluding test-suite) is
> 500mb (*), surely you're not going to argue for the exclusion of (say)
> lldb on the grounds of saving 25mb (or whatever)?

I won’t argue over lldb at all. My arguments come from the perspective of someone working on the runtime library projects, and from that perspective the burden of being included in the LLVM mono-repo is significant. While the full history of LLVM is around 500MB, the full history of *all* the runtime projects is less than 100MB. Developers working on libcxx or compiler-rt should not need to clone all of LLVM and then run commands to set up sparse checkouts. That is more burden than we should incur. Further, the setup cost of doing multiple sparse checkouts to approximate the workflows we have today with decoupled projects is, IMO, unnecessary and unreasonable.

Those arguments go away if you follow criteria that exclude runtime projects from the mono-repo.

-Chris

> 
> -Justin
> 
> (*) I'd called it 1.2gb before, but Bruce Hoult set me straight.
> 
> On Thu, Jul 28, 2016 at 10:21 AM, Chris Bieneman via llvm-dev
> <llvm-dev at lists.llvm.org> wrote:
>> 
>> On Jul 28, 2016, at 12:59 AM, Renato Golin via llvm-dev
>> <llvm-dev at lists.llvm.org> wrote:
>> 
>>> On 28 Jul 2016 8:36 a.m., "David Chisnall via llvm-dev"
>>> <llvm-dev at lists.llvm.org> wrote:
>>>> This does not apply to libc++.  We support building the entire LLVM suite
>>>> with other C++ standard library implementations (at least libstdc++, and I
>>>> think also with Visual Studio’s implementation), so there is no dependency
>>>> of anything on libc++.  Similarly, we support building libc++ with other
>>>> compilers (in FreeBSD, we currently build it with gcc 6.1 for RISC-V, for
>>>> example, where the LLVM toolchain is not quite usable).
>>>>
>>>> The same applies to libunwind, to an even greater degree (where libc++
>>>> implements a standard API, libunwind implements a standard ABI).
>>>
>>> I think the dependencies of lib* in LLVM are more conceptual than
>>> version-locked, but they're still there.
>>>
>>> I agree with you on all other points, mind you, but RT needs an unwind
>>> library as much as it needs clang. Without them, RT "can" (and indeed does)
>>> work, but we're not providing a complete solution.
>>>
>>> I won't *push* to bundle libunwind, libcxxabi (and ultimately libcxx) on
>>> those merits alone, but my opinion is that we should. I can't see much use
>>> in RT without them. That's why we're still defaulting to libgcc on Linux.
>>>
>>> My tuppence.
>>>
>>> Cheers,
>>> Renato
>>
>> Renato, I just want to point out that the Compiler-RT story is *WAY* more
>> complicated than it might seem from your comments here. Compiler-RT is
>> really two or three conceptually different things that happen to be in the
>> same project, and parts of it are very useful without libunwind, libcxxabi,
>> and libcxx.
>>
>> For example, the Compiler-RT sanitizers are used with GCC and libgcc. They
>> can be built to be used with libstdc++ as well as libc++ (although I do
>> think that loses some features).
>>
>> I would not object to a mono-repo that included LLVM, Clang, LLD, and
>> Clang-Tools-Extra. I strongly object to any mono-repo that includes any of
>> the runtime library projects. I also think that once you move away from an
>> all-inclusive mono-repo, you need to identify criteria for determining
>> which projects get included, and potentially for evaluating the addition
>> of new projects to the mono-repo.
>>
>> As a straw man I would suggest the following criteria for inclusion into the
>> mono-repo:
>>
>> (1) Projects in the mono-repo must be tightly coupled to specific versions
>> or commits of other projects in the mono-repo
>> (2) The projects in the mono-repo must provide wide benefit to the community
>> such that the overall community benefit outweighs the impacts of the project
>> being in the repo
>> (3) Projects in the mono-repo must conform to some defined set of standards.
>> LLVM’s coding standards might be a bit much, but something along those
>> lines.
>>
>> Thoughts?
>>
>> -Chris
>> 
>> _______________________________________________
>> LLVM Developers mailing list
>> llvm-dev at lists.llvm.org
>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>> 