[llvm-dev] Add more projects into Git monorepo

Tue May 9 08:17:58 PDT 2017

On 9 May 2017, at 15:58, Mehdi AMINI <joker.eph at gmail.com> wrote:

> I'd expect any CI system to be able to cache this.
> Also, if your issue is archiving a lot of build artifacts, the constant cost of the checkout isn't gonna matter that much.
> Finally, the read-only individual repo can still be used by CI, which addresses this entirely.

If we want to pull in new libunwind fixes from upstream, we’ll also pull in irrelevant changes to LLVM, clang, lldb, lld, and so on.  This translates to extra bandwidth and storage requirements for *every* copy of the libunwind repo that we need.

If we follow the monorepo approach downstream and merge these independent repos, then we add extra merges for everyone downstream: every improvement committed to our LLVM and clang trees forces a rebase pull on anyone working on libc++ or libunwind, even though the change is to a component that they don’t need to build, let alone modify.

> There is another philosophical perspective: encouraging communities to get closer together. You're talking about "libunwind developers", and there are "lldb developers" as well; I'd rather get closer to: "we're working on the same project", with shared practices and goals. And ultimately, to come back to your software engineering practices, encouraging code motion and code reuse between sub-projects.

I disagree, as someone who wears hats as a libunwind, libc++, clang and LLVM developer: I am no more engaged with the different groups by having the repos combined, but I am inconvenienced by having to carry around clones of unrelated code when I am working on one component and by having to rebase my libunwind repo because someone committed to clang.

Combining the clang and LLVM repos is a necessary evil.  If we could have clean layering and well-defined interfaces for the LLVM APIs that clang needs, then I would be opposed to this as well.  Unfortunately, that has too high an engineering cost, so we need to be able to perform atomic commits of LLVM and LLVM-using projects (which, unfortunately, means that we often don’t see the cost that this imposes on developers of other front ends).  In contrast, if we need to perform an atomic commit between libc++ and clang, or between libunwind and clang, then this tells us that we have a bug: a new version of clang may introduce a feature that relies on a new libc++ or libunwind, but a new libunwind or libc++ should always work with an old clang (or an old gcc, or any other compiler that targets it).

>> All of this applies to libc++ and libc++abi as well.
> 
> Ultimately I don't know about libunwind, and if it has to live separately it is not a big deal. The others (libc++ and libc++abi for instance) are more tied to the rest of the project though.
> We duplicate the demangler from libc++abi in llvm for instance, and this is quite an important software engineering issue to me.

The requirements for a libc++abi demangler and a generic LLVM one are very different.  For libc++abi, the requirements are:

 - Must be small (the binary size of libc++abi is very important)

 - Must be tolerant of out-of-memory conditions (it is used for generating error messages when an out-of-memory exception is thrown)

 - Must use malloc() / realloc() for providing the demangled string (a requirement of the Itanium ABI public APIs)
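
To make the malloc() / realloc() point concrete, here is a minimal sketch of a caller using the Itanium ABI entry point, abi::__cxa_demangle from <cxxabi.h>.  The helper name demangle_or_original is hypothetical and only for illustration.  Note that the helper copies into a std::string, which can itself throw on allocation failure, so it does not meet the out-of-memory requirement above; it only shows the ownership contract of the public API.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <string>

// Hypothetical helper, not part of libc++abi or LLVM: returns the demangled
// form of `mangled`, or the input unchanged if demangling fails.
std::string demangle_or_original(const char *mangled) {
  int status = 0;
  // Passing null for the output buffer and length asks the library to
  // allocate the result itself with malloc().
  char *demangled = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
  if (status != 0 || demangled == nullptr)
    return mangled;
  std::string result(demangled);
  // The caller owns the buffer and must release it with free(), which is
  // why the demangler cannot use an arbitrary allocator here.
  std::free(demangled);
  return result;
}
```

Calling demangle_or_original("_Z3fooi") gives "foo(int)".  All that comes back is a flat string, which is the other half of the problem for a consumer such as lldb.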

In contrast, the demangler for the rest of LLVM:

 - Must be flexible (e.g. lldb wants to be able to get the base name of a demangled function, so that it can insert breakpoints on all overloads)

 - Must be fast (e.g. lldb wants to demangle every symbol in a library in a UI-critical path)

 - Must provide structured information about the demangled symbol, not just a string as output.

 - Must integrate with other memory allocation mechanisms (e.g. support std::allocator)

Copying the demangler was a quick way of getting something to work portably, but it wasn’t a good solution given the different requirements (the libc++abi demangler doesn’t do a good job of meeting either set of requirements), so this is a very bad justification for merging the repos.

David