[llvm-dev] A libc in LLVM

Rich Felker via llvm-dev llvm-dev at lists.llvm.org
Fri Jul 12 12:29:07 PDT 2019


On Fri, Jul 12, 2019 at 02:54:40PM -0400, Siva Chandra wrote:
> On Fri, Jun 28, 2019 at 9:29 AM JF Bastien <jfbastien at apple.com>
> wrote:
> >
> > I think I now understand some of the disconnect you and I are
> > having, and I think some of the pushback you’re getting from the
> > community is the same. You’re talking about where you want to start
> > with an LLVM libc. Many in the community (myself included) want to
> > understand where we’ll get with this libc. At steady-state, what
> > does it do? To a certain degree I don’t care about how you get to
> > the steady state: sure the implementation approach is important, and
> > which contributor cares about what parts is important in shaping
> > that evolution, but at the end of the day what matters is where you
> > get.
> >
> > So here’s what’s missing: there’s no goal. Right now, your proposal
> > is “let’s do an LLVM libc, starting with what I care about, who’s
> > interested?”
> >
> > That’s an OK place to start! You illustrated your needs, others
> > chimed in with theirs, and now you know there’s some interest.
> > However, you should take time now to come up with a plan. What’s
> > this libc actually going to be? I ask a bunch of questions below
> > that I think you need to answer as a next step. Others asked more
> > questions which I didn’t echo, but which you should answer as well.
> > What does this libc aspire to become?
> 
> I apologize for the delay. I will try to address the above questions
> in this email. I will shortly follow up with answers to other
> questions.
> 
> Below is a write up which I think would qualify as the "charter" for
> the new libc. It is also answering questions like, "where we’ll get
> with this libc?", "what's this libc actually going to be?" and similar
> ones. I have used libcxx.llvm.org landing page as a template to write
> it down.
> 
> ###############################################
> 
> "llvm-libc" C Standard Library
> ========================
> 
> llvm-libc is an implementation of the C standard library targeting C11
> and above. It also provides platform specific extensions as relevant.
> For example, on Linux it also provides pthreads, librt and other POSIX
> extension libraries.
> 
> Documentation
> ============
> 
> The llvm-libc project is still in the planning phase. Stay tuned for
> updates soon.
> 
> Features and Goals
> ================
> 
> * C11 and upwards conformant.
> * A modular libc with individual pieces implemented in the "as a
> library" philosophy of the LLVM project.

> * Ability to layer this libc over the system libc.

I don't think this is really possible, without tooling designed
specifically to do it (remapping symbols, etc.), which clang/LLVM,
being the compiler/tooling, *could* do. But even with the right
tooling, you're going to find that it's *a lot* harder than you
expect, likely almost impossible without making assumptions about the
internals of the underlying libc that are not public contracts.

> * Provide C symbols as specified by the standards, but take advantage
> and use C++ language facilities for the core implementation.

This was done in Fuchsia's fork of musl too, and was one of my major
criticisms of it -- makes no sense except satisfying developers who
want to use C++ for the sake of it being C++. It's very hard to do
"freestanding" C++ that doesn't even rely on the underlying libc, and
if you do rely on libc, it's a circular dependency. Moreover there's
really *very little* in libc that benefits from C++ (much less a pure
freestanding C++ with no library) for implementing it. And there's
huge potential to get things wrong by using C++ in ways that have
hidden failure cases/exceptions in places where the C interface you're
implementing either cannot be allowed to fail, or where introducing
the possibility of failure would be a huge QoI flaw.

> * Provides POSIX extensions on POSIX compliant platforms.
> * Provides system-specific extensions as appropriate. For example,
> provides the Linux API on Linux.
> * Vendor extensions if and only if necessary.
> * Designed and developed from the start to work with LLVM tooling and
> testing like fuzz testing and sanitizer-supported testing.
> * ABI independent implementation as far as possible.
> * Use source based implementations as far as possible rather than
> assembly. Will try to “fix” the compiler rather than use assembly
> language workarounds.
> 
> Why a new C Standard Library?
> =========================
> 
> Implementing a libc is no small task and is not to be taken lightly. A

Indeed.

> natural question to ask is, "why a new implementation of the C
> standard library?" There is no single answer to this question, but
> some of the major reasons are as follows:
> 
> * Most libc implementations are monolithic. It is a non-trivial
> porting task to pick and choose only the pieces relevant to one's
> platform. The new libc will be developed with sufficient modularity to
> make picking and choosing a straightforward task.

Have you given any thought to what it would mean to make this kind of
porting practical? The reason we haven't done it in musl is because
it's highly nontrivial. You have to find an existing point
amenable to abstraction that's reasonably common to all existing
systems and hope it will apply to future ones too -- for musl, this
means the concept of syscalls, which are presently assumed to be Linux
ones but could be abstracted *somewhat* further, and might be in the
future.

If you can't find a suitable point amenable to abstraction that
encompasses everything you want to support, then instead you end up
making your own abstraction layer in between, and now you're stuck
with the task of porting your abstraction layer to every new system
you want to support. Now you have an extra layer of bloat, and haven't
saved any significant amount of porting work.

All of this aside, I agree that it would be rather nice to be
"non-monolithic", especially for the parts of libc that are "pure
library code" (not depending on any underlying system facilities) to
be kept separate and easy to reuse in ports to weird/bare-metal/etc.
systems.

It'd also be nice for things like stdio that do depend on a system
facility, but where the underlying system facility is understood to be
"common" at a higher level than syscalls (actual functions on fds) to
be able to work with arbitrary implementations of the underlying
functions. The reason we didn't do this from the beginning in musl is
namespacing; plain C symbols can't depend on symbols in the POSIX
namespace.
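
As a rough illustration of the idea (a sketch only, not how musl or any
other libc is actually structured), a buffered output stream can take
its backend as a function pointer, so the "pure library" part never
references a POSIX-namespace symbol such as write() at link time. All
names below (sketch_stream, sketch_puts, sketch_flush, sketch_membuf,
sketch_mem_write) are hypothetical and exist only for this example:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend signature, modeled loosely on write(): returns the
   number of bytes consumed, or <= 0 on failure. The caller supplies it, so
   this file names no POSIX symbol itself. */
typedef long (*sketch_write_fn)(void *cookie, const void *buf, size_t len);

struct sketch_stream {
    sketch_write_fn write;   /* system-specific part, injected by caller */
    void *cookie;            /* opaque backend state (an fd, a buffer, ...) */
    size_t used;             /* bytes currently buffered */
    unsigned char buf[64];
};

/* Push buffered bytes to the backend. Returns 0 on success, -1 on error;
   on error the unwritten tail stays in the buffer. */
int sketch_flush(struct sketch_stream *s)
{
    size_t off = 0;
    while (off < s->used) {
        long n = s->write(s->cookie, s->buf + off, s->used - off);
        if (n <= 0) {
            memmove(s->buf, s->buf + off, s->used - off);
            s->used -= off;
            return -1;
        }
        off += (size_t)n;
    }
    s->used = 0;
    return 0;
}

/* Append a string to the stream, flushing whenever the buffer fills. */
int sketch_puts(struct sketch_stream *s, const char *str)
{
    for (; *str; str++) {
        if (s->used == sizeof s->buf && sketch_flush(s) < 0)
            return -1;
        s->buf[s->used++] = (unsigned char)*str;
    }
    return 0;
}

/* One possible backend: append to a fixed in-memory buffer. A hosted port
   would instead pass a thin wrapper around its own write primitive. */
struct sketch_membuf {
    char data[256];
    size_t len;
};

long sketch_mem_write(void *cookie, const void *buf, size_t len)
{
    struct sketch_membuf *m = cookie;
    size_t room = sizeof m->data - m->len;
    if (len > room)
        len = room;          /* short write; 0 means "full", i.e. failure */
    memcpy(m->data + m->len, buf, len);
    m->len += len;
    return (long)len;
}
```

The cost of this shape is an indirect call per flush and one more
interface that each port has to fill in, which is the same tradeoff as
the abstraction-layer concern above, just at a smaller scale.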

> * Most libc implementations break when built with sanitizer specific
> compiler options. The new libc will be developed from the start to
> work with those specialized compiler options.

This is a nice goal but invites all sorts of circular dependency
problems. At some point this will likely be possible with musl too,
with the exception of certain components that need to operate at early
entry time.

> * The new libc will be developed to support and employ fuzz testing
> from the start.

> * Most libc implementations use a good amount of assembly language,
> and assume specific ABIs (may be platform dependent). With the new
> libc implementation, we want to use normal source code as much as
> possible so that compiler-based changes to the ABI are easy. Moreover,
> as part of the LLVM project, we want to use this opportunity to fix
> performance related compiler bugs rather than using assembly
> workarounds.

This is particularly wrong about musl, where use of asm (especially
extern asm files as opposed to inline asm) is mostly limited to places
where something fundamentally can't be implemented without asm. We
don't use asm as a workaround for poor compiler codegen, unless you
count things like single-instruction math functions, where it would be
really hard for a compiler to pattern-recognize the whole function and
reduce it down to the instruction. (Note also that use of __builtin_*
doesn't help here because it can create circular definitions if the
compiler chooses not to inline the single instruction.)
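
To make the single-instruction case concrete, here is a minimal sketch,
assuming x86-64 and a compiler with GNU-style inline asm. The name
sketch_sqrt is hypothetical, and the non-x86 branch is only a stand-in
so the example builds elsewhere; it assumes a finite, non-negative
argument and is not meant as a real sqrt:

```c
/* The circular-definition hazard: if libc defined
 *
 *     double sqrt(double x) { return __builtin_sqrt(x); }
 *
 * and the compiler chose not to expand the builtin inline, the body
 * would become a call to sqrt, i.e. to the function being defined. */

#if defined(__x86_64__) && defined(__GNUC__)

/* The whole function is one SSE2 instruction; the asm guarantees that the
   instruction is emitted and that no library call is generated. */
double sketch_sqrt(double x)
{
    __asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));
    return x;
}

#else

/* Stand-in for other targets: Newton's iteration started from above the
   root, so the estimate decreases until it stops changing. */
double sketch_sqrt(double x)
{
    double r, prev;
    if (x == 0.0)
        return x;
    r = x > 1.0 ? x : 1.0;
    do {
        prev = r;
        r = 0.5 * (r + x / r);
    } while (r < prev);
    return r;
}

#endif
```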

> * A large hole in the llvm toolchain will be plugged with this new
> libc.

I read this as a confirmation of my concerns from my previous post and
tweets, that this looks like you're trying to make "LLVM libc" (or
rather "Google libc") the first-class libc for use with clang/LLVM,
radically altering the boundaries between tooling and platform, and
relegating the existing libc implementations on LLVM's targets to
second-class.

If this is not the case, can you explain what guarantees we have that
this is not what's going on?

> With the broad platform expertise in the LLVM community, and the
> strong license and project structure, we think that the new libc will
> be more tunable and robust, without sacrificing the simplicity and
> accessibility typical of the LLVM project.

Tunable and robust are usually opposites; see also: uclibc.

In summary, I think you're still massively underestimating what an
undertaking this is, mistaken about various choices/tradeoffs and
whether they make sense, and either not thinking about consequences on
ecosystem/monoculture of tightly coupling library with tooling, or
intentionally trying to bring about those consequences, contrary to
what I see as the best interests of the communities affected.

Rich

