[llvm-dev] A libc in LLVM

Thu Jun 27 05:26:51 PDT 2019

[ I have worked on FreeBSD libc, so a few clarifications here: ]

On 26/06/2019 17:02, Andrew Kelley via llvm-dev wrote:
> Finally, I'm only aware of 2 operating systems where the libc is not an
> integral part of the system, which is Linux and Windows. For example on
> macOS, FreeBSD, OpenBSD, and DragonFlyBSD, the libc is guaranteed to be
> available, and must be dynamically linked, because this is the stable
> syscall ABI.

Solaris and macOS (kind-of) belong on this list, but FreeBSD does not 
and I don't believe other BSDs do, though the situation is somewhat more 
complex.  On FreeBSD, the system call ABI is stable and there are compat 
layers that allow foreign or legacy system call interfaces to be exposed 
to userspace processes (e.g. a FreeBSD 7 system call table on FreeBSD 
12, or a Linux system call table on any FreeBSD).  The Capsicum sandbox 
mode is also implemented in part by pivoting the system call layer: once 
you call cap_enter, some system calls are simply not exposed to you at 
all.
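
As a rough illustration of what a stable system call ABI permits, the 
sketch below requests a system call by number instead of going through 
the usual libc wrapper.  It assumes a Linux or FreeBSD host where 
syscall(2) and SYS_getpid are available; raw_getpid is a hypothetical 
name, and syscall() is itself a libc function, so this only shows the 
idea and is not a libc-free program.

```c
#define _DEFAULT_SOURCE          /* ask glibc to declare syscall() */
#include <sys/syscall.h>         /* SYS_getpid */
#include <sys/types.h>           /* pid_t */
#include <unistd.h>              /* syscall(), getpid() */

/* Hypothetical helper: ask the kernel for the process ID by system
 * call number rather than through the getpid() wrapper. */
pid_t raw_getpid(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

On a system with a stable system call ABI, raw_getpid() and getpid() 
should agree.  On a platform where only the library interface is stable, 
code like this is exactly what can break when the kernel changes.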

There is even CloudABI, which uses a mostly musl-derived libc and a 
Capsicum-derived system call table.  This is used for statically linked 
applications with a custom launcher that gives strong security guarantees.

That said, the relationships between FreeBSD's libc, libthr (pthreads) 
and rtld are quite complex, as are their interactions with the kernel. 
Supporting dlopen of libthr turned out to be incredibly hard in 
practice, but even without that, there is some complexity from the 
fact that libc must allow libthr to preempt a number of its symbols (and 
must provide implementations of things like pthread_mutex for programs 
that do not start threads).  In the 5.x time frame, we did support two 
different pthreads implementations.  This was, in hindsight, an 
absolutely terrible idea and not something that I'd ever recommend 
anyone do ever again.
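
One generic way for a base library to ship a single-threaded stub that a 
threading library can later replace is a weak symbol.  The sketch below 
assumes GCC or Clang (for __attribute__((weak))), and every demo_* name 
is hypothetical.  It is not a description of how FreeBSD's libc and 
libthr actually cooperate.

```c
/* Hypothetical mutex type; a real pthread_mutex_t is far richer. */
typedef struct {
    int locked;
} demo_mutex_t;

/* Weak single-threaded stubs.  With no threads there is nothing to
 * wait for, so the stubs only record the state.  At static link time
 * a strong definition of the same name elsewhere replaces these;
 * dynamic linkers instead resolve by library search order. */
__attribute__((weak)) int demo_mutex_lock(demo_mutex_t *m)
{
    m->locked = 1;
    return 0;
}

__attribute__((weak)) int demo_mutex_unlock(demo_mutex_t *m)
{
    m->locked = 0;
    return 0;
}
```

A program that never links the threading library gets the cheap stubs.  
The hard part described above is keeping the two definitions consistent 
once the threading library can arrive late, for example via dlopen.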

On macOS, libSystem is actually the public interface to the kernel, so 
you can bring along your own libc if you want to; you just have to 
dynamically link to libSystem to get access to system calls (or you do 
what Go did, try to make them without going via libSystem, and watch 
every single program written in your language die when the kernel's 
gettimeofday interface changes...).  This, however, makes it difficult, 
if not effectively impossible, to bring your own dyld replacement to 
macOS, because it must be able to load libSystem without making any 
system calls...

> So it would only make sense for an LLVM libc to be for
> Linux and Windows. It seems reasonable to assume that Google is only
> interested in Linux. In this case I have to re-iterate my original
> question, what are the needs that are not being met by existing Linux
> libcs, such as musl?

I am also unconvinced that it is possible to design a clean platform 
abstraction layer for libc that would work over even Linux and FreeBSD 
without imposing significant penalties for one or the other.  If you add 
Windows into the mix, then it gets a lot harder.  POSIX's decision to 
use int, rather than a pointer type, for file descriptors and to make 
specific guarantees about reuse order (rather than just providing dup2 
as a moderately sane interface) means that userspace code will need to 
implement the file descriptor table.  Do we build higher-level layering 
on top of file descriptors or do we support Windows HANDLEs natively for 
internal usage and use fds only for public APIs?
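
To make the file descriptor point concrete, here is a minimal sketch of 
the kind of table a libc would need on a platform whose native handles 
are pointers.  It keeps the POSIX rule that a new descriptor is the 
lowest unused number.  All names (fdtab_alloc and friends) and the fixed 
size are assumptions for illustration; a real table would also need 
locking, growth, and probably a bitmap instead of a linear scan.

```c
#include <stddef.h>

#define FDTAB_SIZE 64

/* Maps small integers to opaque native handles (for example a Windows
 * HANDLE).  A NULL entry marks a free slot. */
static void *fdtab[FDTAB_SIZE];

/* Store handle in the lowest free slot, mirroring the POSIX rule for
 * open() and dup().  Returns the descriptor, or -1 on failure. */
int fdtab_alloc(void *handle)
{
    if (handle == NULL)
        return -1;
    for (int fd = 0; fd < FDTAB_SIZE; fd++) {
        if (fdtab[fd] == NULL) {
            fdtab[fd] = handle;
            return fd;
        }
    }
    return -1;              /* table full */
}

/* Translate a descriptor back to its handle, or NULL if unused. */
void *fdtab_lookup(int fd)
{
    if (fd < 0 || fd >= FDTAB_SIZE)
        return NULL;
    return fdtab[fd];
}

/* Free a slot so that the next allocation can reuse the number. */
int fdtab_release(int fd)
{
    if (fd < 0 || fd >= FDTAB_SIZE || fdtab[fd] == NULL)
        return -1;
    fdtab[fd] = NULL;
    return 0;
}
```

Allocating three handles, releasing descriptor 1, and allocating again 
hands back 1, not 3.  That reuse order is the guarantee that forces 
the table to live in userspace when the kernel only deals in handles.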

The idea of an LLVM libc has been proposed a few times and generally the 
pushback has been that it doesn't make sense because libc is so 
intimately tied to the host kernel that it's very hard to consider it as 
a portable component.

David