[llvm-dev] A libc in LLVM

Thu Jun 27 11:53:01 PDT 2019

On Mon, Jun 24, 2019 at 3:23 PM Siva Chandra via llvm-dev <
llvm-dev at lists.llvm.org> wrote:

> Hello LLVM Developers,
>
> Within Google, we have a growing range of needs that existing libc
> implementations don't quite address. This is pushing us to start working on
> a new libc implementation.
>
> Informal conversations with others within the LLVM community have told us
> that a libc in LLVM is actually a broader need, and we are increasingly
> consolidating our toolchains around LLVM. Hence, we wanted to see if the
> LLVM project would be interested in us developing this upstream as part of
> the project.
>
> To be very clear: we don't expect our needs to exactly match everyone
> else's -- part of our impetus is to simplify things wherever we can, and
> that may not quite match what others want in a libc. That said, we do
> believe that the effort will still be directly beneficial and usable for
> the broader LLVM community, and may serve as a starting point for others in
> the community to flesh out an increasingly complete set of libc
> functionality.
>
> We are still in the early stages, but we do have some high-level goals and
> guiding principles of the initial scope we are interested in pursuing:
>
>
>    1. The project should mesh with the "as a library" philosophy of the
>       LLVM project: even though "the C Standard Library" is nominally "a
>       library," most implementations are, in practice, quite monolithic.
>    2. The libc should support static non-PIE and static-PIE linking. This
>       means, providing the CRT (the C runtime) and a PIE loader for static
>       non-PIE and static-PIE linked executables.
>    3. If there is a specification, we should follow it. The scope that we
>       need includes most of the C Standard Library; POSIX additions; and
>       some necessary, system-specific extensions. This does not mean we
>       should (or can) follow the entire specification -- there will be some
>       parts which simply aren't worth implementing, and some parts which
>       cannot be safely used in modern coding practice.
>
>
I don’t think that POSIX additions should be part of the core library.  Not
all interesting targets are POSIX, e.g. Windows.  I think that POSIX should
be a separate, standalone library piece, just as you mention downthread that
dynamic loading should be.  I think that the only pieces that should be
available in the core are those in the C11 core specification.

What parts of the C standard do you consider as not being worth
implementing?

If you are looking to implement “extensions” which replace the routines
that modern coding practice considers unsafe, does that mean that the
surface really should be the MSVCRT implementation then?  It does deprecate
the “unsafe” routines in favour of safe versions (suffixed with `_s`).
Additionally, you could always just implement the bounds-checking annex of
the C standard (C11 Annex K) and use those interfaces instead.
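
As a rough sketch of what such a bounds-checked routine looks like (loosely
modelled on the `strcpy_s` contract; the name `sketch_strcpy_s` is
hypothetical, and unlike the real Annex K interface it reports failure only
through its return value, not through a runtime-constraint handler):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a real libc symbol.
 * Copies src into dst only if the whole string, including its terminator,
 * fits in dstsz bytes. Returns 0 on success and nonzero on failure; on
 * failure dst is set to the empty string whenever it is writable. */
static int sketch_strcpy_s(char *dst, size_t dstsz, const char *src)
{
    if (dst == NULL || dstsz == 0)
        return 1;                      /* nothing safe to write to */
    if (src == NULL) {
        dst[0] = '\0';
        return 1;
    }

    /* Measure src, but never look further than dstsz bytes. */
    size_t len = 0;
    while (len < dstsz && src[len] != '\0')
        len++;

    if (len == dstsz) {                /* no room for the terminator */
        dst[0] = '\0';
        return 1;
    }

    memcpy(dst, src, len + 1);         /* len characters plus the '\0' */
    assert(dst[len] == '\0');
    return 0;
}
```

The caller has to check the result, which is the behavioural difference from
plain `strcpy`: an oversized input is rejected instead of overrunning the
buffer.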

>
>    4. Vendor extensions must be considered very carefully, and only
>       admitted when necessary. Similar to Clang and libc++, it does seem
>       inevitable that we will need to provide some level of compatibility
>       with other vendors' extensions.
>
>
How would this work for reasonable bodies of code which are built on
Linux?  e.g. Chrome does have Linux specific paths and I would be surprised
if Chrome does not depend on any GNU behaviours.

>
>    5. The project should be an exemplar of developing with LLVM tooling.
>       Two examples are fuzz testing from the start, and sanitizer-supported
>       testing.
>
>
> There are also a few areas which we do not intend to invest in at this point:
>
>
>    1. Implement dynamic loading and linking support.
>
>
If this is done as a “library” layer, then so should POSIX and the C99/C11
annexes be.

>
>    2. Support for more architectures (we'll start with just x86-64 for
>       simplicity).
>
>
I think that AArch64 is pretty core these days and leaving that out is
pretty restrictive.  At this point Windows AArch64 is an interesting
target.  With Linux AArch64 and Windows AArch64 becoming more mainstream,
it seems like a poor design tradeoff to limit the target to Linux x86_64.

>
> For these areas, the community is of course free to contribute. Our hope
> is that preserving the "as a library" design philosophy will make such
> extensions easy, and allow retaining the simplicity when these features
> aren't needed.
>
> We intend to build the new libc in a gradual manner. To begin with,  the
> new libc will be a layer sitting between the application and the system
> libc. Eventually, when the implementation is sufficiently complete, it will
> be able to replace the system libc at least for some use cases and contexts.
>

This is really tricky and finicky to implement (I have done something like
this in the past).  On ELF you can interpose symbols, but on PE/COFF, with
its two-level namespace binding, this needs to be statically resolved.
Would the approach mean that symbols are interposed at compile time to
ensure that they are fully redirected?  How will you manage cross-domain
memory once a malloc implementation is included in the library?  What
happens with threading?
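
To make the compile-time question concrete, one way a layer could redirect
calls statically is a per-function mapping resolved by the preprocessor.
This is only a sketch under my own assumptions: `LAYER`, `layer_strlen` and
the call counter are hypothetical names, not anything from the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Counts calls so that it is observable which implementation ran. */
static unsigned long layer_strlen_calls;

/* Hypothetical layer-provided replacement for strlen. */
static size_t layer_strlen(const char *s)
{
    assert(s != NULL);
    layer_strlen_calls++;
    const char *p = s;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}

/* Static routing table: functions the layer implements map to the layer,
 * everything else falls through to the system libc. */
#define LAYER_strlen layer_strlen   /* provided by the layer */
#define LAYER_strchr strchr         /* not provided yet: system libc */

/* LAYER(fn) pastes the name, so the choice is made at compile time and no
 * dynamic symbol interposition is involved. */
#define LAYER(fn) LAYER_##fn
```

With this, `LAYER(strlen)("libc")` calls `layer_strlen`, while
`LAYER(strchr)("libc", 'b')` still reaches the system libc.  Note that this
only redirects names: it does nothing for the allocator or threading
questions above.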

The general libc implementation would require that full threading is under
its control - consider cases like the IE (initial-exec) model for TLS.  This
requires the loader to be aware of the modules and the full spacing.
Another example where this starts to break down is with faulty - it was just
a library layer that implemented compressed memory-mapped library loading
because a previous libc implementation - bionic - suffered from extensive
issues, including the inability to load more than a handful of modules.
This is far from the only limitation of the bionic libc implementation, but
this doesn’t seem like the appropriate forum for discussing the previous
libc implementation attempts.

One other point of interest is how the loader integration would work.  With
glibc, the loader effectively embeds a copy of libc for itself, and has to
dig through the kernel handoff (the auxiliary vector, auxv) to get the
loader location.  What happens with multiple object file formats?  PE/COFF
does not load the same way as ELF, and that difference may ripple through
the rest of the library.  The libc integration is needed for the resolution
of symbols as well as for TLS.
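
For reference, on Linux the kernel handoff can be inspected from C with
`getauxval`, which glibc and musl provide in `<sys/auxv.h>`.  A minimal
sketch, assuming a Linux host; the `sketch_*` helper names are hypothetical,
and on other platforms the helpers simply return 0:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval, AT_BASE, AT_PAGESZ */
#endif

/* Hypothetical helper: base address at which the kernel mapped the program
 * interpreter (the dynamic loader), or 0 if there is none or it is unknown. */
static uintptr_t sketch_loader_base(void)
{
#if defined(__linux__)
    return (uintptr_t)getauxval(AT_BASE);
#else
    return 0;
#endif
}

/* Hypothetical helper: system page size as reported in the auxiliary
 * vector, or 0 if unavailable. */
static unsigned long sketch_page_size(void)
{
#if defined(__linux__)
    unsigned long ps = getauxval(AT_PAGESZ);
#else
    unsigned long ps = 0;
#endif
    assert(ps == 0 || (ps & (ps - 1)) == 0);   /* page sizes are powers of two */
    return ps;
}
```

A statically linked executable has no interpreter, so `sketch_loader_base()`
reports 0 there.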

> So, what do you think about incorporating this new libc under the LLVM
> project?
>

As stated, I really feel that this is far too specialised to certain use
cases that are pertinent to Google.  I think that this needs to be
broadened to allow a general-purpose libc, much as libc++ is a general C++
standard library implementation.  I think that the project has a different
set of requirements, and it would be extremely interesting to see how it
develops over time.  This could really be an interesting choice for a
certain type of project, but as described it feels like it is best explored
outside of the umbrella of LLVM.

>
> Thank you,
>
> Siva Chandra and the rest of the Google LLVM contributors
>
> _______________________________________________
> LLVM Developers mailing list
> llvm-dev at lists.llvm.org
> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>

-- 
Saleem Abdulrasool
compnerd (at) compnerd (dot) org