[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Tue Jun 19 20:12:02 PDT 2012

Hello folks (and sorry if I've forgotten to CC anyone with particular
interest in this discussion...):

I've been thinking a lot about how best to build advanced runtime libraries
like ASan, and scale them up. Note that this does *not* try to address any
licensing issues. For now, I'll consider those orthogonal / solvable w/o
technical contortions. =]

My primary motivation: we really, *really* need runtime libraries to be
able to use common, shared libraries.

This starts with libraries such as the C++ standard library -- a runtime
shouldn't need to re-implement std::vector. It includes other primitive
libraries that have had significant effort put into them in LLVM such as
the ADT and Support libraries. But, IMO, it has even more importance as we
start looking at libraries such as ELF readers, DWARF readers, symbolizers,
etc. This code should be shared, and shared easily, with other LLVM projects.

However, clearly the runtime must at some point be linked against a
program, and indeed programs which may be using *the same set of
libraries*. It is crucially important that the runtime uses a separate
implementation of the libraries from the ones used by the program itself:
we will often compile the program's libraries with instrumentation and
other features which we explicitly wish to avoid in the runtime. Even
simple name clashes can cause problems, leading to the current practice of
putting all of these runtime libraries into a '__sanitizer' or other
specially spelled namespace.

A final unusual requirement is that at least *some* of the code for the
runtime libraries must be statically linked to have reasonable efficiency.
We also have several use cases where it would be very convenient to link
*all* of the runtime statically, so I prefer a solution that preserves this
option.

So how can we effectively share code? Here is my proposal, and a few
alternate strategies.

I suggest that we build the runtime library as-if it were not a runtime
library at all, and just a normal library. No strange namespaces, no
restrictions on what other libraries it uses with one exception: they must
*all* be statically linkable. We build this as a normal archive library,
nothing special. One nice property is that testing the runtime library
becomes the same as testing any other library.

Then, we have a special build step to produce a final archive which is
actually *used* as the runtime library. This step works much like
the step to link an executable: we build the list of archive libraries
depended on, but instead of linking an executable, we run a linker script
over them. This script will re-link each '.o' file from the transitive
closure of archives, prepending a '__asan__' (or other runtime library
prefix) onto each symbol; effectively mangling each symbol. All of these
processed '.o' files would go into a single, final archive that would be
the installed runtime library. The only functions not processed in this
manner are a white list of "exported" functions from the runtime (C-library
routines provided by the runtime, and runtime entry points, etc.).
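
To make the renaming step concrete, here is a rough sketch (not a tested
build rule) of how the symbol map could be computed. Everything in it is an
assumption for illustration: the function names are hypothetical, the
symbols are made up, and the only thing taken from the proposal is the
'__asan__' prefix and the idea of a white list. The output format is the
"old new" pair-per-line file that, as far as I know, GNU objcopy accepts via
--redefine-syms; the proposal itself does not commit to any particular tool.

```python
RUNTIME_PREFIX = "__asan__"  # prefix named in the proposal


def build_rename_map(symbols, exported, prefix=RUNTIME_PREFIX):
    """Map every symbol that is NOT on the white list to its prefixed name.

    `symbols` is the list of defined symbols collected from the transitive
    closure of archives; `exported` is the white list of runtime entry
    points and C-library routines that must keep their original names.
    """
    exported = set(exported)
    return {name: prefix + name for name in symbols if name not in exported}


def format_redefine_syms(rename_map):
    """Render the map as one 'old new' pair per line, sorted by old name."""
    return "".join(
        f"{old} {new}\n" for old, new in sorted(rename_map.items())
    )


# Hypothetical input: two internal symbols and two exported ones.
example_symbols = ["_ZN3foo3barEv", "helper", "__asan_init", "malloc"]
example_exported = ["__asan_init", "malloc"]
example_map = build_rename_map(example_symbols, example_exported)
print(format_redefine_syms(example_map), end="")
```

Note that the C++ symbol in the example no longer starts with '_Z' once it
is prefixed, so a demangler would not recognize it as a mangled name; that
is the "bizarre name mangling" cost listed among the disadvantages.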

The result should be a runtime library that is essentially hermetic, and
should have no clashes with binaries it links against. It would be free to
use standard libraries, LLVM libraries, whatever it needs. That said, there
are some clear disadvantages:
- Bizarre name mangling, especially for C++
- Potentially incompatible with C++ EH, libunwind, or other tools (I just
don't know, haven't done enough research here)
- Requires "relinking" the final runtime
- Definitely implementable on Linux & ELF-based BSDs, I *think* do-able on
Darwin, but I have no idea about Windows.
- Other downsides? I'm probably missing some big problems here... ;]

However, if we can make this (possibly with tweaks/modifications) work, I
think the upside is quite large -- the runtime library stops having to be
written in such a strange special sub-set of the language, etc.

Note that this proposal is orthogonal to the issue of minimizing the binary
size and cost of the runtime library -- that is clearly still an important
concern, but that can be addressed with or without using other
libraries. LLVM has lots of libraries specifically engineered to be
lightweight in situations like this.

Other alternatives that have been discussed:

- Require isolating all shared code into a shared library (.so) that is
loaded as-needed. This helps some, but it doesn't seem to fully solve the
issues (where does the shared code go? the .so? What happens when it is
loaded into a program that already has copies of the same code? What
happens when one is instrumented and the other isn't?). It also requires us
to ship the '.so' with the binary to get full functionality, something that
would be at least somewhat undesirable. It also requires the runtime
library developers to carefully partition the code into that which can go
in the .a and that which can go in the .so.

- The current strategy of re-implementing everything needed from
(essentially) the ground up inside the runtime library. I think that this
has serious long-term maintenance problems.... but who knows, maybe?

- Other ideas?