[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Tue Jun 19 23:05:25 PDT 2012

On Tue, Jun 19, 2012 at 10:59 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Tue, Jun 19, 2012 at 10:46 PM, Kostya Serebryany <kcc at google.com> wrote:
>
>>
>>
>> On Wed, Jun 20, 2012 at 9:39 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>
>>> On Tue, Jun 19, 2012 at 9:07 PM, Kostya Serebryany <kcc at google.com> wrote:
>>>
>>>> +dvyukov
>>>>
>>>> On Wed, Jun 20, 2012 at 7:12 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>
>>>>> Hello folks (and sorry if I've forgotten to CC anyone with particular
>>>>> interest in this discussion...):
>>>>>
>>>>> I've been thinking a lot about how best to build advanced runtime
>>>>> libraries like ASan, and scale them up. Note that this does *not* try to
>>>>> address any licensing issues. For now, I'll consider those orthogonal /
>>>>> solvable w/o technical contortions. =]
>>>>>
>>>>> My primary motivation: we really, *really* need runtime libraries to
>>>>> be able to use common, shared libraries.
>>>>>
>>>>
>>>> I am not sure you understand the problem as we do.
>>>>
>>>> In short, asan/tsan/msan/etc cannot use any function which is also
>>>> called from the instrumented binary.
>>>>
>>>
>>> Well, I can't be sure, but this description certainly agrees with my
>>> understanding -- you need *every* part of the runtime to be completely
>>> separate from *every* part of the instrumented binary. I'm with you there.
>>>
>>> In particular, I think the current strategy for libc & system calls
>>> makes perfect sense, and I'm not trying to suggest changing it.
>>>
>>> I think the most similar situation is this one:
>>>
>>>> In the previous version of ThreadSanitizer we used a private copy of
>>>> STLport in a separate namespace and a custom libc (small subset).
>>>>
>>>
>>> My proposal is very similar except without the need to modify the C++
>>> standard library in use. Instead, I'm suggesting post-processing the
>>> library to ensure that the standard C++ library code in the runtime is kept
>>> completely distinct from that in the instrumented binary -- everything would
>>> in fact be *mangled* differently.
>>>
>>> The goal would be to avoid the maintenance overhead of a custom C++
>>> standard library, and instead use a normal one. My understanding is that
>>> both GCC's libstdc++ and LLVM's libc++ are significantly higher quality
>>> than STLport, and if we're doing static linking, the code bloat should be
>>> greatly reduced. We could reduce it still further by doing LTO of the
>>> runtime library, which should be very straightforward given the rest of my
>>> proposal.
>>>
>>> It would still require a very small subset of libc, likely not much more
>>> than you already have.
>>>
>>>> This worked, but had problems too (Dmitry was very angry at STLport for
>>>> code bloat, stack size increase and some direct libc calls).
>>>>
>>>
>>> I would be interested to know if the above addresses most of the
>>> problems or not.
>>>
>>>
>>>>  Until recently this was not causing too much pain in asan/tsan, but
>>>> our attempts to use the LLVM DWARF readers made it worse.
>>>> When tsan finds a race, we need to symbolize it online to be able to
>>>> match against a suppression and decide whether we want to emit the warning.
>>>> Today we do it in a separate addr2line process (ugly and slow).
>>>> But if we start calling the LLVM dwarf reader we end up with all
>>>> possible dependency problems (Dmitry and Alexey will know the exact ones)
>>>> because the LLVM code calls to malloc, memcpy, etc.
>>>>
>>>> Frankly, I don't have any solution other than to change the code such
>>>> that it does not call libc/libc++.
>>>> Some of that may be solved by a private copy of STLport + a bit of
>>>> custom libc (but see above about STLport)
>>>>
>>>
>>> I think my proposal is essentially in between these two:
>>>
>>> - Avoid the need for a low quality STL by using a normal C++ standard
>>> library implementation, and avoid maintenance burden by doing a link-time
>>> mangling of the symbols.
>>>
>>
>> re-linking might be too platform specific.
>> How about compiling the library into LLVM bitcode and adding
>> namespaces/prefixes to that bitcode?
>>
>
> Re-linking is a bit platform specific...
>
> It would definitely work on ELF platforms, and likely on Darwin, but
> Windows is tricky.
>
> On Windows we would at least need a custom tool, but such a tool would be
> quite easy to write I suspect. We could even use the very LLVM libraries in
> question to write it! ;] Amusingly, I think with the LLVM libraries we
> could very easily write a custom tool just to mangle the symbol names in a
> collection of object files and have it work on *most* platforms!
>
> Still, the bitcode idea is interesting. Doing this entirely in bitcode has
> some advantages as these types of runtimes are among the best uses for
> things like LTO: they're small, performance sensitive, can enumerate the
> entry points easily, and are likely to have a particular need for dead code
> elimination.
>

One reason to want to have some support for doing this w/o bitcode: we may
not have the bitcode. Specifically, the goal would be to use the "normal"
C++ standard library, provided it is available to link statically
(libstdc++ and libc++ certainly are; I don't know about MSVC). That would
be much easier if we can actually use the existing archive file, and just
"fix" the .o files inside it.

It seems likely to be the equivalent of an 'ld -r' run with a linker script
to munge the symbol names, or potentially a custom tool written with the
LLVM object file libraries.
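
To make the symbol-munging step concrete, here is a minimal sketch, for
illustration only. It does not use 'ld -r' with a linker script or a custom
LLVM-based tool as suggested above; it swaps in a different, simpler
technique: generating a rename map in the "old new" per-line format that GNU
objcopy's --redefine-syms option reads. The function name, the "__rt_"
prefix, and the list of entry-point prefixes to keep are all hypothetical
choices, not anything from the actual runtimes.

```python
from typing import Iterable, Tuple


def build_redefine_map(
    symbols: Iterable[str],
    prefix: str = "__rt_",  # hypothetical private prefix for the runtime
    keep_prefixes: Tuple[str, ...] = ("__asan_", "__tsan_"),  # assumed entry points
) -> str:
    """Return the text of a rename map, one 'old new' pair per line.

    Symbols that look like public runtime entry points, or that already
    carry the private prefix, are left alone. Everything else is renamed,
    including libc names such as memcpy, so the runtime would have to
    supply its own small libc subset under the prefixed names.
    """
    lines = []
    for sym in sorted(set(symbols)):
        if sym.startswith(keep_prefixes) or sym.startswith(prefix):
            continue  # entry point or already renamed: keep as-is
        lines.append(f"{sym} {prefix}{sym}")
    return "\n".join(lines) + ("\n" if lines else "")


if __name__ == "__main__":
    # Symbol names as they might be listed by a tool such as nm.
    example = ["_Znwm", "__asan_init", "memcpy", "_Znwm"]
    print(build_redefine_map(example), end="")
```

Assuming the map is saved as map.txt, each object file extracted from the
static archive could then be rewritten with something like
'objcopy --redefine-syms=map.txt in.o out.o' before re-archiving. Whether
that covers every platform is exactly the open question raised above.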