[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Thu Jun 21 01:42:58 PDT 2012

Can we alter the build system so that when building a run-time library it
modifies all .cpp files like this:
   namespace FOO {
   <file body>
   }
This will give us essentially the same thing, but without system-dependent
object file hackery.
Maybe we can add a Clang flag to add such a namespace for us?
(This approach, as well as Chandler's original approach, will have to deal
with malloc, memset, strlen, etc., which still need to reside in the global
namespace.)
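A minimal sketch of the namespace idea, assuming the wrapping is done by hand rather than by a build-system step or Clang flag (neither exists yet in this thread). The names rt_private, internal_strlen and rt_exported_strlen are hypothetical. Two details matter: system headers have to be included outside the namespace (wrapping a #include of a standard header inside namespace FOO would break it), and an extern "C" function keeps its plain C symbol name whether or not it sits inside a namespace, so malloc/memset/strlen-style entry points stay global either way.

```cpp
#include <cassert>
#include <cstddef>

// System headers are included outside the namespace; only the runtime's
// own code is wrapped. "rt_private" stands in for FOO (hypothetical name).
namespace rt_private {

// Runtime-internal helper. Because it is declared inside rt_private, its
// mangled symbol name differs from any ::internal_strlen in user code.
std::size_t internal_strlen(const char *s) {
  assert(s != nullptr);
  std::size_t n = 0;
  while (s[n] != '\0') ++n;
  return n;
}

}  // namespace rt_private

// Entry points that must keep a plain C name (the malloc/memset/strlen
// caveat) are defined with C linkage. rt_exported_strlen is a hypothetical
// example; its symbol is the unmangled "rt_exported_strlen".
extern "C" std::size_t rt_exported_strlen(const char *s) {
  return rt_private::internal_strlen(s);
}
```

Only C++-mangled names change. The wrapper does nothing for code the runtime pulls in from std (template instantiations, calls to operator new/delete), which is the same limitation Dmitry describes for STLport.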

--kcc

On Thu, Jun 21, 2012 at 12:10 PM, Chandler Carruth <chandlerc at google.com> wrote:

> On Thu, Jun 21, 2012 at 1:04 AM, Dmitry Vyukov <dvyukov at google.com> wrote:
>
>> On Thu, Jun 21, 2012 at 11:52 AM, Chandler Carruth <chandlerc at google.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> Yes, STLport was a pain to deploy and maintain, and it calls the normal
>>>> operator new/delete (there is no way to put them into a separate namespace).
>>>>
>>>
>>> Ok, but putting the raw symbols into a "namespace" with the linker
>>> shouldn't be subject to these limitations.
>>>
>>
>> OK
>>
>>
>>>
>>>> Note that in some codebases we build asan/tsan runtimes from source.
>>>> What will the build process look like with that object file mangling? How
>>>> easy is it to integrate into a custom build process?
>>>>
>>>
>>> Well, I don't know yet. ;] It was an idea; I don't have an
>>> implementation at this point. That said, I had only really imagined
>>> building the runtimes from source? Maybe I don't understand what you mean
>>> by this?
>>>
>>> The vague strategy I am imagining for the build process is this:
>>>
>>> 1) compile runtime into a static library, just like any other static
>>> library
>>>
>>> 2) collect all the '.o' files in the static archive, and in any
>>> dependencies' static archive libraries
>>>
>>> 3) for each 'foo.o', build a 'foo_munged.o' using $tool; the _munged
>>> version has every symbol that is not on the whitelist for export to the
>>> instrumented binary renamed (munged)
>>>
>>> 4) put all of the _munged '.o' files into a single runtime archive
>>>
>>>
>>> The $tool here could be "ld -r" with a linker script, or (likely
>>> necessary on Windows) a very simple, dedicated tool built around the LLVM
>>> object libraries to copy each symbol, munging the name.
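The four steps above can be modelled as follows. This is only a sketch: ObjectFile is a toy stand-in for a real object file (actually reading and rewriting ELF, Mach-O or COFF symbol tables, whether via "ld -r" or LLVM's object libraries, is left out), and the "__rt_private_" prefix and all function names are hypothetical. What it illustrates is why step 2 collects every object, dependencies included: a reference is renamed only if some object in the runtime defines that symbol, so calls between runtime objects still link after munging, while references to libc functions such as memcpy are left alone.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model of one object file: symbols it defines and symbols it references.
struct ObjectFile {
  std::string path;                   // e.g. "foo.o"
  std::vector<std::string> defined;
  std::vector<std::string> undefined;
};

// Hypothetical prefix; the thread does not fix a naming scheme.
const std::string kPrefix = "__rt_private_";

// Step 2: every symbol defined anywhere in the runtime or its dependencies
// that is not on the export whitelist is "internal" and will be renamed.
std::set<std::string> CollectInternal(const std::vector<ObjectFile> &objs,
                                      const std::set<std::string> &whitelist) {
  std::set<std::string> internal;
  for (const auto &o : objs)
    for (const auto &s : o.defined)
      if (!whitelist.count(s)) internal.insert(s);
  return internal;
}

std::string Munge(const std::string &s, const std::set<std::string> &internal) {
  return internal.count(s) ? kPrefix + s : s;
}

// Step 3: build foo_munged.o from foo.o, renaming internal symbols both where
// they are defined and where they are referenced.
ObjectFile MungeObject(const ObjectFile &in,
                       const std::set<std::string> &internal) {
  assert(in.path.size() > 2 &&
         in.path.compare(in.path.size() - 2, 2, ".o") == 0);
  ObjectFile out;
  out.path = in.path.substr(0, in.path.size() - 2) + "_munged.o";
  for (const auto &s : in.defined) out.defined.push_back(Munge(s, internal));
  for (const auto &s : in.undefined) out.undefined.push_back(Munge(s, internal));
  return out;
}

// Step 4: the munged objects together form the single runtime archive.
std::vector<ObjectFile> BuildRuntimeArchive(
    const std::vector<ObjectFile> &objs,
    const std::set<std::string> &whitelist) {
  const std::set<std::string> internal = CollectInternal(objs, whitelist);
  std::vector<ObjectFile> archive;
  for (const auto &o : objs) archive.push_back(MungeObject(o, internal));
  return archive;
}
```

Applying the same name mapping to definitions and references is what a real $tool would also have to guarantee; a per-object rename without the global view from step 2 would leave dangling references.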
>>>
>>>
>>>> Soon I will start integrating tsan into the Go language. For the Go language
>>>> we need very simple object files.
>>>>
>>>
>>> Ok... I'm not sure whether this should really constrain the way we build
>>> the core runtime system here though. If you need some logic on the tsan
>>> side factored out into a separate library for use with Go, that would seem
>>> simpler than trying to make one sanitizer runtime library support
>>> frontends, middle ends, and programming languages with totally separate
>>> models.
>>>
>>
>> Yes, it will be a separate runtime library. But if tsan sources are
>> deeply dependent on LLVM sources, this may be significantly harder to do.
>>
>
> I think we should cross this bridge when we get there.
>
> When we do, I suspect it will be reasonable, in a worst case situation, to
> abstract the business logic into an isolated shared component. My hope is
> that we won't even need to...
>
>
>>
>>
>>>> No global ctors, no thread-local storage, no weak symbols and other
>>>> trickery. Basically what a portable C compiler could have produced.
>>>>
>>>
>>> These also don't seem insurmountable, even in the existing use cases.
>>> But maybe I'm not considering the actual restrictions you are, or I've
>>> misunderstood. Here is how I'm breaking down the things you've mentioned:
>>>
>>
>>
>>>
>>> 1) It seems reasonable to avoid global constructors, and doable in C++
>>> even when using the standard library and parts of LLVM. LLVM itself
>>> specifically works to avoid them.
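A minimal C++11 sketch of the usual ways to keep a runtime free of global constructors; the names and the 8192 limit are hypothetical. Compile-time constants and trivially constructible, zero-initialized globals need no initializer code at all, and state that needs run-time setup is initialized explicitly from the runtime's own entry point. Clang's -Wglobal-constructors warning can catch regressions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Avoid: a namespace-scope object with a non-trivial constructor, e.g.
//   static std::string g_tool_name = "tsan";
// The compiler must emit a dynamic initializer for it, i.e. a global
// constructor that runs before main(). Clang's -Wglobal-constructors
// flags such objects.

// constexpr values are computed at compile time and emitted as plain data.
constexpr std::size_t kMaxThreads = 8192;  // hypothetical limit
constexpr char kToolName[] = "tsan";

// Mutable runtime state is a trivial aggregate. A static object of such a
// type is zero-initialized (typically placed in .bss) and no code runs.
struct RuntimeState {
  bool initialized;
  int verbosity;
};
static_assert(std::is_trivial<RuntimeState>::value,
              "RuntimeState must not need a constructor");
static RuntimeState g_state;

// Explicit one-time setup, called from the runtime's own entry point instead
// of a global constructor. Not thread-safe: this is only a sketch.
void InitRuntimeOnce(int verbosity) {
  assert(verbosity >= 0);
  if (g_state.initialized) return;
  g_state.verbosity = verbosity;
  g_state.initialized = true;
}
```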
>>>
>>
>> Is that the case for the C++ library that LLVM uses?
>>
>
> LLVM is extremely resistant to growing external dependencies specifically
> because it cannot control them. In particular the parts that a runtime is
> likely to use are very unlikely to grow any problematic dependencies here.
> Essentially, it is reasonable to assert that we have control over all of
> LLVM's dependencies and can arrange for them to be very conservative here.
>
>
>>
>>> 2) TLS doesn't seem to be required by anything I'm suggesting... is there
>>> something that worries you about this?
>>>
>>
>> I suspect that the C/C++ library can use it.
>>
>
> I would be very surprised if these parts of LLVM use them. If they did, I
> think it would be reasonable to make it optional and disable it in some
> circumstances.
>
>
>>
>>> 3) I don't understand the requirement to have no weak symbols. Even a
>>> portable C compiler might produce weak symbols?
>>>
>>
>> The linker does not understand them.
>>
>>
>>> Still, during the re-linking phase above, it should be possible to
>>> resolve any weak symbols?
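For the re-linking phase, a dedicated tool could resolve weak symbols up front so the final objects carry only strong definitions. The sketch below models the rule ELF static linkers typically apply, under that assumption: a strong definition overrides weak ones, and among weak-only definitions the first one encountered wins. SymbolDef and ResolveWeak are hypothetical names, and undefined weak references (which resolve to null) are not modelled.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

enum class Binding { kWeak, kStrong };

struct SymbolDef {
  std::string name;
  Binding binding;
  std::string object;  // the .o file that provides this definition
};

// Chooses one definition per symbol name: a strong definition overrides a
// weak one, and among weak-only definitions the first one seen is kept.
// Every chosen definition is then marked strong, so the re-linked output
// contains no weak symbols.
std::map<std::string, SymbolDef> ResolveWeak(const std::vector<SymbolDef> &defs) {
  std::map<std::string, SymbolDef> chosen;
  for (const SymbolDef &d : defs) {
    auto it = chosen.find(d.name);
    if (it == chosen.end()) {
      chosen.emplace(d.name, d);
    } else if (it->second.binding == Binding::kWeak &&
               d.binding == Binding::kStrong) {
      it->second = d;
    } else {
      // Two strong definitions would be a "multiple definition" link error.
      assert(!(it->second.binding == Binding::kStrong &&
               d.binding == Binding::kStrong));
    }
  }
  for (auto &entry : chosen) entry.second.binding = Binding::kStrong;
  return chosen;
}
```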
>>>
>>
>> Well, most likely yes.
>>
>> There may be additional limitations that I don't know yet.
>>
>
> Sure, time will tell. That said, I don't think future work to support Go
> should be the top priority in getting this system well integrated, and I
> don't think there are any huge road blocks already clear at this stage
> related to Go.
>