[LLVMdev] RFC: How can AddressSanitizer, ThreadSanitizer, and similar runtime libraries leverage shared library code?

Thu Aug 16 08:51:43 PDT 2012

On Mon, Aug 13, 2012 at 7:22 PM, Alexey Samsonov <samsonov at google.com> wrote:

> (resurrecting the thread, as much is discussed here already)
>
> Formulating Kostya's suggestion:
> What do you think of compiling LLVM sources into ASan/TSan runtime by just
> taking the library sources, providing custom
> compiler (target) flags *and* a flag "-Dllvm=__sanitizer_llvm"?
> Yeah, it's hacky and applicable only to LLVM libs, but OTOH we don't plan to
> use anything else for now (in the short term).
> And it's trivial to implement and portable as well :)
>
> See http://codereview.appspot.com/6458066/ (or attached patch).
>

Update on this. There was an IRC conversation with Chandler, Kostya et al.
In particular:
* a preprocessor macro that renames the "llvm" namespace to something like
"__asan_llvm" looks like a gross hack
* it doesn't help with the STL and other libraries - if there is STL code in
the in-process symbolizer (and it's hard and suboptimal to get rid of it, as
it's everywhere in the LLVM libs), we may run into problems
with two versions (instrumented and uninstrumented) of the same functions.
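
To illustrate both points, here is a minimal sketch of what a rename macro
does and does not cover. Helper and CallRenamed are hypothetical names, and
the #define stands in for passing -Dllvm=__sanitizer_llvm on the command
line. Only the llvm token is rewritten; anything from std:: keeps its usual
symbol names, which is where two copies of the same function could collide.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Stands in for passing -Dllvm=__sanitizer_llvm on the command line.
#define llvm __sanitizer_llvm

namespace llvm {
// Hypothetical stand-in for library code. After preprocessing, this function
// lives in namespace __sanitizer_llvm, not in llvm.
inline std::string Helper() {
  // The macro does not touch std::, so these are the ordinary STL symbols.
  std::vector<std::string> parts = {"renamed", "namespace"};
  return parts[0] + " " + parts[1];
}
}  // namespace llvm

#undef llvm

// With the macro gone, the real name of the function above is visible.
inline std::string CallRenamed() { return __sanitizer_llvm::Helper(); }
```

The sketch only shows the renaming itself; it says nothing about how the
linker or the instrumentation would treat the shared std:: symbols.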

Instead we may return to the idea of an external symbolizer binary that would
be used by ASan/TSan in the following manner:
1) The tool will be a separate binary, and will have no limitations on
available functions. It can use LLVM libraries and the STL, and
should preferably not use or depend on sanitizer runtimes (why would it need
that?)
2) ASan/TSan can launch this tool at startup (for example, in __xsan_init),
when there is a single thread in the program, by fork + execl,
and will communicate with it via a pipe.
3) The tool will be capable of transforming (module, offset) pairs to (file,
function, line, column, ...) tuples.
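
Step 2 above can be sketched roughly as follows. This is not the ASan/TSan
code: ToolProcess, StartTool, Query and StopTool are hypothetical names, and
the one-line-in, one-line-out exchange is an assumption about how the pipe
could be used. The sketch uses two pipes so that requests and replies each
have their own direction.

```cpp
#include <cassert>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Handle to a running helper process (hypothetical type).
struct ToolProcess {
  pid_t pid = -1;
  int to_tool = -1;    // write end: requests go here
  int from_tool = -1;  // read end: replies come back here
};

// Starts `path` with its stdin/stdout connected to two pipes.
// Meant to be called while the process is still single-threaded.
inline bool StartTool(const char *path, ToolProcess *out) {
  int req[2], rep[2];
  if (pipe(req) != 0) return false;
  if (pipe(rep) != 0) { close(req[0]); close(req[1]); return false; }
  pid_t pid = fork();
  if (pid < 0) {
    close(req[0]); close(req[1]); close(rep[0]); close(rep[1]);
    return false;
  }
  if (pid == 0) {
    // Child: wire the pipes to stdin/stdout, then replace the process image.
    dup2(req[0], STDIN_FILENO);
    dup2(rep[1], STDOUT_FILENO);
    close(req[0]); close(req[1]); close(rep[0]); close(rep[1]);
    execl(path, path, (char *)nullptr);
    _exit(127);  // only reached if execl failed
  }
  // Parent: keep only the ends it uses.
  close(req[0]);
  close(rep[1]);
  out->pid = pid;
  out->to_tool = req[1];
  out->from_tool = rep[0];
  return true;
}

// Sends one request line and reads one reply line (newline stripped).
inline bool Query(const ToolProcess &tool, const std::string &request,
                  std::string *reply) {
  std::string line = request + "\n";
  size_t sent = 0;
  while (sent < line.size()) {
    ssize_t n = write(tool.to_tool, line.data() + sent, line.size() - sent);
    if (n <= 0) return false;
    sent += (size_t)n;
  }
  reply->clear();
  char c;
  while (true) {
    ssize_t n = read(tool.from_tool, &c, 1);
    if (n <= 0) return false;
    if (c == '\n') return true;
    reply->push_back(c);
  }
}

// Closes both pipes (the tool sees EOF on stdin) and reaps the child.
inline void StopTool(ToolProcess *tool) {
  close(tool->to_tool);
  close(tool->from_tool);
  int status = 0;
  waitpid(tool->pid, &status, 0);
  tool->pid = -1;
}
```

For a quick check, any program that echoes lines can stand in for the tool,
e.g. StartTool("/bin/cat", &tool) followed by Query(tool, "libfoo.so 0x1234",
&reply). A real runtime would also have to deal with SIGPIPE and with a tool
that dies mid-request, which the sketch ignores.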

In fact this looks similar to the existing addr2line-based symbolizer in tsan,
but our solution may be better, as:
* we control the tool and can simplify its deployment (see below).
* addr2line can work with one module only, so we either have to run several
addr2line processes (as in tsan), or symbolize stack traces offline (as in
asan).
* addr2line doesn't work with -gline-tables-only (i.e. it doesn't take
fully qualified function names from the symbol table if they are not contained
in debug info).
* we are free to optimize the speed/memory consumption of the tool.
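
One way a single process could serve several modules is to carry the module
name in every request. The sketch below is an assumed wire format, not
something the prototype defines: Request, SourceLocation, EncodeRequest and
DecodeReply are hypothetical names, and the format assumes that module paths
contain no spaces and that reply fields contain no tabs.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <sstream>
#include <string>

// Hypothetical request: which module, and the offset inside it.
struct Request {
  std::string module;
  uint64_t offset;
};

// Hypothetical reply fields, mirroring the (file, function, line, column)
// tuple described above.
struct SourceLocation {
  std::string function;
  std::string file;
  unsigned line = 0;
  unsigned column = 0;
};

// Produces e.g. "libfoo.so 0x1234". Because the module travels with each
// request, one tool process can answer for any number of modules.
inline std::string EncodeRequest(const Request &r) {
  char buf[32];
  std::snprintf(buf, sizeof(buf), "0x%llx", (unsigned long long)r.offset);
  return r.module + " " + buf;
}

// Parses "function<TAB>file<TAB>line<TAB>column"; returns false if malformed.
inline bool DecodeReply(const std::string &text, SourceLocation *loc) {
  std::istringstream in(text);
  std::string line, column;
  if (!std::getline(in, loc->function, '\t')) return false;
  if (!std::getline(in, loc->file, '\t')) return false;
  if (!std::getline(in, line, '\t')) return false;
  if (!std::getline(in, column)) return false;
  try {
    loc->line = (unsigned)std::stoul(line);
    loc->column = (unsigned)std::stoul(column);
  } catch (...) {
    return false;  // line or column was not a number
  }
  return true;
}
```

A tab-separated reply is chosen here only because fully qualified function
names can contain spaces; any unambiguous separator would do.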

Such a tool, if implemented, would be rather standalone and would in fact
have little to do with compiler-rt. We may call it llvm-addr2line or
llvm-symbolize, and
develop and build it just like any other llvm tool (llvm-nm, llvm-dwarfdump,
etc). We can simplify ASan/TSan deployment by
storing the tool binary in the .data section of the ASan static runtime, mapping
and executing it at startup, so that we won't have to drag the symbolizer binary
along every time we want to run ASan.
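
A rough sketch of the deployment idea, under loud assumptions: instead of
mapping the image directly, it writes the embedded bytes to a private
temporary file and marks it executable, so that the result can be passed to
execl. kEmbeddedTool, kEmbeddedToolSize and ExtractEmbeddedTool are
hypothetical names, the /tmp location is an assumption, and the tiny shell
script is only a placeholder for a real tool image that a build step would
generate.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <sys/stat.h>
#include <unistd.h>

// Placeholder for the tool image carried in the runtime's data section.
// A real build would generate this array from the tool binary.
static const unsigned char kEmbeddedTool[] = "#!/bin/sh\necho symbolizer\n";
static const size_t kEmbeddedToolSize = sizeof(kEmbeddedTool) - 1;  // no NUL

// Writes the embedded image to a fresh temporary file, makes it executable
// for the owner only, and returns its path ("" on failure).
inline std::string ExtractEmbeddedTool(const unsigned char *data, size_t size) {
  char path[] = "/tmp/embedded-tool-XXXXXX";
  int fd = mkstemp(path);  // creates the file with mode 0600
  if (fd < 0) return "";
  size_t written = 0;
  while (written < size) {
    ssize_t n = write(fd, data + written, size - written);
    if (n <= 0) { close(fd); unlink(path); return ""; }
    written += (size_t)n;
  }
  bool ok = fchmod(fd, 0700) == 0;
  close(fd);
  if (!ok) { unlink(path); return ""; }
  return path;
}
```

The returned path could then be handed to fork + execl at startup and
unlinked once the tool is running. Whether a temporary file is acceptable at
all (for example on a noexec /tmp) is an open question this sketch does not
answer.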

I have a very raw prototype of such a binary that uses LLVM's libObject,
libSupport and libDebugInfo, and is integrated into ASan, which fork+execs it
when it needs to symbolize a stack trace. It successfully symbolizes an ASan
report for a model Chromium crash. If Chromium is built with
-gline-tables-only (500 MB),
it runs in less than a second (most of the time is spent loading the binary into
memory), (unsurprisingly) uses about 520 MB of memory and provides a stack
trace
with file/line/column and full function names.

Do you think it is the way to go for us now?

> On Thu, Jun 21, 2012 at 1:44 PM, Chandler Carruth <chandlerc at google.com> wrote:
>
>> On Thu, Jun 21, 2012 at 2:39 AM, Alexey Samsonov <samsonov at google.com> wrote:
>>
>>>
>>>
>>> On Thu, Jun 21, 2012 at 1:34 PM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>>
>>>> On Thu, Jun 21, 2012 at 1:30 PM, Chandler Carruth <chandlerc at google.com> wrote:
>>>>
>>>>>>>>> Can we alter the build system so that when building a run-time
>>>>>>>>> library it modifies all .cpp files like this:
>>>>>>>>>    namespace FOO {
>>>>>>>>>    <file body>
>>>>>>>>>    }
>>>>>>>>> This will give us essentially the same thing, but w/o system
>>>>>>>>> dependent object file hackery.
>>>>>>>>> Maybe we can add a Clang flag to add such a namespace for us?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I think this is essentially what Dmitry was talking about w/ past
>>>>>>>> STLport experience. It has lots of limitations:
>>>>>>>>
>>>>>>>
>>>>>>> Patching object files still sounds much scarier and harder to port.
>>>>>>> I'd prefer to find a solution that involves only source files and
>>>>>>> maybe clang.
>>>>>>> Pondering...
>>>>>>>
>>>>>>>
>>>>>>>> - You can't use the normal system standard library
>>>>>>>> - You have to build the standard library from source
>>>>>>>> - You can't wrap certain parts of it (operator new, delete, a few
>>>>>>>> other things)
>>>>>>>> - You can't re-use any C libraries (zlib for example)
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>> Perhaps you are solving a broader problem. But as for asan/tsan, we
>>>>>> currently need only the symbolizer; it's separable from everything else
>>>>>> and can be made to not use the STL.
>>>>>>
>>>>>
>>>>> If you want to share LLVM code for the object and dwarf reading, I do
>>>>> not believe this to be true at all.
>>>>>
>>>>
>>>> I've already removed the code for object reading for exactly that
>>>> reason, so now it's just DWARF parsing :) There are some STL containers
>>>> involved, but I think they can be replaced.
>>>>
>>>
>>> Agree here. I hope to modify/extend this code soon anyway.
>>>
>>
>> Folks, this is not the path to sharing code. This is the path to forking
>> code.
>>
>> Let's go back to the very premise: I think it is highly desirable to be
>> capable of building runtimes such as ASan and TSan and *share* code rather
>> than forking it.
>>
>> I have reasons: I have seen the creation of at least three separate ELF
>> and/or DWARF parsing libraries thus far. I have seen a long series of bugs
>> found and fixed in them over the course of years, often the same bug, often
>> with great expense in debugging to understand why. I don't want us to keep
>> paying this cost. I don't think these pieces of code are likely to be alone
>> in this.
>>
>>
>> Now, perhaps I am wrong, and it is not worth it. Thus far, I don't hear
>> any convincing arguments to that effect, but I'm very willing to believe
>> I'm wrong as I don't work on one of these runtimes, and so don't have a
>> direct appreciation for all of the costs involved.
>>
>> But let's be extremely clear on what you are suggesting: you are
>> specifically doing away with the very idea of sharing code with the rest of
>> the LLVM project, and instead deciding to fork and write custom code in the
>> runtime for all functionality.
>>
>
>
>
> --
> Alexey Samsonov, MSK
>
>

-- 
Alexey Samsonov, MSK