[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
Jonathan Roelofs via cfe-dev
cfe-dev at lists.llvm.org
Mon Aug 15 14:34:18 PDT 2016
On 8/15/16 1:51 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Jonathan Roelofs" <jonathan at codesourcery.com>
>> To: "Hal Finkel" <hfinkel at anl.gov>
>> Cc: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>>     <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>>     <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>>     "Evgenii Stepanov" <eugenis at google.com>
>> Sent: Monday, August 15, 2016 9:24:17 AM
>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>>     Sanitized Libc++
>>
>>
>>
>> On 8/14/16 7:31 PM, Hal Finkel wrote:
>>> ----- Original Message -----
>>>> From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>>>> To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>>>>     <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>>>>     <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>>>>     "Evgenii Stepanov" <eugenis at google.com>
>>>> Sent: Sunday, August 14, 2016 7:07:00 PM
>>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>>>>     Sanitized Libc++
>>>>
>>>>
>>>>
>>>> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>>>>> Sanitizers such as MSAN require the entire program to be
>>>>> instrumented; anything less leads to plenty of false
>>>>> positives. Unfortunately, this can be difficult to achieve,
>>>>> especially for the C and C++ standard libraries. To work
>>>>> around this the sanitizers provide interceptors for common C
>>>>> functions, but the same solution doesn't work as well for the
>>>>> C++ STL. Instead users are forced to manually build and link
>>>>> a custom sanitized libc++. This is a huge PITA and I would
>>>>> like to improve the situation, not just for MSAN but all
>>>>> sanitizers. I'm working on a proposal to change this. The
>>>>> basis of my proposal is:
>>>>>
>>>>> Clang should install/provide multiple sanitized versions of
>>>>> Libc++ and a mechanism to easily link them, as if they were
>>>>> a Compiler-RT runtime.
>>>>>
>>>>> The goals of this proposal are:
>>>>>
>>>>> (1) Greatly reduce the number of false positives caused by
>>>>> using an un-sanitized STL.
>>>>> (2) Allow sanitizers to catch user bugs that occur within the
>>>>> STL library, not just its headers.
>>>>>
>>>>> The basic steps I would like to take to achieve this are:
>>>>>
>>>>> (1) Teach the compiler-rt CMake how to build and install
>>>>> each sanitized libc++ version alongside its other runtimes.
>>>>> (2) Add options to the Clang driver to support linking/using
>>>>> these libraries.
>>>>>
>>>>> I think this proposal is likely to be contentious, so I
>>>>> would like to focus on its details. Once I have some
>>>>> feedback on these details I'll put together a formal
>>>>> proposal, including a plan for implementing it. The details I
>>>>> would like input on are:
>>>>>
>>>>> (A) What kind and how many sanitized versions of libc++
>>>>> should we provide?
>>>>> ------------------------------------------------------------------
>>>>>
>>>>> I think the minimum set would be Address (which includes Leak),
>>>>> Memory (With origin tracking?), Thread, and Undefined. Once
>>>>> we get into combinations of sanitizers things get more
>>>>> complicated. What other sanitizer combinations should we
>>>>> provide?
>>>>>
>>>>> (B) How should we handle UBSAN?
>>>>> ---------------------------------------------------
>>>>>
>>>>> UBSAN is really just a collection of sanitizers and
>>>>> providing sanitized versions of libc++ for every possible
>>>>> configuration is out of the question. Instead we should
>>>>> figure out what subset of UBSAN checks we want to enable in
>>>>> sanitized libc++ versions. I suspect we want to disable the
>>>>> following checks.
>>>>>
>>>>> * -fsanitize=vptr
>>>>> * -fsanitize=function
>>>>> * -fsanitize=float-divide-by-zero
>>>>>
>>>>> Additionally UBSAN can be combined with every other
>>>>> sanitizer group (i.e. Address, Memory, Thread). Do we want to
>>>>> provide a combination of UBSAN on/off for every group, or can
>>>>> we simply provide an over-sanitized version with UBSAN on?
>>>>>
>>>>> (C) How should the Clang driver expose the sanitized
>>>>> libraries to the users?
>>>>> ------------------------------------------------------------------------
>>>>>
>>>>> I would like to propose the driver option '-fsanitize-stdlib' and
>>>>> '-fsanitize-stdlib=<sanitizer>'. The first version deduces
>>>>> the best sanitized version to use, the second allows it to
>>>>> be explicitly specified.
>>>>>
>>>>> A couple of other options are:
>>>>>
>>>>> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
>>>>>   deduces which version.
>>>>> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>>>>>   sanitized STL.
>>>>>
>>>>> (D) Should sanitized libc++ versions override libc++.so?
>>>>> ---------------------------------------------------------
>>>>>
>>>>> For example, what happens when a program links to both a sanitized
>>>>> and non-sanitized libc++ version? Does the sanitized version
>>>>> replace the non-sanitized version, or should both versions
>>>>> be loaded into the program?
>>>>>
>>>>> Essentially I'm asking if the sanitized versions of libc++
>>>>> should have the "soname" libc++ so they can replace the
>>>>> non-sanitized version, or if they should have a different
>>>>> "soname" so the linker treats them as a separate library.
>>>>>
>>>>> I haven't looked into the consequences of either approach in
>>>>> depth, but any input is appreciated.
>>>>
>>>> In a sense, these are /just/ multilibs, so my inclination would
>>>> be to make all the sonames the same, and just stick them in
>>>> appropriately named subfolders relative to their normal
>>>> location.
>>>
>>> I'm not sure that's true; there's no property of the environment
>>> that determines which library path you need. As a practical
>>> matter, I can't set $PLATFORM and/or $LIB in my rpath and have
>>> ld.so do the right thing in this context. Moreover, it is really
>>> a property of how you compiled, so I think using an alternate
>>> library name is natural.
>>
>> Multilibs solve exactly the problem of "it's a property of how you
>> compiled". The thing that's subtly different here is that the
>> usual thing that people do with multilibs is to provide ABI
>> incompatible versions of the same library (which are made
>> incompatible via compiler flags, -msoft-float, for example),
>> whereas these libraries just so happen to be ABI compatible with
>> their non-instrumented variants.
>>
>> I'm not sure I understand what you're saying about $PLATFORM and
>> $LIB, but I /think/ it's a red herring: the compiler takes care of
>> adding in the multilib suffixes where appropriate, so shouldn't the
>> answer to "which library do I stick in the rpath?" include said
>> suffix (when compiled with Eric's proposed flag)?
>
> I'm not sure what color herring it is ;) -- I'm trying to understand
> the system you're proposing:
>
> 1. User A compiles/installs Clang/LLVM/libc++ on system A in
> /local/clang, and so we get a /local/clang/lib/libc++.so and a
> /local/clang/lib/msan/libc++.so. User A compiles a program, foo, with
> msan enabled, and foo gets an rpath of /local/clang/lib/msan. User A
> also compiles another program, prod, without any sanitizers, and
> it gets an rpath of /local/clang/lib.
>
> 2. User B compiles/installs Clang/LLVM/libc++ on system B in
> /soft/clang, and so we get a /soft/clang/lib/libc++.so and a
> /soft/clang/lib/msan/libc++.so. User A sends User B the executables
> foo and prod. Those executables have rpaths with /local/clang/...,
> but those don't help User B. User B has an environment with
> LD_LIBRARY_PATH=/soft/clang/lib so that the executables compiled by
> User A will run.
>
> 3. User B has no good option, because if LD_LIBRARY_PATH is set to
> /soft/clang/lib, then prod will behave as expected (i.e. not be
> sanitized), but foo will not. If LD_LIBRARY_PATH is set to
> /soft/clang/lib/msan, then foo will be sanitized as expected, but
> prod will run slower than usual.
Ahhh, I see. I was imagining this sort of use case:
first_guy$ cat lib.h
extern void lib_func();
first_guy$ cat lib.c
#include "lib.h"
#include <stdio.h>
void lib_func() {
  printf("In %s\n", MESSAGE);
}
first_guy$ cat bin.c
#include "lib.h"
int main() {
  lib_func();
}
first_guy$ mkdir -p lib/sanitized
first_guy$ clang lib.c -shared -DMESSAGE="\"sanitized\"" -o lib/sanitized/library.so
first_guy$ clang lib.c -shared -DMESSAGE="\"production\"" -o lib/library.so
first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/sanitized/ -o sanitized
first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/ -o production
first_guy$ ./sanitized
In sanitized
first_guy$ ./production
In production
first_guy$ mkdir ../other_guy
first_guy$ cd ../other_guy/
other_guy$ cp ../first_guy/sanitized .
other_guy$ cp ../first_guy/production .
other_guy$ cp -r ../first_guy/lib .
other_guy$ ./sanitized
In sanitized
other_guy$ ./production
In production
other_guy$ rm lib/library.so
other_guy$ ln -s ../lib/sanitized/library.so lib/library.so
other_guy$ ./production
In sanitized
other_guy$ ./sanitized
In sanitized
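
The same "first match wins" behaviour is what makes a single
LD_LIBRARY_PATH awkward for a mix of sanitized and production
binaries. A minimal sketch, assuming a deliberately simplified model
of a search path rather than the real loader logic (find_lib is a
hypothetical helper, and the temporary directory merely stands in
for the lib/ layout above):

```shell
#!/bin/sh
# Simplified model of a first-match library search; NOT the real ld.so logic.
# find_lib (hypothetical helper) prints the first match for a file name in a
# colon-separated list of directories, or returns non-zero if none has it.
find_lib() {
    name=$1
    search_path=$2
    old_ifs=$IFS
    IFS=:
    for dir in $search_path; do
        if [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Stand-in for the directory layout used in the transcript above.
root=$(mktemp -d)
mkdir -p "$root/lib/sanitized"
: > "$root/lib/library.so"
: > "$root/lib/sanitized/library.so"

# Whichever directory is listed first is used by every program that
# consults this path, sanitized or not.
find_lib library.so "$root/lib:$root/lib/sanitized"
find_lib library.so "$root/lib/sanitized:$root/lib"
```

With identical file names in both directories, the order of the
search path alone decides which copy is picked, which is why one
path setting cannot serve both kinds of binary at once.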
Jon
>
> 4. User B compiles programs to send to User A. User A then sets
> LD_LIBRARY_PATH to /local/clang/lib. User A has the same problem as
> User B. Moreover, if User A compiles using -Wl,--enable-new-dtags
> (the recommended default on many systems), then the linker will use
> DT_RUNPATH (instead of, or in addition to, DT_RPATH; the effect is
> the same), and the rpath scheme won't even work for User A on User
> A's own executables (because LD_LIBRARY_PATH overrides DT_RUNPATH).
>
> There are a few things, other than pure directory paths, that can
> appear in, or otherwise affect, LD_LIBRARY_PATH and
> DT_RPATH/DT_RUNPATH, but I don't think any of them help us here:
>
> 1. Pseudo variables $ORIGIN, $LIB and $PLATFORM - These are expanded
> by ld.so based on properties of the current execution environment
> (e.g. whether you're loading a 32-bit or 64-bit executable, the
> hardware architecture).
>
> 2. Hardware-capability strings - There is a fixed set of hardware
> capabilities, such as sse, sse2, altivec, etc. that are appended to
> the directory name to form alternate search paths.
>
> 3. The multilib suffix. This, AFAIK, is baked into the dynamic
> loader. The path to the loader itself has the multilib suffix, and
> that's specified in PT_INTERP.
>
> Unfortunately, I don't think that any of these help us.
>
> -Hal
>
>>
>> Jon
>>
>>>
>>> -Hal
>>>
>>>>
>>>>
>>>> Jon
>>>>
>>>>>
>>>>> Conclusion
>>>>> -----------------
>>>>>
>>>>> I hope my proposal and questions have made sense. Any and
>>>>> all input is appreciated. Please let me know if anything
>>>>> needs clarification.
>>>>>
>>>>> /Eric
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> _______________________________________________ cfe-dev
>>>>> mailing list cfe-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>
>>>>
>>>> -- Jon Roelofs jonathan at codesourcery.com CodeSourcery / Mentor
>>>> Embedded
>>>>
>>>
>>
>>
>
--
Jon Roelofs
jonathan at codesourcery.com
CodeSourcery / Mentor Embedded