[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Evgenii Stepanov via cfe-dev cfe-dev at lists.llvm.org
Tue Aug 16 17:35:00 PDT 2016


So, I'd argue that proper support for sanitized shared libraries
(primarily libc++, but not just libc++) would require a loader change.
We could start by agreeing on and specifying a way for a binary to
declare its "sanitizer type", which could be used at runtime to change
the library lookup path.
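
To make that concrete, here is a rough sketch in C (not real ld.so
code) of how a loader could use a declared sanitizer type while walking
its search directories. Everything in it is an assumption made for
illustration: the name resolve_library, the exists callback, the idea
that the type arrives as a plain string such as "msan", and the choice
to fall back to the plain directory when no sanitized copy is found.
The fake_system_b helper mirrors the /soft/clang layout from Hal's
example quoted below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical callback: returns nonzero if `path` names a loadable file. */
typedef int (*exists_fn)(const char *path);

/*
 * Sketch of a sanitizer-aware lookup. For each directory in `dirs`, try
 * <dir>/<san_type>/<soname> first (when a type is declared), then
 * <dir>/<soname>. Writes the first hit to `out` and returns 0, or
 * returns -1 if nothing is found. Paths too long for `out` are skipped.
 */
int resolve_library(const char *const *dirs, size_t ndirs,
                    const char *san_type, const char *soname,
                    exists_fn exists, char *out, size_t outlen)
{
    assert(dirs && soname && exists && out && outlen > 0);
    for (size_t i = 0; i < ndirs; i++) {
        int n;
        if (san_type && san_type[0] != '\0') {
            n = snprintf(out, outlen, "%s/%s/%s", dirs[i], san_type, soname);
            if (n > 0 && (size_t)n < outlen && exists(out))
                return 0;
        }
        n = snprintf(out, outlen, "%s/%s", dirs[i], soname);
        if (n > 0 && (size_t)n < outlen && exists(out))
            return 0;
    }
    out[0] = '\0';
    return -1;
}

/* Stand-in for a filesystem check, mirroring the "system B" layout. */
int fake_system_b(const char *path)
{
    return strcmp(path, "/soft/clang/lib/libc++.so") == 0 ||
           strcmp(path, "/soft/clang/lib/msan/libc++.so") == 0;
}
```

With a single search directory of /soft/clang/lib, a binary declaring
"msan" would resolve libc++.so to the msan/ copy, while a binary with no
declared type would get the plain one, so one LD_LIBRARY_PATH value
could serve both. Whether a silent fallback to the unsanitized library
is desirable is an open question; for MSan it would probably be better
to fail loudly.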

Also, we can solve this for the case of -static-libstdc++ easily in
the clang driver by looking under an msan/ subdirectory first. With
that, we could replace the whole msan bootstrap instructions [1] with
just "use -static-libstdc++".

[1] https://github.com/google/sanitizers/wiki/MemorySanitizerBootstrappingClang
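
For the static case, the driver-side change amounts to putting one
extra candidate in front of the usual one. A minimal sketch, assuming a
hypothetical helper static_lib_candidates and a fixed-size candidate
array (neither is clang's actual driver code, and the library file name
is just a parameter here):

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define CAND_MAX 256

/*
 * Fill `cands` with the locations a driver could try, in order, for a
 * static C++ library: the sanitizer subdirectory first (if one is
 * given), then the normal library directory. Returns the number of
 * candidates written, or 0 if a path would not fit. All names are
 * illustrative, not clang internals.
 */
size_t static_lib_candidates(const char *libdir, const char *subdir,
                             const char *libname,
                             char cands[2][CAND_MAX])
{
    assert(libdir && libname && cands);
    size_t count = 0;
    int n;
    if (subdir && strlen(subdir) > 0) {
        n = snprintf(cands[count], CAND_MAX, "%s/%s/%s",
                     libdir, subdir, libname);
        if (n < 0 || n >= CAND_MAX)
            return 0;
        count++;
    }
    n = snprintf(cands[count], CAND_MAX, "%s/%s", libdir, libname);
    if (n < 0 || n >= CAND_MAX)
        return 0;
    return count + 1;
}
```

The driver would then link the first candidate that exists. Since the
library is linked statically, nothing about the runtime lookup path has
to change for this case.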


On Mon, Aug 15, 2016 at 2:34 PM, Jonathan Roelofs
<jonathan at codesourcery.com> wrote:
>
>
> On 8/15/16 1:51 PM, Hal Finkel wrote:
>>
>> ----- Original Message -----
>>>
>>> From: "Jonathan Roelofs" <jonathan at codesourcery.com>
>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>> Cc: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>>> <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>>> <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>>> "Evgenii Stepanov" <eugenis at google.com>
>>> Sent: Monday, August 15, 2016 9:24:17 AM
>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>>> Sanitized Libc++
>>>
>>>
>>>
>>> On 8/14/16 7:31 PM, Hal Finkel wrote:
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>>>>> To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>>>>> <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>>>>> <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>>>>> "Evgenii Stepanov" <eugenis at google.com>
>>>>> Sent: Sunday, August 14, 2016 7:07:00 PM
>>>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>>>>> Sanitized Libc++
>>>>>
>>>>>
>>>>>
>>>>> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>>>>>>
>>>>>> Sanitizers such as MSAN require the entire program to be
>>>>>> instrumented; anything less leads to plenty of false
>>>>>> positives. Unfortunately this can be difficult to achieve,
>>>>>> especially for the C and C++ standard libraries. To work
>>>>>> around this the sanitizers provide interceptors for common C
>>>>>> functions, but the same solution doesn't work as well for the
>>>>>> C++ STL. Instead users are forced to manually build and link
>>>>>> a custom sanitized libc++. This is a huge PITA and I would
>>>>>> like to improve the situation, not just for MSAN but all
>>>>>> sanitizers. I'm working on a proposal to change this. The
>>>>>> basis of my proposal is:
>>>>>>
>>>>>> Clang should install/provide multiple sanitized versions of
>>>>>> Libc++ and a mechanism to easily link them, as if they were
>>>>>> a Compiler-RT runtime.
>>>>>>
>>>>>> The goal of this proposal is:
>>>>>>
>>>>>> (1) Greatly reduce the number of false positives caused by
>>>>>> using an un-sanitized STL.
>>>>>> (2) Allow sanitizers to catch user bugs that occur within the
>>>>>> STL library, not just its headers.
>>>>>>
>>>>>> The basic steps I would like to take to achieve this are:
>>>>>>
>>>>>> (1) Teach the compiler-rt CMake how to build and install
>>>>>> each sanitized libc++ version alongside its other runtimes.
>>>>>> (2) Add options to the Clang driver to support linking/using
>>>>>> these libraries.
>>>>>>
>>>>>> I think this proposal is likely to be contentious, so I
>>>>>> would like to focus on the details of it. Once I have some
>>>>>> feedback on these details I'll put together a formal
>>>>>> proposal, including a plan for implementing it. The details I
>>>>>> would like input on are:
>>>>>>
>>>>>> (A) What kind and how many sanitized versions of libc++
>>>>>> should we provide?
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> I think the minimum set would be Address (which includes Leak),
>>>>>> Memory (With origin tracking?), Thread, and Undefined. Once
>>>>>> we get into combinations of sanitizers things get more
>>>>>> complicated. What other sanitizer combinations should we
>>>>>> provide?
>>>>>>
>>>>>> (B) How should we handle UBSAN?
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> UBSAN is really just a collection of sanitizers and
>>>>>> providing sanitized versions of libc++ for every possible
>>>>>> configuration is out of the question. Instead we should
>>>>>> figure out what subset of UBSAN checks we want to enable in
>>>>>> sanitized libc++ versions. I suspect we want to disable the
>>>>>> following checks.
>>>>>>
>>>>>> * -fsanitize=vptr
>>>>>> * -fsanitize=function
>>>>>> * -fsanitize=float-divide-by-zero
>>>>>>
>>>>>> Additionally, UBSAN can be combined with every other
>>>>>> sanitizer group (i.e. Address, Memory, Thread). Do we want to
>>>>>> provide a combination of UBSAN on/off for every group, or can
>>>>>> we simply provide an over-sanitized version with UBSAN on?
>>>>>>
>>>>>> (C) How should the Clang driver expose the sanitized
>>>>>> libraries to the users?
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> I would like to propose the driver option '-fsanitize-stdlib' and
>>>>>> '-fsanitize-stdlib=<sanitizer>'. The first version deduces
>>>>>> the best sanitized version to use, the second allows it to
>>>>>> be explicitly specified.
>>>>>>
>>>>>> A couple of other options are:
>>>>>>
>>>>>> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
>>>>>>   deduces which version.
>>>>>> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>>>>>>   sanitized STL.
>>>>>>
>>>>>> (D) Should sanitized libc++ versions override libc++.so?
>>>>>> ---------------------------------------------------
>>>>>>
>>>>>> For example, what happens when a program links to both a sanitized
>>>>>> and non-sanitized libc++ version? Does the sanitized version
>>>>>> replace the non-sanitized version, or should both versions
>>>>>> be loaded into the program?
>>>>>>
>>>>>> Essentially I'm asking if the sanitized versions of libc++
>>>>>> should have the "soname" libc++ so they can replace
>>>>>> non-sanitized version, or if they should have a different
>>>>>> "soname" so the linker treats them as a separate library.
>>>>>>
>>>>>> I haven't looked into the consequences of either approach in
>>>>>> depth, but any input is appreciated.
>>>>>
>>>>>
>>>>> In a sense, these are /just/ multilibs, so my inclination would
>>>>> be to make all the sonames the same, and just stick them in
>>>>> appropriately named subfolders relative to their normal
>>>>> location.
>>>>
>>>>
>>>> I'm not sure that's true; there's no property of the environment
>>>> that determines which library path you need. As a practical
>>>> matter, I can't set $PLATFORM and/or $LIB in my rpath and have
>>>> ld.so do the right thing in this context. Moreover, it is really
>>>> a property of how you compiled, so I think using an alternate
>>>> library name is natural.
>>>
>>>
>>> Multilibs solve exactly the problem of "it's a property of how you
>>> compiled". The thing that's subtly different here is that the
>>> usual thing that people do with multilibs is to provide ABI
>>> incompatible versions of the same library (which are made
>>> incompatible via compiler flags, -msoft-float, for example),
>>> whereas these libraries just so happen to be ABI compatible with
>>> their non-instrumented variants.
>>>
>>> I'm not sure I understand what you're saying about $PLATFORM and
>>> $LIB, but I /think/ it's a red herring: the compiler takes care of
>>> adding in the multilib suffixes where appropriate, so shouldn't the
>>> answer to "which library do I stick in the rpath?" include said
>>> suffix (when compiled with Eric's proposed flag)?
>>
>>
>> I'm not sure what color herring it is ;) -- I'm trying to understand
>> the system you're proposing:
>>
>> 1. User A compiles/installs Clang/LLVM/libc++ on system A in
>> /local/clang, and so we get a /local/clang/lib/libc++.so and a
>> /local/clang/lib/msan/libc++.so. User A compiles a program, foo, with
>> msan enabled, and foo gets an rpath of /local/clang/lib/msan. User A
>> also compiles another program, prod, without any sanitizers, and
>> prod gets an rpath of /local/clang/lib.
>>
>> 2. User B compiles/installs Clang/LLVM/libc++ on system B in
>> /soft/clang, and so we get a /soft/clang/lib/libc++.so and a
>> /soft/clang/lib/msan/libc++.so. User A sends User B the executables
>> foo and prod. Those executables have rpaths with /local/clang/...,
>> but those don't help User B. User B has an environment with
>> LD_LIBRARY_PATH=/soft/clang/lib so that the executables compiled by
>> User A will run.
>>
>> 3. User B has no good option, because if LD_LIBRARY_PATH is set to
>> /soft/clang/lib, then prod will behave as expected (i.e. not be
>> sanitized), but foo will not. If LD_LIBRARY_PATH is set to
>> /soft/clang/lib/msan, then foo will be sanitized as expected, but
>> prod will run slower than usual.
>
>
> Ahhh, I see. I was imagining this sort of use case:
>
> first_guy$ cat lib.h
> extern void lib_func();
>
> first_guy$ cat lib.c
> #include "lib.h"
>
> #include <stdio.h>
>
> void lib_func() {
>   printf("In %s\n", MESSAGE);
> }
> first_guy$ cat bin.c
> #include "lib.h"
>
> int main() {
>   lib_func();
> }
> first_guy$ mkdir -p lib/sanitized
> first_guy$ clang lib.c -shared -DMESSAGE="\"sanitized\"" -o lib/sanitized/library.so
> first_guy$ clang lib.c -shared -DMESSAGE="\"production\"" -o lib/library.so
> first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/sanitized/ -o sanitized
> first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/ -o production
> first_guy$ ./sanitized
> In sanitized
> first_guy$ ./production
> In production
> first_guy$ mkdir ../other_guy
> first_guy$ cd ../other_guy/
> other_guy$ cp ../first_guy/sanitized .
> other_guy$ cp ../first_guy/production .
> other_guy$ cp -r ../first_guy/lib .
> other_guy$ ./sanitized
> In sanitized
> other_guy$ ./production
> In production
> other_guy$ rm lib/library.so
> other_guy$ ln -s ../lib/sanitized/library.so lib/library.so
> other_guy$ ./production
> In sanitized
> other_guy$ ./sanitized
> In sanitized
>
>
> Jon
>
>
>>
>> 4. User B compiles programs to send to User A. User A then sets
>> LD_LIBRARY_PATH to /local/clang/lib. User A has the same problem as
>> User B. Moreover, if User A compiles using -Wl,--enable-new-dtags,
>> which is the recommended default on many systems, then the linker
>> will use DT_RUNPATH (instead of, or in addition to, DT_RPATH; the
>> effect is the same), and the rpath scheme won't even work for User A
>> on User A's own executables (because LD_LIBRARY_PATH overrides
>> DT_RUNPATH).
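
The ordering behind that last point can be written down as a small
model. This is a simplified sketch of the glibc loader's documented
search order for a single binary, not its implementation: it ignores
the ld.so cache, the secure-execution rules, and DT_RPATH inherited
from parent objects. The enum and function names are made up for the
example.

```c
#include <assert.h>
#include <stddef.h>

enum search_source {
    SRC_RPATH,
    SRC_LD_LIBRARY_PATH,
    SRC_RUNPATH,
    SRC_DEFAULT
};

/*
 * Write the order in which path sources are consulted into `order`
 * (capacity 4) and return how many entries were written.
 */
size_t search_order(int has_rpath, int has_runpath, int has_ld_library_path,
                    enum search_source order[4])
{
    assert(order);
    size_t n = 0;
    /* DT_RPATH comes first, but is ignored when DT_RUNPATH is present. */
    if (has_rpath && !has_runpath)
        order[n++] = SRC_RPATH;
    if (has_ld_library_path)
        order[n++] = SRC_LD_LIBRARY_PATH;
    /* DT_RUNPATH is consulted only after LD_LIBRARY_PATH. */
    if (has_runpath)
        order[n++] = SRC_RUNPATH;
    /* Default system directories are always last in this model. */
    order[n++] = SRC_DEFAULT;
    return n;
}
```

With only DT_RPATH set, the embedded path wins over LD_LIBRARY_PATH.
As soon as DT_RUNPATH is present, LD_LIBRARY_PATH is consulted first,
which is why an rpath pointing at the msan/ directory stops being
reliable.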
>>
>> There are a few things, other than pure directory paths, that can
>> appear in, or otherwise affect, LD_LIBRARY_PATH and
>> DT_RPATH/DT_RUNPATH, but I don't think any of them help us here:
>>
>> 1. Pseudo variables $ORIGIN, $LIB and $PLATFORM - These are expanded
>> by ld.so based on properties of the current execution environment
>> (e.g. whether you're loading a 32-bit or 64-bit executable, the
>> hardware architecture).
>>
>> 2. Hardware-capability strings - There is a fixed set of hardware
>> capabilities, such as sse, sse2, altivec, etc. that are appended to
>> the directory name to form alternate search paths.
>>
>> 3. The multilib suffix. This, AFAIK, is baked into the dynamic
>> loader. The path to the loader itself has the multilib suffix, and
>> that's specified in PT_INTERP.
>>
>> Unfortunately, I don't think that any of these help us.
>>
>> -Hal
>>
>>>
>>> Jon
>>>
>>>>
>>>> -Hal
>>>>
>>>>>
>>>>>
>>>>> Jon
>>>>>
>>>>>>
>>>>>> Conclusion
>>>>>> -----------------
>>>>>>
>>>>>> I hope my proposal and questions have made sense. Any and
>>>>>> all input is appreciated. Please let me know if anything
>>>>>> needs clarification.
>>>>>>
>>>>>> /Eric
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> _______________________________________________ cfe-dev
>>>>>> mailing list cfe-dev at lists.llvm.org
>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>
>>>>>
>>>>> -- Jon Roelofs jonathan at codesourcery.com CodeSourcery / Mentor
>>>>> Embedded
>>>>>
>>>>
>>>
>>> -- Jon Roelofs jonathan at codesourcery.com CodeSourcery / Mentor
>>> Embedded
>>>
>>
>
> --
> Jon Roelofs
> jonathan at codesourcery.com
> CodeSourcery / Mentor Embedded
