[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Evgenii Stepanov via cfe-dev cfe-dev at lists.llvm.org
Wed Aug 17 13:22:07 PDT 2016


On Wed, Aug 17, 2016 at 12:12 PM, Craig, Ben via cfe-dev
<cfe-dev at lists.llvm.org> wrote:
> On 8/16/2016 7:35 PM, Evgenii Stepanov via cfe-dev wrote:
>>
>> So, I'd argue that proper support for sanitized shared libraries
>> (primarily libc++, but not just libc++) would require a loader change.
>> We could start by agreeing on and specifying a way a binary would
>> declare its "sanitizer type", which could be used at runtime to change
>> the library lookup path.
>
> I don't think you need loader changes if you are willing to change the
> soname and use version tags.  I don't think loader changes will fix any of
> the pathological cases that soname and version tag trickery won't fix.
>
> Here's a brief proposal for a way to do this...
> * Give sanitized libraries a different file name.  For example,
> libc++-msan.so.
> * Give sanitized libraries a different soname.  For example,
> libc++-msan.so.1.
> * Put all symbols in libc++ under a version tag (LIBCPP_MSAN perhaps).  The
> "regular" build of libc++ will continue to use unversioned symbols.
> * Install the sanitized libc++ in the same directory as the regular libc++.
> * Change the clang driver so that -fsanitize=memory will cause the linker to
> pull in libc++-msan.so instead of libc++.so.  This will cause the DT_NEEDED
> to point at the msan version of libc++, and it will cause all the unresolved
> symbols to point to @LIBCPP_MSAN versions of the symbols.
>
> How this works in mixed environments...
> Case 1 (great!):
> * User builds an msan version of an executable (msan_tester).  If you point
> 'nm' at msan_tester, you will see that it has a lot of standard library
> symbols with @LIBCPP_MSAN on them.  'ldd' will tell you that msan_tester
> will pull in libc++-msan.so, but not libc++.so.
> * Suppose msan_tester does a dlopen and dlsym of a non-msan'd C++ library.
> That C++ library was built against regular libc++. Regular libc++ gets
> loaded, but none of its symbols will get put in the global symbol table,
> because libc++-msan.so got there first.
> * Happy day, only one version of libc++ is getting used (though two
> different ones got loaded).
>
> Case 2 (boo!):
> * User builds a regular version of an executable (normal_tester).
> * normal_tester does a dlopen and dlsym of an msan'd C++ library. The msan'd

Not a problem. This ^^^ step would fail because of the unresolved
symbols in the library that are normally resolved by the msan runtime
library statically linked into the executable.
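
A minimal sketch of where that failure shows up, assuming the msan'd
library references runtime symbols that only an msan-linked executable
provides. try_load is a hypothetical helper name, not part of any
sanitizer API. Passing RTLD_NOW asks the loader to resolve every
undefined symbol immediately, so a missing symbol surfaces as a dlopen
error rather than a crash later on.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: try to load a shared library the way
 * normal_tester would. Returns 0 on success, -1 if the loader
 * rejects the library (missing file, unresolved symbols, ...). */
int try_load(const char *path) {
    assert(path != NULL);

    /* RTLD_NOW: resolve all undefined symbols before dlopen returns. */
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        /* dlerror() describes the failure, e.g. an undefined symbol. */
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return -1;
    }

    dlclose(handle);
    return 0;
}
```

Calling try_load on an msan'd library from an uninstrumented
executable should take the error branch. On older glibc versions the
program has to be linked with -ldl.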

> C++ library is still going to bind against the libc++-msan.so version of the
> symbols.
> * Sad day.  Two versions of libc++ are being used.
>
> Note that changing the loader wouldn't fix case 2 either, at least as I
> understand the proposal.

I think this approach would kinda work for libc++ in its current
state, but it's quite cumbersome in general. For example, can it be
applied to libraries that use versioned symbols already?

Another problem is that in case 1, library constructors are run twice.
It could be OK for libc++ now, but wrong in general.
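
To illustrate the concern, here is a sketch of a load-time constructor
with a process-wide side effect. It relies on the GCC/Clang
__attribute__((constructor)) extension; the environment variable name
and the function names are made up for the example. Every loaded object
that contains such a constructor runs it once, so if the same code were
built into two differently named libraries and both were loaded, the
counter would be bumped once per copy.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical process-wide state: the environment is shared by every
 * library in the process, unlike a static variable inside one copy. */
#define COUNTER_VAR "HYPOTHETICAL_INIT_COUNT"

/* Returns how many times the constructor below has run so far. */
int init_count(void) {
    const char *value = getenv(COUNTER_VAR);
    return value != NULL ? atoi(value) : 0;
}

/* Runs when the object containing it is loaded: before main() for an
 * executable, or during dlopen() for a shared library. */
__attribute__((constructor))
static void library_init(void) {
    char buf[16];
    int n = snprintf(buf, sizeof buf, "%d", init_count() + 1);
    assert(n > 0 && (size_t)n < sizeof buf);
    setenv(COUNTER_VAR, buf, 1);
}
```

A constructor that only bumps a counter is harmless, but one that
registers a handler or takes ownership of a resource is not safe to
run twice.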

>
>
>>
>> Also, we can solve this for the case of -static-libstdc++ easily in
>> the clang driver by looking under an /msan/ subdirectory first. With
>> that, we could replace the whole msan bootstrap instructions [1] with
>> just "use -static-libstdc++".
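
A rough sketch of that lookup, not actual clang driver code. The
function name pick_static_libcxx, the archive name libc++.a and the
directory layout <libdir>/<sanitizer>/ are all assumptions made for
illustration: try the sanitizer subdirectory first and fall back to the
regular archive when no sanitized one is installed.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: writes the path of the archive to link into
 * out. Returns 1 if a sanitized archive was found under
 * <libdir>/<sanitizer>/, 0 if it fell back to <libdir>/libc++.a. */
int pick_static_libcxx(const char *libdir, const char *sanitizer,
                       char *out, size_t out_len) {
    assert(libdir != NULL && strlen(libdir) > 0);
    assert(out != NULL && out_len > 0);

    if (sanitizer != NULL) {
        /* Prefer e.g. <libdir>/msan/libc++.a when it is readable. */
        int n = snprintf(out, out_len, "%s/%s/libc++.a", libdir, sanitizer);
        if (n > 0 && (size_t)n < out_len && access(out, R_OK) == 0)
            return 1;
    }

    /* No sanitized archive available: use the regular one. */
    snprintf(out, out_len, "%s/libc++.a", libdir);
    return 0;
}
```

Because the archive is linked statically, the choice is made once at
link time and nothing has to be found again at run time.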
>>
>> [1]
>> https://github.com/google/sanitizers/wiki/MemorySanitizerBootstrappingClang
>>
>>
>> On Mon, Aug 15, 2016 at 2:34 PM, Jonathan Roelofs
>> <jonathan at codesourcery.com> wrote:
>>>
>>>
>>> On 8/15/16 1:51 PM, Hal Finkel wrote:
>>>>
>>>> ----- Original Message -----
>>>>>
>>>>> From: "Jonathan Roelofs" <jonathan at codesourcery.com>
>>>>> To: "Hal Finkel" <hfinkel at anl.gov>
>>>>> Cc: "Eric Fiselier" <eric at efcs.ca>, "clang developer list" <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>, "Evgenii Stepanov" <eugenis at google.com>
>>>>> Sent: Monday, August 15, 2016 9:24:17 AM
>>>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
>>>>>
>>>>>
>>>>>
>>>>> On 8/14/16 7:31 PM, Hal Finkel wrote:
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>>
>>>>>>> From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>>>>>>> To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list" <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>, "Evgenii Stepanov" <eugenis at google.com>
>>>>>>> Sent: Sunday, August 14, 2016 7:07:00 PM
>>>>>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>>>>>>>>
>>>>>>>> Sanitizers such as MSAN require the entire program to be
>>>>>>>> instrumented; anything less leads to plenty of false
>>>>>>>> positives. Unfortunately this can be difficult to achieve,
>>>>>>>> especially for the C and C++ standard libraries. To work
>>>>>>>> around this the sanitizers provide interceptors for common C
>>>>>>>> functions, but the same solution doesn't work as well for the
>>>>>>>> C++ STL. Instead users are forced to manually build and link
>>>>>>>> a custom sanitized libc++. This is a huge PITA and I would
>>>>>>>> like to improve the situation, not just for MSAN but all
>>>>>>>> sanitizers. I'm working on a proposal to change this. The
>>>>>>>> basis of my proposal is:
>>>>>>>>
>>>>>>>> Clang should install/provide multiple sanitized versions of
>>>>>>>> Libc++ and a mechanism to easily link them, as if they were
>>>>>>>> a Compiler-RT runtime.
>>>>>>>>
>>>>>>>> The goal of this proposal is:
>>>>>>>>
>>>>>>>> (1) Greatly reduce the number of false positives caused by
>>>>>>>> using an un-sanitized STL.
>>>>>>>> (2) Allow sanitizers to catch user bugs that occur within the
>>>>>>>> STL library, not just its headers.
>>>>>>>>
>>>>>>>> The basic steps I would like to take to achieve this are:
>>>>>>>>
>>>>>>>> (1) Teach the compiler-rt CMake how to build and install
>>>>>>>> each sanitized libc++ version alongside its other runtimes.
>>>>>>>> (2) Add options to the Clang driver to support linking/using
>>>>>>>> these libraries.
>>>>>>>>
>>>>>>>> I think this proposal is likely to be contentious, so I
>>>>>>>> would like to focus on the details of it. Once I have some
>>>>>>>> feedback on these details I'll put together a formal
>>>>>>>> proposal, including a plan for implementing it. The details I
>>>>>>>> would like input on are:
>>>>>>>>
>>>>>>>> (A) What kind and how many sanitized versions of libc++
>>>>>>>> should we provide?
>>>>>>>> ---------------------------------------------------
>>>>>>>>
>>>>>>>> I think the minimum set would be Address (which includes Leak),
>>>>>>>> Memory (With origin tracking?), Thread, and Undefined. Once
>>>>>>>> we get into combinations of sanitizers things get more
>>>>>>>> complicated. What other sanitizer combinations should we
>>>>>>>> provide?
>>>>>>>>
>>>>>>>> (B) How should we handle UBSAN?
>>>>>>>> ---------------------------------------------------
>>>>>>>>
>>>>>>>> UBSAN is really just a collection of sanitizers and
>>>>>>>> providing sanitized versions of libc++ for every possible
>>>>>>>> configuration is out of the question. Instead we should
>>>>>>>> figure out what subset of UBSAN checks we want to enable in
>>>>>>>> sanitized libc++ versions. I suspect we want to disable the
>>>>>>>> following checks.
>>>>>>>>
>>>>>>>> * -fsanitize=vptr
>>>>>>>> * -fsanitize=function
>>>>>>>> * -fsanitize=float-divide-by-zero
>>>>>>>>
>>>>>>>> Additionally UBSAN can be combined with every other
>>>>>>>> sanitizer group (i.e. Address, Memory, Thread). Do we want to
>>>>>>>> provide a combination of UBSAN on/off for every group, or can
>>>>>>>> we simply provide an over-sanitized version with UBSAN on?
>>>>>>>>
>>>>>>>> (C) How should the Clang driver expose the sanitized
>>>>>>>> libraries to the users?
>>>>>>>> ---------------------------------------------------
>>>>>>>>
>>>>>>>> I would like to propose the driver option '-fsanitize-stdlib' and
>>>>>>>> '-fsanitize-stdlib=<sanitizer>'. The first version deduces
>>>>>>>> the best sanitized version to use, the second allows it to
>>>>>>>> be explicitly specified.
>>>>>>>>
>>>>>>>> A couple of other options are:
>>>>>>>>
>>>>>>>> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
>>>>>>>>   deduces which version.
>>>>>>>> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>>>>>>>>   sanitized STL.
>>>>>>>>
>>>>>>>> (D) Should sanitized libc++ versions override libc++.so?
>>>>>>>> ---------------------------------------------------
>>>>>>>>
>>>>>>>> For example, what happens when a program links to both a sanitized
>>>>>>>> and non-sanitized libc++ version? Does the sanitized version
>>>>>>>> replace the non-sanitized version, or should both versions
>>>>>>>> be loaded into the program?
>>>>>>>>
>>>>>>>> Essentially I'm asking if the sanitized versions of libc++
>>>>>>>> should have the "soname" libc++ so they can replace
>>>>>>>> non-sanitized version, or if they should have a different
>>>>>>>> "soname" so the linker treats them as a separate library.
>>>>>>>>
>>>>>>>> I haven't looked into the consequences of either approach in
>>>>>>>> depth, but any input is appreciated.
>>>>>>>
>>>>>>>
>>>>>>> In a sense, these are /just/ multilibs, so my inclination would
>>>>>>> be to make all the sonames the same, and just stick them in
>>>>>>> appropriately named subfolders relative to their normal
>>>>>>> location.
>>>>>>
>>>>>>
>>>>>> I'm not sure that's true; there's no property of the environment
>>>>>> that determines which library path you need. As a practical
>>>>>> matter, I can't set $PLATFORM and/or $LIB in my rpath and have
>>>>>> ld.so do the right thing in this context. Moreover, it is really
>>>>>> a property of how you compiled, so I think using an alternate
>>>>>> library name is natural.
>>>>>
>>>>>
>>>>> Multilibs solve exactly the problem of "it's a property of how you
>>>>> compiled". The thing that's subtly different here is that the
>>>>> usual thing that people do with multilibs is to provide ABI
>>>>> incompatible versions of the same library (which are made
>>>>> incompatible via compiler flags, -msoft-float, for example),
>>>>> whereas these libraries just so happen to be ABI compatible with
>>>>> their non-instrumented variants.
>>>>>
>>>>> I'm not sure I understand what you're saying about $PLATFORM and
>>>>> $LIB, but I /think/ it's a red herring: the compiler takes care of
>>>>> adding in the multilib suffixes where appropriate, so shouldn't the
>>>>> answer to "which library do I stick in the rpath?" include said
>>>>> suffix (when compiled with Eric's proposed flag)?
>>>>
>>>>
>>>> I'm not sure what color herring it is ;) -- I'm trying to understand
>>>> the system you're proposing:
>>>>
>>>> 1. User A compiles/installs Clang/LLVM/libc++ on system A in
>>>> /local/clang, and so we get a /local/clang/lib/libc++.so and a
>>>> /local/clang/lib/msan/libc++.so. User A compiles a program, foo, with
>>>> msan enabled, and foo gets an rpath of /local/clang/lib/msan. User A
>>>> also compiles another program, prod, without any sanitizers, and
>>>> those get an rpath of /local/clang/lib.
>>>>
>>>> 2. User B compiles/installs Clang/LLVM/libc++ on system B in
>>>> /soft/clang, and so we get a /soft/clang/lib/libc++.so and a
>>>> /soft/clang/lib/msan/libc++.so. User A sends User B the executables
>>>> foo and prod. Those executables have rpaths with /local/clang/...,
>>>> but those don't help User B. User B has an environment with
>>>> LD_LIBRARY_PATH=/soft/clang/lib so that the executables compiled by
>>>> User A will run.
>>>>
>>>> 3. User B has no good option, because if LD_LIBRARY_PATH is set to
>>>> /soft/clang/lib, then prod will behave as expected (i.e. not be
>>>> sanitized), but foo will not. If LD_LIBRARY_PATH is set to
>>>> /soft/clang/lib/msan, then foo will be sanitized as expected, but
>>>> prod will run slower than usual.
>>>
>>>
>>> Ahhh, I see. I was imagining this sort of use case:
>>>
>>> first_guy$ cat lib.h
>>> extern void lib_func();
>>>
>>> first_guy$ cat lib.c
>>> #include "lib.h"
>>>
>>> #include <stdio.h>
>>>
>>> void lib_func() {
>>>    printf("In %s\n", MESSAGE);
>>> }
>>> first_guy$ cat bin.c
>>> #include "lib.h"
>>>
>>> int main() {
>>>    lib_func();
>>> }
>>> first_guy$ mkdir -p lib/sanitized
>>> first_guy$ clang lib.c -shared -DMESSAGE="\"sanitized\"" -o lib/sanitized/library.so
>>> first_guy$ clang lib.c -shared -DMESSAGE="\"production\"" -o lib/library.so
>>> first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/sanitized/ -o sanitized
>>> first_guy$ clang bin.c -lrary -Wl,-rpath,$PWD/lib -L./lib/ -o production
>>> first_guy$ ./sanitized
>>> In sanitized
>>> first_guy$ ./production
>>> In production
>>> first_guy$ mkdir ../other_guy
>>> first_guy$ cd ../other_guy/
>>> other_guy$ cp ../first_guy/sanitized .
>>> other_guy$ cp ../first_guy/production .
>>> other_guy$ cp -r ../first_guy/lib .
>>> other_guy$ ./sanitized
>>> In sanitized
>>> other_guy$ ./production
>>> In production
>>> other_guy$ rm lib/library.so
>>> other_guy$ ln -s ../lib/sanitized/library.so lib/library.so
>>> other_guy$ ./production
>>> In sanitized
>>> other_guy$ ./sanitized
>>> In sanitized
>>>
>>>
>>> Jon
>>>
>>>
>>>> 4. User B compiles programs to send to User A. User A then sets
>>>> LD_LIBRARY_PATH to /local/clang/lib. User A has the same problem as
>>>> User B. Moreover, if User A compiles using -Wl,--enable-new-dtags
>>>> (the recommended default on many systems), then the linker will use
>>>> DT_RUNPATH (instead of, or in addition to, DT_RPATH; the effect is
>>>> the same), and the rpath scheme won't even work for User A on User
>>>> A's own executables (because LD_LIBRARY_PATH overrides DT_RUNPATH).
>>>>
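
The precedence described above can be modelled in a few lines. This is
a simplified sketch of the search order documented for glibc's ld.so,
not loader code: it treats each source as a single directory and leaves
out the ld.so cache, the default directories and secure-execution mode.
The struct and function names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the inputs the dynamic loader consults. */
struct search_inputs {
    const char *dt_rpath;        /* NULL if the binary has no DT_RPATH */
    const char *dt_runpath;      /* NULL if the binary has no DT_RUNPATH */
    const char *ld_library_path; /* NULL if unset in the environment */
};

/* Fills order[] with the directories in the order they are tried and
 * returns how many entries were written (at most 2). */
size_t search_order(const struct search_inputs *in,
                    const char **order, size_t cap) {
    size_t n = 0;
    assert(in != NULL && order != NULL && cap >= 2);

    /* DT_RPATH comes first, but only when DT_RUNPATH is absent. */
    if (in->dt_rpath != NULL && in->dt_runpath == NULL)
        order[n++] = in->dt_rpath;

    /* LD_LIBRARY_PATH is consulted before DT_RUNPATH. */
    if (in->ld_library_path != NULL)
        order[n++] = in->ld_library_path;

    if (in->dt_runpath != NULL)
        order[n++] = in->dt_runpath;

    return n;
}
```

With DT_RUNPATH set to /local/clang/lib/msan and LD_LIBRARY_PATH set to
/local/clang/lib, the first directory tried is /local/clang/lib, so foo
would pick up the unsanitized libc++.so.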
>>>> There are a few things, other than pure directory paths, that can
>>>> appear in, or otherwise affect, LD_LIBRARY_PATH and
>>>> DT_RPATH/DT_RUNPATH, but I don't think any of them help us here:
>>>>
>>>> 1. Pseudo variables $ORIGIN, $LIB and $PLATFORM - These are expanded
>>>> by ld.so based on properties of the current execution environment
>>>> (e.g. whether you're loading a 32-bit or 64-bit executable, the
>>>> hardware architecture).
>>>>
>>>> 2. Hardware-capability strings - There is a fixed set of hardware
>>>> capabilities, such as sse, sse2, altivec, etc. that are appended to
>>>> the directory name to form alternate search paths.
>>>>
>>>> 3. The multilib suffix. This, AFAIK, is baked into the dynamic
>>>> loader. The path to the loader itself has the multilib suffix, and
>>>> that's specified in PT_INTERP.
>>>>
>>>> Unfortunately, I don't think that any of these help us.
>>>>
>>>> -Hal
>>>>
>>>>> Jon
>>>>>
>>>>>> -Hal
>>>>>>
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>>> Conclusion
>>>>>>>> -----------------
>>>>>>>>
>>>>>>>> I hope my proposal and questions have made sense. Any and
>>>>>>>> all input is appreciated. Please let me know if anything
>>>>>>>> needs clarification.
>>>>>>>>
>>>>>>>> /Eric
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________ cfe-dev
>>>>>>>> mailing list cfe-dev at lists.llvm.org
>>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>>
>>>>>>> --
>>>>>>> Jon Roelofs
>>>>>>> jonathan at codesourcery.com
>>>>>>> CodeSourcery / Mentor Embedded
>>>>>>>
>>>>> --
>>>>> Jon Roelofs
>>>>> jonathan at codesourcery.com
>>>>> CodeSourcery / Mentor Embedded
>>>>>
>>> --
>>> Jon Roelofs
>>> jonathan at codesourcery.com
>>> CodeSourcery / Mentor Embedded
>>
>
>
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a Linux
> Foundation Collaborative Project
>