[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Evgenii Stepanov via cfe-dev cfe-dev at lists.llvm.org
Sun Aug 14 22:46:39 PDT 2016


Eric,

thanks for bringing this up! This is indeed one of the biggest issues
for sanitizer adoption right now.

I think that the same-soname approach is correct, mainly because the
sanitized library is just a version of the same library. Loading both
versions in one process would usually be an error.

RPATH does not work because it only affects immediate dependencies.
The following would refer to two different versions of libc++:
Executable (with asan) -> library A (without asan) -> libc++
        |
         -> libc++
I think even in this case, if the two libc++'s have the same soname,
only one will be loaded. Linux does breadth-first search, so it should
end up with the direct dependency of the main executable, which is
good.
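
To make the load-order argument concrete, here is a rough Python model of
it. This is only a sketch under simplifying assumptions, not the real
loader algorithm: dependencies are given as already-resolved file names,
and the names "exe", "libA.so", "asan/libc++.so", "plain/libc++.so" and
the soname "libc++.so.1" are made up for illustration.

```python
from collections import deque


def load_order(root, deps, soname):
    """Model a breadth-first dependency walk that deduplicates by soname.

    deps maps an object to the objects it depends on directly.
    soname maps an object to its soname (defaults to the object's own name).
    Returns {soname: object actually loaded}; the first object reached for
    a given soname wins and later ones are skipped.
    """
    loaded = {}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        name = soname.get(obj, obj)
        if name in loaded:
            # Something with this soname is already in the process.
            continue
        loaded[name] = obj
        queue.extend(deps.get(obj, []))
    return loaded


# Hypothetical layout matching the diagram above.
deps = {
    "exe": ["libA.so", "asan/libc++.so"],
    "libA.so": ["plain/libc++.so"],
}
soname = {
    "asan/libc++.so": "libc++.so.1",
    "plain/libc++.so": "libc++.so.1",
}
chosen = load_order("exe", deps, soname)
```

In this model the executable's direct dependency is reached one level
earlier than library A's, so chosen["libc++.so.1"] is the ASan build even
though library A is listed first.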

Another problem is what happens when the program is installed/copied
somewhere, and the toolchain build directory is gone. We would need
help from the dynamic loader.

We have something like this set up on Android for ASan, see
https://source.android.com/devices/tech/debug/asan.html#sanitize_target
The dynamic loader adds directories to the default library search path
when it loads an instrumented executable. The directory with the ASan
libraries is added at the start of the list. I think this is similar
to how multilib works.
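
A minimal sketch of that search-path idea, in Python. The directory names
and the fake file list are hypothetical, and the real loader does much
more; the only point is that the sanitized directory is consulted first,
and only for instrumented executables.

```python
def search_dirs(default_dirs, sanitized_dirs, is_instrumented):
    """Return the library search path for one executable.

    For an instrumented executable the sanitized directories are placed
    at the start of the list; otherwise only the defaults are used.
    """
    if is_instrumented:
        return list(sanitized_dirs) + list(default_dirs)
    return list(default_dirs)


def resolve(soname, dirs, exists):
    """Return the first dir/soname for which exists() is true, else None."""
    for d in dirs:
        candidate = d + "/" + soname
        if exists(candidate):
            return candidate
    return None


# Hypothetical install layout: both builds share one soname.
files = {"/usr/lib/libc++.so.1", "/usr/lib/asan/libc++.so.1"}
default_dirs = ["/usr/lib"]
sanitized_dirs = ["/usr/lib/asan"]

asan_pick = resolve(
    "libc++.so.1",
    search_dirs(default_dirs, sanitized_dirs, True),
    files.__contains__,
)
plain_pick = resolve(
    "libc++.so.1",
    search_dirs(default_dirs, sanitized_dirs, False),
    files.__contains__,
)
```

Because the default directories stay on the list, a library that has no
sanitized build still resolves to its regular copy.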

On Android we use the linker name itself (PT_INTERP field) to identify
ASan executables. It would probably be better to use a .note section
or even something else.
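
For illustration, checking the PT_INTERP string of an ELF image can be
sketched in Python as below. The header layouts follow the ELF
specification; the is_asan_executable helper and the set of interpreter
names passed to it are assumptions made up for this example, not the
actual Android implementation.

```python
import struct

PT_INTERP = 3  # program header type for the interpreter path


def read_interp(data):
    """Return the PT_INTERP path of an ELF image (bytes), or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is64 = data[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    end = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little endian
    ehdr = "HHIQQQIHHHHHH" if is64 else "HHIIIIIHHHHHH"
    fields = struct.unpack_from(end + ehdr, data, 16)
    phoff, phentsize, phnum = fields[4], fields[8], fields[9]
    for i in range(phnum):
        base = phoff + i * phentsize
        if is64:
            # p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz
            p = struct.unpack_from(end + "IIQQQQ", data, base)
            p_type, p_offset, p_filesz = p[0], p[2], p[5]
        else:
            # p_type, p_offset, p_vaddr, p_paddr, p_filesz
            p = struct.unpack_from(end + "IIIII", data, base)
            p_type, p_offset, p_filesz = p[0], p[1], p[4]
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\0", 1)[0].decode()
    return None


def is_asan_executable(data, asan_interps):
    """Hypothetical check: is the interpreter one of the given ASan linkers?"""
    return read_interp(data) in asan_interps
```

A .note-based scheme would look similar, except that the loader would scan
PT_NOTE segments for a marker instead of comparing the interpreter path.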


On Sun, Aug 14, 2016 at 7:14 PM, Eric Fiselier <eric at efcs.ca> wrote:
>>  As a practical matter, I can't set $PLATFORM and/or $LIB in my rpath and
>> have ld.so do the right thing in this context.
>
> Can't Clang compile the sanitized executable with a special RPATH pointing
> to the correct libc++ folder?
>
>> Moreover, it is really a property of how you compiled, so I think using an
>> alternate library name is natural.
>
> Using alternative library names will likely cause problems if a
> non-sanitized libc++ is also present. Since both libraries
> provide the exact same symbols, it's possible that symbols in the
> non-sanitized libc++ will replace the sanitized versions.
>
>
>
>
> On Sun, Aug 14, 2016 at 7:31 PM, Hal Finkel <hfinkel at anl.gov> wrote:
>>
>> ----- Original Message -----
>> > From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>> > To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>> > <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>> > <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>, "Evgenii
>> > Stepanov" <eugenis at google.com>
>> > Sent: Sunday, August 14, 2016 7:07:00 PM
>> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized
>> > Libc++
>> >
>> >
>> >
>> > On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>> > > Sanitizers such as MSAN require the entire program to be
>> > > instrumented,
>> > > anything less leads to plenty of false positives. Unfortunately
>> > > this can
>> > > be difficult to achieve, especially for the C and C++ standard
>> > > libraries. To work around this the sanitizers provide interceptors
>> > > for
>> > > common C functions, but the same solution doesn't work as well for
>> > > the
>> > > C++ STL. Instead users are forced to manually build and link a
>> > > custom
>> > > sanitized libc++. This is a huge PITA and I would like to improve
>> > > the
>> > > situation, not just for MSAN but all sanitizers. I'm working on a
>> > > proposal to change this. The basis of my proposal is:
>> > >
>> > > Clang should install/provide multiple sanitized versions of Libc++
>> > > and a
>> > > mechanism to easily link them, as if they were a Compiler-RT
>> > > runtime.
>> > >
>> > > The goal of this proposal is:
>> > >
>> > > (1) Greatly reduce the number of false positives caused by using an
>> > > un-sanitized STL.
>> > > (2) Allow sanitizers to catch user bugs that occur within the STL
>> > > library, not just its headers.
>> > >
>> > > The basic steps I would like to take to achieve this are:
>> > >
>> > > (1) Teach the compiler-rt CMake how to build and install each
>> > > sanitized
>> > > libc++ version alongside its other runtimes.
>> > > (2) Add options to the Clang driver to support linking/using these
>> > > libraries.
>> > >
>> > > I think this proposal is likely to be contentious, so I would like
>> > > to
>> > > focus on the details of it. Once I have some feedback on these details
>> > > I'll
>> > > put together a formal proposal, including a plan for implementing
>> > > it.
>> > > The details I would like input on are:
>> > >
>> > > (A) What kind and how many sanitized versions of libc++ should we
>> > > provide?
>> > > ---------------------------------------------------
>> > >
>> > > I think the minimum set would be Address (which includes Leak),
>> > > Memory
>> > > (With origin tracking?), Thread, and Undefined.
>> > > Once we get into combinations of sanitizers things get more
>> > > complicated.
>> > > What other sanitizer combinations should we provide?
>> > >
>> > > (B) How should we handle UBSAN?
>> > > ---------------------------------------------------
>> > >
>> > > UBSAN is really just a collection of sanitizers and providing
>> > > sanitized
>> > > versions of libc++ for every possible configuration is out of the
>> > > question.
>> > > Instead we should figure out what subset of UBSAN checks we want to
>> > > enable in sanitized libc++ versions. I suspect we want to disable
>> > > the
>> > > following checks.
>> > >
>> > > * -fsanitize=vptr
>> > > * -fsanitize=function
>> > > * -fsanitize=float-divide-by-zero
>> > >
>> > > Additionally UBSAN can be combined with every other sanitizer group
>> > > (ie
>> > > Address, Memory, Thread).
>> > > Do we want to provide a combination of UBSAN on/off for every
>> > > group, or
>> > > can we simply provide an over-sanitized version with UBSAN on?
>> > >
>> > > (C) How should the Clang driver expose the sanitized libraries to
>> > > the users?
>> > > ---------------------------------------------------
>> > >
>> > > I would like to propose the driver option '-fsanitize-stdlib' and
>> > > '-fsanitize-stdlib=<sanitizer>'.
>> > > The first version deduces the best sanitized version to use, the
>> > > second
>> > > allows it to be explicitly specified.
>> > >
>> > > A couple of other options are:
>> > >
>> > > * -fsanitize=foo:  Implicitly turn on a sanitized STL. Clang
>> > > deduces
>> > > which version.
>> > > * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>> > > sanitized STL.
>> > >
>> > > (D) Should sanitized libc++ versions override libc++.so?
>> > > ---------------------------------------------------
>> > >
>> > > For example, what happens when a program links to both a sanitized
>> > > and
>> > > non-sanitized libc++ version?
>> > > Does the sanitized version replace the non-sanitized version, or
>> > > should
>> > > both versions be loaded into the program?
>> > >
>> > > Essentially I'm asking if the sanitized versions of libc++ should
>> > > have
>> > > the "soname" libc++ so they can
>> > > replace non-sanitized version, or if they should have a different
>> > > "soname" so the linker treats them as a separate library.
>> > >
>> > > I haven't looked into the consequences of either approach in depth,
>> > > but
>> > > any input is appreciated.
>> >
>> > In a sense, these are /just/ multilibs, so my inclination would be to
>> > make all the soname's the same, and just stick them in appropriately
>> > named subfolders relative to their normal location.
>>
>> I'm not sure that's true; there's no property of the environment that
>> determines which library path you need. As a practical matter, I can't set
>> $PLATFORM and/or $LIB in my rpath and have ld.so do the right thing in this
>> context. Moreover, it is really a property of how you compiled, so I think
>> using an alternate library name is natural.
>>
>>  -Hal
>>
>> >
>> >
>> > Jon
>> >
>> > >
>> > > Conclusion
>> > > -----------------
>> > >
>> > > I hope my proposal and questions have made sense. Any and all input
>> > > is
>> > > appreciated.
>> > > Please let me know if anything needs clarification.
>> > >
>> > > /Eric
>> > >
>> > >
>> > >
>> > >
>> > >
>> > >
>> > > _______________________________________________
>> > > cfe-dev mailing list
>> > > cfe-dev at lists.llvm.org
>> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> > >
>> >
>> > --
>> > Jon Roelofs
>> > jonathan at codesourcery.com
>> > CodeSourcery / Mentor Embedded
>> >
>>
>> --
>> Hal Finkel
>> Assistant Computational Scientist
>> Leadership Computing Facility
>> Argonne National Laboratory
>
>


