[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Sun Aug 14 23:42:47 PDT 2016


----- Original Message -----
> From: "Evgenii Stepanov" <eugenis at google.com>
> To: "Eric Fiselier" <eric at efcs.ca>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Jonathan Roelofs" <jonathan at codesourcery.com>, "clang developer list"
> <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>
> Sent: Monday, August 15, 2016 12:46:39 AM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
> 
> Eric,
> 
> thanks for bringing this up! This is indeed one of the biggest issues
> for sanitizer adoption right now.
> 
> I think that the same-soname approach is correct, mainly because the
> sanitized library is just a version of the same library.

Given that, at least for msan, some of the symbols effectively have additional semantic requirements, it seems appropriate to use a different name somewhere (i.e., in the library name or in the symbol names).

> Loading both
> versions in one process would usually be an error.

While it would be undesirable to have multiple versions of libc++ in a single process, this generally works regardless. Most of the global state is in libcxxabi (or whatever ABI library is being used), and while having multiple versions of std::cout (etc.) can certainly be observable, it seems rare for this to come up in practice.
 
> RPATH does not work because it only affects immediate dependencies.
> The following would refer to two different versions of libc++:
> Executable (with asan) -> library A (without asan) -> libc++
>         |
>          -> libc++
> I think even in this case, if the two libc++'s have the same soname,
> only one will be loaded. Linux does breadth-first search, so it should
> end up with the direct dependency of the main executable, which is good.

Yes, I believe that Linux's loader does the right thing in this case. In the reverse case, where the executable is built without asan and the library with asan, we should devise a scheme that does not silently break.
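
A toy model of the breadth-first resolution discussed above, as a rough sketch rather than a description of what ld.so actually does. It assumes each dependency has already been resolved to a path and that an already-loaded soname satisfies any later request for that soname. The paths, the libA name, and the "libc++-asan.so.1" soname are invented for illustration:

```python
from collections import deque

def loaded_libraries(root, needed, soname_of):
    """Toy breadth-first walk over already-resolved dependencies.

    needed:    object path -> list of library paths it asks for
    soname_of: library path -> soname recorded in that library
    Returns {soname: path} for what ends up loaded.
    """
    loaded = {}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for path in needed.get(obj, []):
            soname = soname_of[path]
            if soname in loaded:
                continue  # assumption: same soname means "already satisfied"
            loaded[soname] = path
            queue.append(path)
    return loaded

# Hypothetical layout: the executable links the sanitized libc++ directly,
# while library A was linked against the plain one.
needed = {
    "exe": ["/opt/app/libA.so", "/opt/asan/libc++.so.1"],
    "/opt/app/libA.so": ["/usr/lib/libc++.so.1"],
}
same_soname = {
    "/opt/app/libA.so": "libA.so",
    "/opt/asan/libc++.so.1": "libc++.so.1",
    "/usr/lib/libc++.so.1": "libc++.so.1",
}
result_same = loaded_libraries("exe", needed, same_soname)

# Same layout, but the sanitized build carries its own (made-up) soname.
distinct_soname = dict(same_soname)
distinct_soname["/opt/asan/libc++.so.1"] = "libc++-asan.so.1"
result_distinct = loaded_libraries("exe", needed, distinct_soname)
```

In this model a shared soname leaves only the executable's copy loaded, while distinct sonames put both copies of libc++ in the process.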

> 
> Another problem is what happens when the program is installed/copied
> somewhere, and the toolchain build directory is gone. We would need
> help from the dynamic loader.

This, IMHO, is a key problem with the rpath approach.
 
> We have something like this set up on Android for ASan, see
> https://source.android.com/devices/tech/debug/asan.html#sanitize_target
> The dynamic loader adds directories to the default library search path
> when it loads an instrumented executable. The directory with the ASan
> libraries is added at the start of the list. I think this is similar
> to how multilib works.
> 
> On Android we use the linker name itself (PT_INTERP field) to identify
> ASan executables. It would probably be better to use a .note section
> or even something else.
> 

Interesting. I don't understand what you're proposing here, however.
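
For readers following along, the Android mechanism described in the quoted text can be sketched roughly as follows. This is only an illustration: the "_asan" suffix test and all directory names are assumptions made for the example, not taken from the Android sources.

```python
def library_search_path(interp, default_dirs, asan_dirs):
    """Return the directories to search, in order, for one executable.

    interp is the PT_INTERP string of the executable. An interpreter name
    ending in "_asan" is a hypothetical marker for an instrumented binary.
    """
    if interp.endswith("_asan"):
        # Instrumented: sanitized libraries are searched first.
        return list(asan_dirs) + list(default_dirs)
    return list(default_dirs)

defaults = ["/system/lib"]             # assumed default search path
sanitized = ["/data/asan/system/lib"]  # assumed location of ASan libraries

plain_path = library_search_path("/system/bin/linker", defaults, sanitized)
asan_path = library_search_path("/system/bin/linker_asan", defaults, sanitized)
```

The point of the sketch is that the choice is made by the loader from a property of the executable, so it keeps working after the toolchain build directory is gone.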

 -Hal

> 
> On Sun, Aug 14, 2016 at 7:14 PM, Eric Fiselier <eric at efcs.ca> wrote:
> >> As a practical matter, I can't set $PLATFORM and/or $LIB in my rpath and
> >> have ld.so do the right thing in this context.
> >
> > Can't Clang compile the sanitized executable with a special RPATH pointing
> > to the correct libc++ folder?
> >
> >> Moreover, it is really a property of how you compiled, so I think using an
> >> alternate library name is natural.
> >
> > Using an alternative library name will likely cause problems if a
> > non-sanitized libc++ is also present. Since both libraries provide the
> > exact same symbols, it's possible that symbols in the non-sanitized
> > libc++ will replace the sanitized versions.
> >
> >
> >
> >
> > On Sun, Aug 14, 2016 at 7:31 PM, Hal Finkel <hfinkel at anl.gov>
> > wrote:
> >>
> >> ----- Original Message -----
> >> > From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
> >> > To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
> >> > <cfe-dev at lists.llvm.org>, "Chandler Carruth"
> >> > <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
> >> > "Evgenii
> >> > Stepanov" <eugenis at google.com>
> >> > Sent: Sunday, August 14, 2016 7:07:00 PM
> >> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> >> > Sanitized
> >> > Libc++
> >> >
> >> >
> >> >
> >> > On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
> >> > > Sanitizers such as MSAN require the entire program to be instrumented,
> >> > > anything less leads to plenty of false positives. Unfortunately this
> >> > > can be difficult to achieve, especially for the C and C++ standard
> >> > > libraries. To work around this the sanitizers provide interceptors for
> >> > > common C functions, but the same solution doesn't work as well for the
> >> > > C++ STL. Instead users are forced to manually build and link a custom
> >> > > sanitized libc++. This is a huge PITA and I would like to improve the
> >> > > situation, not just for MSAN but all sanitizers. I'm working on a
> >> > > proposal to change this. The basis of my proposal is:
> >> > >
> >> > > Clang should install/provide multiple sanitized versions of Libc++ and
> >> > > a mechanism to easily link them, as if they were a Compiler-RT runtime.
> >> > >
> >> > > The goal of this proposal is:
> >> > >
> >> > > (1) Greatly reduce the number of false positives caused by using an
> >> > > un-sanitized STL.
> >> > > (2) Allow sanitizers to catch user bugs that occur within the STL
> >> > > library, not just its headers.
> >> > >
> >> > > The basic steps I would like to take to achieve this are:
> >> > >
> >> > > (1) Teach the compiler-rt CMake how to build and install each sanitized
> >> > > libc++ version alongside its other runtimes.
> >> > > (2) Add options to the Clang driver to support linking/using these
> >> > > libraries.
> >> > >
> >> > > I think this proposal is likely to be contentious, so I would like to
> >> > > focus on the details of it. Once I have some feedback on these details
> >> > > I'll put together a formal proposal, including a plan for implementing
> >> > > it. The details I would like input on are:
> >> > >
> >> > > (A) What kind and how many sanitized versions of libc++ should we provide?
> >> > > --------------------------------------------------------------------------
> >> > >
> >> > > I think the minimum set would be Address (which includes Leak), Memory
> >> > > (With origin tracking?), Thread, and Undefined.
> >> > > Once we get into combinations of sanitizers things get more complicated.
> >> > > What other sanitizer combinations should we provide?
> >> > >
> >> > > (B) How should we handle UBSAN?
> >> > > ---------------------------------------------------
> >> > >
> >> > > UBSAN is really just a collection of sanitizers and providing sanitized
> >> > > versions of libc++ for every possible configuration is out of the
> >> > > question.
> >> > > Instead we should figure out what subset of UBSAN checks we want to
> >> > > enable in sanitized libc++ versions. I suspect we want to disable the
> >> > > following checks.
> >> > >
> >> > > * -fsanitize=vptr
> >> > > * -fsanitize=function
> >> > > * -fsanitize=float-divide-by-zero
> >> > >
> >> > > Additionally UBSAN can be combined with every other sanitizer group
> >> > > (i.e., Address, Memory, Thread).
> >> > > Do we want to provide a combination of UBSAN on/off for every group, or
> >> > > can we simply provide an over-sanitized version with UBSAN on?
> >> > >
> >> > > (C) How should the Clang driver expose the sanitized libraries to the users?
> >> > > ----------------------------------------------------------------------------
> >> > >
> >> > > I would like to propose the driver options '-fsanitize-stdlib' and
> >> > > '-fsanitize-stdlib=<sanitizer>'.
> >> > > The first version deduces the best sanitized version to use, the second
> >> > > allows it to be explicitly specified.
> >> > >
> >> > > A couple of other options are:
> >> > >
> >> > > * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang deduces
> >> > >   which version.
> >> > > * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
> >> > >   sanitized STL.
> >> > >
> >> > > (D) Should sanitized libc++ versions override libc++.so?
> >> > > --------------------------------------------------------
> >> > >
> >> > > For example, what happens when a program links to both a sanitized and
> >> > > non-sanitized libc++ version?
> >> > > Does the sanitized version replace the non-sanitized version, or should
> >> > > both versions be loaded into the program?
> >> > >
> >> > > Essentially I'm asking if the sanitized versions of libc++ should have
> >> > > the "soname" libc++ so they can replace non-sanitized version, or if
> >> > > they should have a different "soname" so the linker treats them as a
> >> > > separate library.
> >> > >
> >> > > I haven't looked into the consequences of either approach in depth, but
> >> > > any input is appreciated.
> >> >
> >> > In a sense, these are /just/ multilibs, so my inclination would be to
> >> > make all the sonames the same, and just stick them in appropriately
> >> > named subfolders relative to their normal location.
> >>
> >> I'm not sure that's true; there's no property of the environment that
> >> determines which library path you need. As a practical matter, I can't set
> >> $PLATFORM and/or $LIB in my rpath and have ld.so do the right thing in this
> >> context. Moreover, it is really a property of how you compiled, so I think
> >> using an alternate library name is natural.
> >>
> >>  -Hal
> >>
> >> >
> >> >
> >> > Jon
> >> >
> >> > >
> >> > > Conclusion
> >> > > -----------------
> >> > >
> >> > > I hope my proposal and questions have made sense. Any and all input is
> >> > > appreciated.
> >> > > Please let me know if anything needs clarification.
> >> > >
> >> > > /Eric
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > >
> >> > > _______________________________________________
> >> > > cfe-dev mailing list
> >> > > cfe-dev at lists.llvm.org
> >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >> > >
> >> >
> >> > --
> >> > Jon Roelofs
> >> > jonathan at codesourcery.com
> >> > CodeSourcery / Mentor Embedded
> >> >
> >>
> >> --
> >> Hal Finkel
> >> Assistant Computational Scientist
> >> Leadership Computing Facility
> >> Argonne National Laboratory
> >
> >
> 
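
To make question (C) in the quoted proposal more concrete, here is a rough sketch of how a bare '-fsanitize-stdlib' might deduce a variant from the '-fsanitize=' value. The variant names, and the rule that a request combining UBSAN with another sanitizer maps to the Address/Memory/Thread variant, are assumptions for illustration; the thread leaves both open.

```python
# Hypothetical variant names, following the minimum set listed in the
# proposal (Address, Memory, Thread, Undefined).
PRIMARY = ("address", "memory", "thread")

def deduce_stdlib_variant(fsanitize):
    """Pick a sanitized libc++ variant from a -fsanitize=... value.

    Returns the variant name, or None when no sanitized libc++ applies.
    """
    requested = [s.strip() for s in fsanitize.split(",") if s.strip()]
    primary = [s for s in requested if s in PRIMARY]
    if len(primary) > 1:
        # Clang itself rejects combining Address, Memory and Thread.
        raise ValueError("incompatible sanitizers: " + ", ".join(primary))
    if primary:
        # Assumption: one over-sanitized build per group also covers UBSAN.
        return primary[0]
    if "undefined" in requested:
        return "undefined"
    return None

variant_combined = deduce_stdlib_variant("address,undefined")
variant_ubsan = deduce_stdlib_variant("undefined")
variant_none = deduce_stdlib_variant("")
```

An explicit '-fsanitize-stdlib=<sanitizer>' would bypass this deduction entirely.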

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
