[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Sun Aug 14 23:48:07 PDT 2016


----- Original Message -----
> From: "Hal Finkel via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Evgenii Stepanov" <eugenis at google.com>
> Cc: "Jonathan Roelofs" <jonathan at codesourcery.com>, "clang developer list" <cfe-dev at lists.llvm.org>
> Sent: Monday, August 15, 2016 1:42:47 AM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
> 
> ----- Original Message -----
> > From: "Evgenii Stepanov" <eugenis at google.com>
> > To: "Eric Fiselier" <eric at efcs.ca>
> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Jonathan Roelofs"
> > <jonathan at codesourcery.com>, "clang developer list"
> > <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>,
> > "Kostya Serebryany" <kcc at google.com>
> > Sent: Monday, August 15, 2016 12:46:39 AM
> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> > Sanitized Libc++
> > 
> > Eric,
> > 
> > thanks for bringing this up! This is indeed one of the biggest
> > issues
> > for sanitizer adoption right now.
> > 
> > I think that the same-soname approach is correct, mainly because
> > the
> > sanitized library is just a version of the same library.
> 
> Given that, for the case of msan at least, some of the symbols have,
> effectively, additional semantic requirements, it seems appropriate
> to use a different name somewhere (i.e. the library name, symbol
> names).
> 
> > Loading both
> > versions in one process would usually be an error.
> 
> While it would be undesirable to have multiple versions of libc++ in
> a single process, this generally works regardless. Most of the
> global state is in libcxxabi (or whatever ABI library is being
> used), and while having multiple versions of std::cout (etc.) can
> certainly be observable, it seems rare for this to come up in
> practice.

I'll add, however, that there are other libraries for which having multiple copies in the same process more easily becomes a usability problem; the OpenMP runtime library is a good example.

 -Hal

>  
> > RPATH does not work because it only affects immediate dependencies.
> > The following would refer to two different versions of libc++:
> > Executable (with asan) -> library A (without asan) -> libc++
> >         |
> >          -> libc++
> > I think even in this case, if the two libc++'s have the same
> > soname,
> > only one will be loaded. Linux does breadth-first search, so it
> > should
> > end up with the direct dependency of the main executable, which is
> > good.
> 
> Yes, I believe that Linux's loader does the right thing in this case.
> If you have the executable without asan, and the library with asan,
> then we should devise a scheme that does not silently break.
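
A minimal sketch of the breadth-first behaviour described above, assuming a
loader that reuses the first object it maps for a given soname. The object
names and paths are hypothetical, and the model ignores LD_LIBRARY_PATH, the
loader cache, and every other search rule; it only illustrates why the
executable's direct dependency would win.

```python
from collections import deque


def resolve(root, needed):
    """Map each soname to the path chosen by a breadth-first walk.

    `needed` maps an object name to a list of (soname, path) pairs, where
    `path` is what that object's own RPATH would resolve the soname to.
    The first path seen for a soname wins; later requests reuse it.
    """
    chosen = {}
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        for soname, path in needed.get(obj, []):
            if soname in chosen:
                continue  # same soname is not mapped a second time
            chosen[soname] = path
            queue.append(soname)
    return chosen


# Hypothetical layout: the executable is instrumented, library A is not.
needed = {
    "exe": [
        ("libA.so", "/opt/app/lib/libA.so"),
        ("libc++.so.1", "/toolchain/lib/asan/libc++.so.1"),
    ],
    "libA.so": [("libc++.so.1", "/usr/lib/libc++.so.1")],
}

print(resolve("exe", needed)["libc++.so.1"])
```

In this model all of the executable's direct dependencies are processed before
any dependency of library A, so the sanitized path is recorded first even
though libA.so is listed ahead of libc++ in the executable.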
> 
> > 
> > Another problem is what happens when the program is
> > installed/copied
> > somewhere, and the toolchain build directory is gone. We would need
> > help from the dynamic loader.
> 
> This, IMHO, is a key problem with the rpath approach.
>  
> > We have something like this set up on Android for ASan, see
> > https://source.android.com/devices/tech/debug/asan.html#sanitize_target
> > The dynamic loader adds directories to the default library search
> > path
> > when it loads an instrumented executable. The directory with the
> > ASan
> > libraries is added at the start of the list. I think this is
> > similar
> > to how multilib works.
> > 
> > On Android we use the linker name itself (PT_INTERP field) to
> > identify
> > ASan executables. It would probably be better to use a .note
> > section
> > or even something else.
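
One way to read the scheme described above, as a rough sketch only: the loader
checks how the executable is marked and, if it is instrumented, puts the
sanitized library directories ahead of the defaults. The interpreter names and
directories below are illustrative assumptions, not taken from the Android
sources, and the function name is hypothetical.

```python
import posixpath


def search_dirs(interp_path, default_dirs, sanitized_dirs,
                sanitized_interps=("linker_asan", "linker_asan64")):
    """Return the library search path for an executable.

    `interp_path` stands in for the executable's PT_INTERP string. When its
    basename is one of `sanitized_interps` (assumed names), the sanitized
    directories are searched before the default ones.
    """
    if posixpath.basename(interp_path) in sanitized_interps:
        return list(sanitized_dirs) + list(default_dirs)
    return list(default_dirs)


# Hypothetical directories, for illustration only.
defaults = ["/system/lib"]
sanitized = ["/data/asan/system/lib"]

print(search_dirs("/system/bin/linker_asan", defaults, sanitized))
print(search_dirs("/system/bin/linker", defaults, sanitized))
```

A .note section could replace the interpreter check without changing the rest
of the logic; only the test in the `if` would differ.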
> > 
> 
> Interesting. I don't understand what you're proposing here, however.
> 
>  -Hal
> 
> > 
> > On Sun, Aug 14, 2016 at 7:14 PM, Eric Fiselier <eric at efcs.ca>
> > wrote:
> > >>  As a practical matter, I can't set $PLATFORM and/or $LIB in my
> > >>  rpath and
> > >> have ld.so do the right thing in this context.
> > >
> > > Can't Clang compile the sanitized executable with a special RPATH
> > > pointing
> > > to the correct libc++ folder?
> > >
> > >> Moreover, it is really a property of how you compiled, so I
> > >> think
> > >> using an
> > >> alternate library name is natural.
> > >
> > > Using an alternative library name will likely cause problems
> > > if
> > > a
> > > non-sanitized libc++ is also present, since both libraries
> > > provide the exact same symbols it's possible that symbols in the
> > > non-sanitized libc++ will replace the sanitized versions.
> > >
> > >
> > >
> > >
> > > On Sun, Aug 14, 2016 at 7:31 PM, Hal Finkel <hfinkel at anl.gov>
> > > wrote:
> > >>
> > >> ----- Original Message -----
> > >> > From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
> > >> > To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
> > >> > <cfe-dev at lists.llvm.org>, "Chandler Carruth"
> > >> > <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
> > >> > "Evgenii
> > >> > Stepanov" <eugenis at google.com>
> > >> > Sent: Sunday, August 14, 2016 7:07:00 PM
> > >> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> > >> > Sanitized
> > >> > Libc++
> > >> >
> > >> >
> > >> >
> > >> > On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
> > >> > > Sanitizers such as MSAN require the entire program to be
> > >> > > instrumented;
> > >> > > anything less leads to plenty of false positives.
> > >> > > Unfortunately
> > >> > > this can
> > >> > > be difficult to achieve, especially for the C and C++
> > >> > > standard
> > >> > > libraries. To work around this the sanitizers provide
> > >> > > interceptors
> > >> > > for
> > >> > > common C functions, but the same solution doesn't work as
> > >> > > well
> > >> > > for
> > >> > > the
> > >> > > C++ STL. Instead users are forced to manually build and link
> > >> > > a
> > >> > > custom
> > >> > > sanitized libc++. This is a huge PITA and I would like to
> > >> > > improve
> > >> > > the
> > >> > > situation, not just for MSAN but all sanitizers. I'm working
> > >> > > on a
> > >> > > proposal to change this. The basis of my proposal is:
> > >> > >
> > >> > > Clang should install/provide multiple sanitized versions of
> > >> > > Libc++
> > >> > > and a
> > >> > > mechanism to easily link them, as if they were a Compiler-RT
> > >> > > runtime.
> > >> > >
> > >> > > The goal of this proposal is:
> > >> > >
> > >> > > (1) Greatly reduce the number of false positives caused by
> > >> > > using an
> > >> > > un-sanitized STL.
> > >> > > (2) Allow sanitizers to catch user bugs that occur within
> > >> > > the
> > >> > > STL
> > >> > > library, not just its headers.
> > >> > >
> > >> > > The basic steps I would like to take to achieve this are:
> > >> > >
> > >> > > (1) Teach the compiler-rt CMake how to build and install
> > >> > > each
> > >> > > sanitized
> > >> > > libc++ version alongside its other runtimes.
> > >> > > (2) Add options to the Clang driver to support linking/using
> > >> > > these
> > >> > > libraries.
> > >> > >
> > >> > > I think this proposal is likely to be contentious, so I
> > >> > > would
> > >> > > like
> > >> > > to
> > >> > > focus on its details. Once I have some feedback on these
> > >> > > details
> > >> > > I'll
> > >> > > put together a formal proposal, including a plan for
> > >> > > implementing
> > >> > > it.
> > >> > > The details I would like input on are:
> > >> > >
> > >> > > (A) What kind and how many sanitized versions of libc++
> > >> > > should
> > >> > > we
> > >> > > provide?
> > >> > >
> > >> > > ---------------------------------------------------------------------------------------------------------------
> > >> > >
> > >> > > I think the minimum set would be Address (which includes
> > >> > > Leak),
> > >> > > Memory
> > >> > > (With origin tracking?), Thread, and Undefined.
> > >> > > Once we get into combinations of sanitizers things get more
> > >> > > complicated.
> > >> > > What other sanitizer combinations should we provide?
> > >> > >
> > >> > > (B) How should we handle UBSAN?
> > >> > > ---------------------------------------------------
> > >> > >
> > >> > > UBSAN is really just a collection of sanitizers and
> > >> > > providing
> > >> > > sanitized
> > >> > > versions of libc++ for every possible configuration is out
> > >> > > of
> > >> > > the
> > >> > > question.
> > >> > > Instead we should figure out what subset of UBSAN checks we
> > >> > > want to
> > >> > > enable in sanitized libc++ versions. I suspect we want to
> > >> > > disable
> > >> > > the
> > >> > > following checks.
> > >> > >
> > >> > > * -fsanitize=vptr
> > >> > > * -fsanitize=function
> > >> > > * -fsanitize=float-divide-by-zero
> > >> > >
> > >> > > Additionally UBSAN can be combined with every other
> > >> > > sanitizer
> > >> > > group
> > >> > > (i.e.
> > >> > > Address, Memory, Thread).
> > >> > > Do we want to provide a combination of UBSAN on/off for
> > >> > > every
> > >> > > group, or
> > >> > > can we simply provide an over-sanitized version with UBSAN
> > >> > > on?
> > >> > >
> > >> > > (C) How should the Clang driver expose the sanitized
> > >> > > libraries
> > >> > > to
> > >> > > the users?
> > >> > >
> > >> > > -------------------------------------------------------------------------------------------------------------
> > >> > >
> > >> > > I would like to propose the driver option
> > >> > > '-fsanitize-stdlib'
> > >> > > and
> > >> > > '-fsanitize-stdlib=<sanitizer>'.
> > >> > > The first version deduces the best sanitized version to use,
> > >> > > the
> > >> > > second
> > >> > > allows it to be explicitly specified.
> > >> > >
> > >> > > A couple of other options are:
> > >> > >
> > >> > > * -fsanitize=foo:  Implicitly turn on a sanitized STL. Clang
> > >> > > deduces
> > >> > > which version.
> > >> > > * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose
> > >> > > a
> > >> > > sanitized STL.
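
To make the "deduces the best sanitized version" idea concrete, here is a
small sketch of one possible deduction rule. It is an assumption about how
such an option could behave, not a description of any existing Clang driver
logic; the function name, the variant names, and the priority order are all
hypothetical.

```python
# Hypothetical set of prebuilt sanitized libc++ variants.
VARIANTS = ("address", "memory", "thread", "undefined")


def deduce_stdlib_variant(sanitizers, explicit=None):
    """Pick a sanitized libc++ variant, or None for the plain library.

    `sanitizers` is the collection of values passed to -fsanitize=.
    `explicit` models the proposed -fsanitize-stdlib=<sanitizer> form and,
    when given, overrides the deduction.
    """
    if explicit is not None:
        if explicit not in VARIANTS:
            raise ValueError("no sanitized libc++ for: " + explicit)
        return explicit
    # Address, Memory and Thread cannot be combined with each other, so at
    # most one of them is expected; prefer it over the UBSAN-only build.
    for name in ("address", "memory", "thread"):
        if name in sanitizers:
            return name
    if "undefined" in sanitizers:
        return "undefined"
    return None


print(deduce_stdlib_variant({"memory", "undefined"}))
print(deduce_stdlib_variant({"address"}, explicit="undefined"))
```

Whether a combined request such as memory plus undefined should map to a
dedicated variant or to the single-sanitizer build is exactly the open
question in (A) and (B); this sketch simply picks the latter.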
> > >> > >
> > >> > > (D) Should sanitized libc++ versions override libc++.so?
> > >> > >
> > >> > > -------------------------------------------------------------------------------------------
> > >> > >
> > >> > > For example, what happens when a program links to both a
> > >> > > sanitized
> > >> > > and
> > >> > > non-sanitized libc++ version?
> > >> > > Does the sanitized version replace the non-sanitized
> > >> > > version,
> > >> > > or
> > >> > > should
> > >> > > both versions be loaded into the program?
> > >> > >
> > >> > > Essentially I'm asking if the sanitized versions of libc++
> > >> > > should
> > >> > > have
> > >> > > the "soname" libc++ so they can
> > >> > > replace non-sanitized version, or if they should have a
> > >> > > different
> > >> > > "soname" so the linker treats them as a separate library.
> > >> > >
> > >> > > I haven't looked into the consequences of either approach in
> > >> > > depth,
> > >> > > but
> > >> > > any input is appreciated.
> > >> >
> > >> > In a sense, these are /just/ multilibs, so my inclination
> > >> > would
> > >> > be to
> > >> > make all the sonames the same, and just stick them in
> > >> > appropriately
> > >> > named subfolders relative to their normal location.
> > >>
> > >> I'm not sure that's true; there's no property of the environment
> > >> that
> > >> determines which library path you need. As a practical matter, I
> > >> can't set
> > >> $PLATFORM and/or $LIB in my rpath and have ld.so do the right
> > >> thing in this
> > >> context. Moreover, it is really a property of how you compiled,
> > >> so
> > >> I think
> > >> using an alternate library name is natural.
> > >>
> > >>  -Hal
> > >>
> > >> >
> > >> >
> > >> > Jon
> > >> >
> > >> > >
> > >> > > Conclusion
> > >> > > -----------------
> > >> > >
> > >> > > I hope my proposal and questions have made sense. Any and
> > >> > > all
> > >> > > input
> > >> > > is
> > >> > > appreciated.
> > >> > > Please let me know if anything needs clarification.
> > >> > >
> > >> > > /Eric
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > _______________________________________________
> > >> > > cfe-dev mailing list
> > >> > > cfe-dev at lists.llvm.org
> > >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >> > >
> > >> >
> > >> > --
> > >> > Jon Roelofs
> > >> > jonathan at codesourcery.com
> > >> > CodeSourcery / Mentor Embedded
> > >> >
> > >>
> > >> --
> > >> Hal Finkel
> > >> Assistant Computational Scientist
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory
> > >
> > >
> > 
> 
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


