[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Eric Fiselier via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 15 01:33:13 PDT 2016


On Mon, Aug 15, 2016 at 12:42 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "Evgenii Stepanov" <eugenis at google.com>
> > To: "Eric Fiselier" <eric at efcs.ca>
> > Cc: "Hal Finkel" <hfinkel at anl.gov>, "Jonathan Roelofs" <jonathan at codesourcery.com>,
> >     "clang developer list" <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>,
> >     "Kostya Serebryany" <kcc at google.com>
> > Sent: Monday, August 15, 2016 12:46:39 AM
> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
> >
> > Eric,
> >
> > thanks for bringing this up! This is indeed one of the biggest issues
> > for sanitizer adoption right now.
> >
> > I think that the same-soname approach is correct, mainly because the
> > sanitized library is just a version of the same library.
>
> Given that, for the case of msan at least, some of the symbols have,
> effectively, additional semantic requirements, it seems appropriate to use
> a different name somewhere (i.e. the library name, symbol names).
>
> > Loading both
> > versions in one process would usually be an error.
>
> While it would be undesirable to have multiple versions of libc++ in a
> single process, this generally works regardless. Most of the global state
> is in libcxxabi (or whatever ABI library is being used), and while having
> multiple versions of std::cout (etc.) can certainly be observable, it seems
> rare for this to come up in practice.
>
> > RPATH does not work because it only affects immediate dependencies.
> > The following would refer to two different versions of libc++:
> > Executable (with asan) -> library A (without asan) -> libc++
> >         |
> >          -> libc++
> > I think even in this case, if the two libc++'s have the same soname,
> > only one will be loaded. Linux does breadth-first search, so it
> > should
> > end up with the direct dependency of the main executable, which is
> > good.
>
> Yes, I believe that Linux's loader does the right thing in this case. If
> you have the executable without asan, and the library with asan, then we
> should devise a scheme that does not silently break.
>
> >
> > Another problem is what happens when the program is installed/copied
> > somewhere, and the toolchain build directory is gone. We would need
> > help from the dynamic loader.
>
> This, IMHO, is a key problem with the rpath approach.
>

I don't think this will be an issue. Assuming the user has a system libc++
installed, the program should simply fall back to that unsanitized version,
since the loader won't find anything at the rpath location. I don't see
anything more we could do.
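
To make the fallback concrete, here is a rough sketch of the lookup order I
have in mind. It is a simplified model, not ld.so's actual algorithm: it
ignores LD_LIBRARY_PATH, RUNPATH and the ld.so cache, and the directory
layout (toolchain/lib/msan, usr/lib) and the name find_library are made up
for illustration.

```python
import os
import tempfile


def find_library(soname, rpath_dirs, default_dirs):
    """Return the first existing file named soname, trying rpath dirs first."""
    for directory in list(rpath_dirs) + list(default_dirs):
        candidate = os.path.join(directory, soname)
        if os.path.isfile(candidate):
            return candidate
    return None


def demo():
    """Resolve libc++ before and after the toolchain directory disappears."""
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical layout: a sanitized copy in the toolchain build tree
        # and an unsanitized system copy.
        toolchain = os.path.join(root, "toolchain", "lib", "msan")
        system = os.path.join(root, "usr", "lib")
        for directory in (toolchain, system):
            os.makedirs(directory)
            open(os.path.join(directory, "libc++.so.1"), "w").close()

        before = find_library("libc++.so.1", [toolchain], [system])

        # Simulate the toolchain build directory going away.
        os.remove(os.path.join(toolchain, "libc++.so.1"))
        os.rmdir(toolchain)

        after = find_library("libc++.so.1", [toolchain], [system])
        return os.path.relpath(before, root), os.path.relpath(after, root)


if __name__ == "__main__":
    print(demo())
```

In this model the first lookup resolves to the sanitized copy under the
rpath directory, and the second silently resolves to the system copy, which
is exactly the fallback (and the silent loss of instrumentation) being
discussed.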

One way to support this case would be to provide additional static versions
of libc++, since statically linked executables don't depend on the toolchain
build directory.

Did you have other fallback behavior in mind?
How does compiler-rt handle this problem with shared sanitizer runtimes?
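
For reference, the load-order argument quoted above (same soname,
breadth-first search, direct dependency of the executable wins) can be
modeled roughly as follows. This is only a sketch: the object names, the
NEEDED and SONAME_OF tables, and load_order are hypothetical, and which file
each dependency entry resolves to is assumed rather than searched for.

```python
from collections import deque

# Hypothetical dependency graph matching the quoted diagram:
# the executable lists library A first, then its own (asan) libc++.
NEEDED = {
    "exe": ["libA", "libc++ (asan)"],
    "libA": ["libc++ (plain)"],
}

# Both libc++ builds carry the same soname.
SONAME_OF = {
    "exe": "exe",
    "libA": "libA.so",
    "libc++ (asan)": "libc++.so.1",
    "libc++ (plain)": "libc++.so.1",
}


def load_order(root, needed, soname_of):
    """Breadth-first walk; the first object seen for each soname is kept."""
    loaded = {}  # soname -> object that satisfied it
    queue = deque([root])
    while queue:
        obj = queue.popleft()
        soname = soname_of[obj]
        if soname in loaded:
            continue  # an object with this soname is already loaded
        loaded[soname] = obj
        queue.extend(needed.get(obj, []))
    return loaded


if __name__ == "__main__":
    print(load_order("exe", NEEDED, SONAME_OF)["libc++.so.1"])
```

Under these assumptions the asan libc++ is kept even though library A is
listed first, because the executable's direct dependencies are all visited
before A's. A depth-first walk over the same graph would pick the plain
libc++ instead.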


> > We have something like this set up on Android for ASan, see
> > https://source.android.com/devices/tech/debug/asan.html#sanitize_target
> > The dynamic loader adds directories to the default library search
> > path
> > when it loads an instrumented executable. The directory with the ASan
> > libraries is added at the start of the list. I think this is similar
> > to how multilib works.
> >
> > On Android we use the linker name itself (PT_INTERP field) to
> > identify
> > ASan executables. It would probably be better to use a .note section
> > or even something else.
> >
>
> Interesting. I don't understand what you're proposing here, however.
>
>  -Hal
>
> >
> > On Sun, Aug 14, 2016 at 7:14 PM, Eric Fiselier <eric at efcs.ca> wrote:
> > >>  As a practical matter, I can't set $PLATFORM and/or $LIB in my
> > >>  rpath and
> > >> have ld.so do the right thing in this context.
> > >
> > > Can't Clang compile the sanitized executable with a special RPATH
> > > pointing
> > > to the correct libc++ folder?
> > >
> > >> Moreover, it is really a property of how you compiled, so I think
> > >> using an
> > >> alternate library name is natural.
> > >
> > > Using an alternative library name will likely cause problems if
> > > a non-sanitized libc++ is also present. Since both libraries
> > > provide the exact same symbols, it's possible that symbols in the
> > > non-sanitized libc++ will replace the sanitized versions.
> > >
> > >
> > >
> > >
> > > On Sun, Aug 14, 2016 at 7:31 PM, Hal Finkel <hfinkel at anl.gov>
> > > wrote:
> > >>
> > >> ----- Original Message -----
> > >> > From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
> > >> > To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
> > >> > <cfe-dev at lists.llvm.org>, "Chandler Carruth"
> > >> > <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
> > >> > "Evgenii
> > >> > Stepanov" <eugenis at google.com>
> > >> > Sent: Sunday, August 14, 2016 7:07:00 PM
> > >> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> > >> > Sanitized
> > >> > Libc++
> > >> >
> > >> >
> > >> >
> > >> > On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
> > >> > > Sanitizers such as MSAN require the entire program to be
> > >> > > instrumented,
> > >> > > anything less leads to plenty of false positives.
> > >> > > Unfortunately
> > >> > > this can
> > >> > > be difficult to achieve, especially for the C and C++ standard
> > >> > > libraries. To work around this the sanitizers provide
> > >> > > interceptors
> > >> > > for
> > >> > > common C functions, but the same solution doesn't work as well
> > >> > > for
> > >> > > the
> > >> > > C++ STL. Instead users are forced to manually build and link a
> > >> > > custom
> > >> > > sanitized libc++. This is a huge PITA and I would like to
> > >> > > improve
> > >> > > the
> > >> > > situation, not just for MSAN but all sanitizers. I'm working
> > >> > > on a
> > >> > > proposal to change this. The basis of my proposal is:
> > >> > >
> > >> > > Clang should install/provide multiple sanitized versions of
> > >> > > Libc++
> > >> > > and a
> > >> > > mechanism to easily link them, as if they were a Compiler-RT
> > >> > > runtime.
> > >> > >
> > >> > > The goal of this proposal is:
> > >> > >
> > >> > > (1) Greatly reduce the number of false positives caused by
> > >> > > using an
> > >> > > un-sanitized STL.
> > >> > > (2) Allow sanitizers to catch user bugs that occur within the
> > >> > > STL
> > >> > > library, not just its headers.
> > >> > >
> > >> > > The basic steps I would like to take to achieve this are:
> > >> > >
> > >> > > (1) Teach the compiler-rt CMake how to build and install each
> > >> > > sanitized
> > >> > > libc++ version alongside its other runtimes.
> > >> > > (2) Add options to the Clang driver to support linking/using
> > >> > > these
> > >> > > libraries.
> > >> > >
> > >> > > I think this proposal is likely to be contentious, so I would
> > >> > > like
> > >> > > to
> > >> > > focus on its details. Once I have some feedback on these
> > >> > > details
> > >> > > I'll
> > >> > > put together a formal proposal, including a plan for
> > >> > > implementing
> > >> > > it.
> > >> > > The details I would like input on are:
> > >> > >
> > >> > > (A) What kind and how many sanitized versions of libc++ should
> > >> > > we
> > >> > > provide?
> > >> > >
> > >> > > ------------------------------------------------------------
> > >> > >
> > >> > > I think the minimum set would be Address (which includes
> > >> > > Leak),
> > >> > > Memory
> > >> > > (With origin tracking?), Thread, and Undefined.
> > >> > > Once we get into combinations of sanitizers things get more
> > >> > > complicated.
> > >> > > What other sanitizer combinations should we provide?
> > >> > >
> > >> > > (B) How should we handle UBSAN?
> > >> > > ---------------------------------------------------
> > >> > >
> > >> > > UBSAN is really just a collection of sanitizers and providing
> > >> > > sanitized
> > >> > > versions of libc++ for every possible configuration is out of
> > >> > > the
> > >> > > question.
> > >> > > Instead we should figure out what subset of UBSAN checks we
> > >> > > want to
> > >> > > enable in sanitized libc++ versions. I suspect we want to
> > >> > > disable
> > >> > > the
> > >> > > following checks.
> > >> > >
> > >> > > * -fsanitize=vptr
> > >> > > * -fsanitize=function
> > >> > > * -fsanitize=float-divide-by-zero
> > >> > >
> > >> > > Additionally UBSAN can be combined with every other sanitizer
> > >> > > group
> > >> > > (ie
> > >> > > Address, Memory, Thread).
> > >> > > Do we want to provide a combination of UBSAN on/off for every
> > >> > > group, or
> > >> > > can we simply provide an over-sanitized version with UBSAN on?
> > >> > >
> > >> > > (C) How should the Clang driver expose the sanitized libraries
> > >> > > to
> > >> > > the users?
> > >> > >
> > >> > > ------------------------------------------------------------
> > >> > >
> > >> > > I would like to propose the driver option '-fsanitize-stdlib'
> > >> > > and
> > >> > > '-fsanitize-stdlib=<sanitizer>'.
> > >> > > The first version deduces the best sanitized version to use,
> > >> > > the
> > >> > > second
> > >> > > allows it to be explicitly specified.
> > >> > >
> > >> > > A couple of other options are:
> > >> > >
> > >> > > * -fsanitize=foo:  Implicitly turn on a sanitized STL. Clang
> > >> > > deduces
> > >> > > which version.
> > >> > > * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
> > >> > > sanitized STL.
> > >> > >
> > >> > > (D) Should sanitized libc++ versions override libc++.so?
> > >> > >
> > >> > > ------------------------------------------------------------
> > >> > >
> > >> > > For example, what happens when a program links to both a
> > >> > > sanitized
> > >> > > and
> > >> > > non-sanitized libc++ version?
> > >> > > Does the sanitized version replace the non-sanitized version,
> > >> > > or
> > >> > > should
> > >> > > both versions be loaded into the program?
> > >> > >
> > >> > > Essentially I'm asking if the sanitized versions of libc++
> > >> > > should
> > >> > > have
> > >> > > the "soname" libc++ so they can
> > >> > > replace non-sanitized version, or if they should have a
> > >> > > different
> > >> > > "soname" so the linker treats them as a separate library.
> > >> > >
> > >> > > I haven't looked into the consequences of either approach in
> > >> > > depth,
> > >> > > but
> > >> > > any input is appreciated.
> > >> >
> > >> > In a sense, these are /just/ multilibs, so my inclination would
> > >> > be to
> > >> > make all the sonames the same, and just stick them in
> > >> > appropriately
> > >> > named subfolders relative to their normal location.
> > >>
> > >> I'm not sure that's true; there's no property of the environment
> > >> that
> > >> determines which library path you need. As a practical matter, I
> > >> can't set
> > >> $PLATFORM and/or $LIB in my rpath and have ld.so do the right
> > >> thing in this
> > >> context. Moreover, it is really a property of how you compiled, so
> > >> I think
> > >> using an alternate library name is natural.
> > >>
> > >>  -Hal
> > >>
> > >> >
> > >> >
> > >> > Jon
> > >> >
> > >> > >
> > >> > > Conclusion
> > >> > > -----------------
> > >> > >
> > >> > > I hope my proposal and questions have made sense. Any and all
> > >> > > input
> > >> > > is
> > >> > > appreciated.
> > >> > > Please let me know if anything needs clarification.
> > >> > >
> > >> > > /Eric
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > >
> > >> > > _______________________________________________
> > >> > > cfe-dev mailing list
> > >> > > cfe-dev at lists.llvm.org
> > >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> > >> > >
> > >> >
> > >> > --
> > >> > Jon Roelofs
> > >> > jonathan at codesourcery.com
> > >> > CodeSourcery / Mentor Embedded
> > >> >
> > >>
> > >> --
> > >> Hal Finkel
> > >> Assistant Computational Scientist
> > >> Leadership Computing Facility
> > >> Argonne National Laboratory
> > >
> > >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>