[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 15 11:50:36 PDT 2016


----- Original Message -----
> From: "Jonathan Roelofs" <jonathan at codesourcery.com>
> To: "Evgenii Stepanov" <eugenis at google.com>
> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Eric Fiselier" <eric at efcs.ca>, "clang developer list" <cfe-dev at lists.llvm.org>,
> "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>
> Sent: Monday, August 15, 2016 1:37:11 PM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
> 
> 
> 
> On 8/15/16 12:14 PM, Evgenii Stepanov wrote:
> > On Mon, Aug 15, 2016 at 7:24 AM, Jonathan Roelofs
> > <jonathan at codesourcery.com> wrote:
> >>
> >>
> >> On 8/14/16 7:31 PM, Hal Finkel wrote:
> >>>
> >>> ----- Original Message -----
> >>>>
> >>>> From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
> >>>> To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
> >>>> <cfe-dev at lists.llvm.org>, "Chandler Carruth"
> >>>> <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
> >>>> "Evgenii Stepanov" <eugenis at google.com>
> >>>> Sent: Sunday, August 14, 2016 7:07:00 PM
> >>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> >>>> Sanitized Libc++
> >>>>
> >>>>
> >>>>
> >>>> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
> >>>>>
> >>>>> Sanitizers such as MSAN require the entire program to be
> >>>>> instrumented; anything less leads to plenty of false positives.
> >>>>> Unfortunately this can be difficult to achieve, especially for
> >>>>> the C and C++ standard libraries. To work around this, the
> >>>>> sanitizers provide interceptors for common C functions, but the
> >>>>> same solution doesn't work as well for the C++ STL. Instead,
> >>>>> users are forced to manually build and link a custom sanitized
> >>>>> libc++. This is a huge PITA and I would like to improve the
> >>>>> situation, not just for MSAN but for all sanitizers. I'm working
> >>>>> on a proposal to change this. The basis of my proposal is:
> >>>>>
> >>>>> Clang should install/provide multiple sanitized versions of
> >>>>> Libc++ and a mechanism to easily link them, as if they were a
> >>>>> Compiler-RT runtime.
> >>>>>
> >>>>> The goal of this proposal is:
> >>>>>
> >>>>> (1) Greatly reduce the number of false positives caused by
> >>>>>     using an un-sanitized STL.
> >>>>> (2) Allow sanitizers to catch user bugs that occur within the
> >>>>>     STL library, not just its headers.
> >>>>>
> >>>>> The basic steps I would like to take to achieve this are:
> >>>>>
> >>>>> (1) Teach the compiler-rt CMake how to build and install each
> >>>>>     sanitized libc++ version alongside its other runtimes.
> >>>>> (2) Add options to the Clang driver to support linking/using
> >>>>>     these libraries.
> >>>>>
> >>>>> I think this proposal is likely to be contentious, so I would
> >>>>> like to focus on the details of it. Once I have some feedback on
> >>>>> these details I'll put together a formal proposal, including a
> >>>>> plan for implementing it. The details I would like input on
> >>>>> are:
> >>>>>
> >>>>> (A) What kind and how many sanitized versions of libc++ should
> >>>>> we provide?
> >>>>> ---------------------------------------------------------------
> >>>>>
> >>>>> I think the minimum set would be Address (which includes Leak),
> >>>>> Memory (with origin tracking?), Thread, and Undefined. Once we
> >>>>> get into combinations of sanitizers things get more
> >>>>> complicated.
> >>>>> What other sanitizer combinations should we provide?
> >>>>>
> >>>>> (B) How should we handle UBSAN?
> >>>>> ---------------------------------------------------
> >>>>>
> >>>>> UBSAN is really just a collection of sanitizers, and providing
> >>>>> sanitized versions of libc++ for every possible configuration
> >>>>> is out of the question. Instead we should figure out what
> >>>>> subset of UBSAN checks we want to enable in sanitized libc++
> >>>>> versions. I suspect we want to disable the following checks:
> >>>>>
> >>>>> * -fsanitize=vptr
> >>>>> * -fsanitize=function
> >>>>> * -fsanitize=float-divide-by-zero
> >>>>>
> >>>>> Additionally UBSAN can be combined with every other sanitizer
> >>>>> group (i.e. Address, Memory, Thread). Do we want to provide a
> >>>>> combination of UBSAN on/off for every group, or can we simply
> >>>>> provide an over-sanitized version with UBSAN on?
> >>>>>
> >>>>> (C) How should the Clang driver expose the sanitized libraries
> >>>>> to the users?
> >>>>> ---------------------------------------------------------------
> >>>>>
> >>>>> I would like to propose the driver options '-fsanitize-stdlib'
> >>>>> and '-fsanitize-stdlib=<sanitizer>'. The first version deduces
> >>>>> the best sanitized version to use; the second allows it to be
> >>>>> explicitly specified.
> >>>>>
> >>>>> A couple of other options are:
> >>>>>
> >>>>> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
> >>>>>   deduces which version.
> >>>>> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
> >>>>>   sanitized STL.
> >>>>>
> >>>>> (D) Should sanitized libc++ versions override libc++.so?
> >>>>> ---------------------------------------------------------
> >>>>>
> >>>>> For example, what happens when a program links to both a
> >>>>> sanitized and non-sanitized libc++ version? Does the sanitized
> >>>>> version replace the non-sanitized version, or should both
> >>>>> versions be loaded into the program?
> >>>>>
> >>>>> Essentially I'm asking if the sanitized versions of libc++
> >>>>> should have the "soname" libc++ so they can replace the
> >>>>> non-sanitized version, or if they should have a different
> >>>>> "soname" so the linker treats them as a separate library.
> >>>>>
> >>>>> I haven't looked into the consequences of either approach in
> >>>>> depth, but any input is appreciated.
> >>>>
> >>>>
> >>>> In a sense, these are /just/ multilibs, so my inclination would
> >>>> be to make all the sonames the same, and just stick them in
> >>>> appropriately named subfolders relative to their normal
> >>>> location.
> >>>
> >>>
> >>> I'm not sure that's true; there's no property of the environment
> >>> that determines which library path you need. As a practical
> >>> matter, I can't set $PLATFORM and/or $LIB in my rpath and have
> >>> ld.so do the right thing in this context. Moreover, it is really
> >>> a property of how you compiled, so I think using an alternate
> >>> library name is natural.
> >>
> >>
> >> Multilibs solve exactly the problem of "it's a property of how you
> >> compiled". The thing that's subtly different here is that the usual
> >> thing that people do with multilibs is to provide ABI incompatible
> >> versions of the same library (which are made incompatible via
> >> compiler flags, -msoft-float, for example), whereas these libraries
> >> just so happen to be ABI compatible with their non-instrumented
> >> variants.
> >>
> >> I'm not sure I understand what you're saying about $PLATFORM and
> >> $LIB, but I /think/ it's a red herring: the compiler takes care of
> >> adding in the multilib suffixes where appropriate, so shouldn't the
> >> answer to "which library do I stick in the rpath?" include said
> >> suffix (when compiled with Eric's proposed flag)?
> >
> > What are these suffixes and where are they added?
> 
> To be clear: the suffixes aren't something that exists yet, but rather
> they're something I'm proposing.
> 
> Strawman:
> 
> flag(s)                         suffix
> -------                         ------
> -fsanitize=address              /asan
> -fsanitize=address,memory       /asan/msan
> 
> 
> Then with `-fsanitize=address`:
> 
>     /usr/lib/libc++.so
> 
> becomes:
> 
>     /usr/lib/asan/libc++.so

This kind of scheme sounds great, but is this something we can implement on our own, or something that requires changes to the dynamic loader (e.g. glibc's ld.so)?

 -Hal

> 
> And with `-fsanitize=memory`, you get:
> 
>     /usr/lib/asan/msan/libc++.so
> 
> because an msan'd but not asan'd build of the library was not supplied
> by the vendor (for whatever hypothetical reason). Then the validation
> problem of having an exponential number of combinations to test
> becomes the vendor's problem: they can ship as many or as few of the
> flavors of the libraries as they want.
> 
> Here you'd have some notion of "satisfies the constraints the user
> asked for" (which is usually "is ABI compatible with" as far as normal
> multilib stuff goes) and another to rank the choices and break ties
> when all else is the same.
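
The "satisfies the constraints" and "rank the choices" steps described above can be sketched in Python. This is only an illustration under stated assumptions, not an existing Clang mechanism: the `Variant` type, the `pick_variant` and `library_path` helpers, the `VENDOR` list, and the ranking rule (fewest extra sanitizers, ties broken on the suffix string) are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Variant:
    """One vendor-supplied build of libc++ (hypothetical model)."""
    suffix: str                 # multilib-style subdirectory, "" for the plain build
    sanitizers: FrozenSet[str]  # sanitizers the library was built with


def pick_variant(requested: Iterable[str],
                 available: Iterable[Variant]) -> Optional[Variant]:
    """Return the best variant for the requested sanitizers, or None."""
    wanted = frozenset(requested)
    # "Satisfies the constraints": the library is instrumented with at
    # least every sanitizer the user asked for.
    candidates = [v for v in available if wanted <= v.sanitizers]
    if not candidates:
        return None
    # Assumed ranking: prefer the fewest extra sanitizers, then break
    # ties on the suffix string so the result is deterministic.
    return min(candidates, key=lambda v: (len(v.sanitizers - wanted), v.suffix))


def library_path(base_dir: str, variant: Variant, name: str = "libc++.so") -> str:
    """Join the base directory, the variant suffix (if any), and the name."""
    parts = [base_dir.rstrip("/")]
    if variant.suffix:
        parts.append(variant.suffix)
    parts.append(name)
    return "/".join(parts)


# The vendor in the strawman ships a plain build, an asan build, and an
# asan+msan build, but no msan-only build.
VENDOR = [
    Variant("", frozenset()),
    Variant("asan", frozenset({"address"})),
    Variant("asan/msan", frozenset({"address", "memory"})),
]
```

With this vendor list, `pick_variant({"address"}, VENDOR)` selects the `asan` build, while `pick_variant({"memory"}, VENDOR)` falls back to `asan/msan`, so `library_path("/usr/lib", ...)` gives `/usr/lib/asan/msan/libc++.so` as in the strawman. A request that no shipped variant covers returns `None`.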
> 
> >
> > Note that right now if I build with -stdlib=libc++ (and libc++ is
> > part of the llvm checkout), I don't get any RPATH. So the binary is
> > linked against the libc++.so in the toolchain build directory, but
> > it would not find it at runtime without some extra help. This is the
> > price you pay for running out of a temp location, and we should
> > probably keep it like this for sanitizer builds, too, i.e. put the
> > sanitized libc++ in lib/msan and let the user set their own RPATH.
> 
> Yeah, that's my inclination also. We could of course provide some flag
> to support querying the compiler for what the sanitizer lib suffix is
> (or re-use/hijack the existing one for normal multilibs). That'd allow
> build scripts to append the suffix in a principled way.
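
As a rough sketch of what "append the suffix in a principled way" might look like in a build script: the helper below takes a library directory and a suffix and produces the usual `-L` and `-Wl,-rpath,` link flags. The function name `sanitized_link_flags` is made up for illustration, and the suffix is assumed to have been obtained from the compiler by some query flag that, as noted above, does not exist yet.

```python
import posixpath
from typing import List


def sanitized_link_flags(lib_dir: str, suffix: str) -> List[str]:
    """Return linker flags pointing at the suffixed library directory.

    `suffix` is assumed to come from the compiler (hypothetical query);
    an empty suffix means the ordinary, unsanitized location.
    """
    # Accept both "msan" and "/msan"; a leading slash would otherwise
    # make posixpath.join discard lib_dir.
    suffix = suffix.strip("/")
    target = posixpath.join(lib_dir, suffix) if suffix else lib_dir
    # -L lets the static linker find the library; the rpath entry lets
    # the dynamic loader find the same directory at run time.
    return [f"-L{target}", f"-Wl,-rpath,{target}"]
```

For example, `sanitized_link_flags("/opt/llvm/lib", "/msan")` yields `["-L/opt/llvm/lib/msan", "-Wl,-rpath,/opt/llvm/lib/msan"]`, which a build script could append to its link line.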
> 
> >
> > The other part of the problem is how to install sanitized libc++
> > system-wide and have apps use it. That's where we need the loader
> > support, and I think it should follow the multilib design as closely
> > as possible.
> 
> An idea for this: assuming they're all ABI compatible, stick them in
> their suffixed folders as appropriate, but add a symlink from the
> no-suffix location to whichever one you want to be used system-wide.
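
The symlink idea could be sketched roughly as follows. This is a minimal illustration for a POSIX system, assuming the variants already live in suffixed subdirectories of one library directory; the function name `select_system_wide` and the directory layout are hypothetical.

```python
import os


def select_system_wide(lib_dir: str, suffix: str, name: str = "libc++.so") -> str:
    """Point <lib_dir>/<name> at <lib_dir>/<suffix>/<name> with a relative symlink."""
    target = os.path.join(suffix, name)  # stored relative to lib_dir
    if not os.path.exists(os.path.join(lib_dir, target)):
        raise FileNotFoundError(f"no {name} under {os.path.join(lib_dir, suffix)}")
    link = os.path.join(lib_dir, name)
    tmp = link + ".tmp"
    if os.path.lexists(tmp):
        os.remove(tmp)
    # Create the new link under a temporary name, then rename it over the
    # old entry so the no-suffix path never disappears in between.
    os.symlink(target, tmp)
    os.replace(tmp, link)
    return link
```

Calling `select_system_wide("/usr/lib", "asan")` would make `/usr/lib/libc++.so` a symlink to `asan/libc++.so`; calling it again with a different suffix switches the system-wide choice. Note that this replaces whatever currently sits at the no-suffix location.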
> 
> 
> Jon
> 
> >
> >>
> >>
> >> Jon
> >>
> >>
> >>>
> >>> -Hal
> >>>
> >>>>
> >>>>
> >>>> Jon
> >>>>
> >>>>>
> >>>>> Conclusion
> >>>>> ----------
> >>>>>
> >>>>> I hope my proposal and questions have made sense. Any and all
> >>>>> input is appreciated. Please let me know if anything needs
> >>>>> clarification.
> >>>>>
> >>>>> /Eric
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> cfe-dev mailing list
> >>>>> cfe-dev at lists.llvm.org
> >>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> >>>>>
> >>>>
> >>>> --
> >>>> Jon Roelofs
> >>>> jonathan at codesourcery.com
> >>>> CodeSourcery / Mentor Embedded
> >>>>
> >>>
> >>
> >> --
> >> Jon Roelofs
> >> jonathan at codesourcery.com
> >> CodeSourcery / Mentor Embedded
> 
> --
> Jon Roelofs
> jonathan at codesourcery.com
> CodeSourcery / Mentor Embedded
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


