[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Hal Finkel via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 15 12:58:49 PDT 2016


----- Original Message -----
> From: "Eric Fiselier via cfe-dev" <cfe-dev at lists.llvm.org>
> To: "Ben Craig" <ben.craig at codeaurora.org>
> Cc: "clang developer list" <cfe-dev at lists.llvm.org>
> Sent: Monday, August 15, 2016 2:51:38 PM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
> 
> 
> 
> > Also, it would be awfully nice if -fsanitize=address were the only
> > flag necessary to add to the build to make everything work.
> > Requiring users to add another flag isn't particularly kind.
> 
> 
> It may not be particularly kind, but replacing your STL is something
> we might want consent for, especially if it comes with all these
> complexities.

I don't disagree. If we can't come up with a system that 'just works', then we should ask for consent (or at least provide an opt-out).

 -Hal

> 
> On Mon, Aug 15, 2016 at 1:47 PM, Craig, Ben via cfe-dev <
> cfe-dev at lists.llvm.org > wrote:
> 
> 
> 
> 
> 
> 
> On 8/15/2016 2:27 PM, Jonathan Roelofs via cfe-dev wrote:
> 
> 
> 
> 
> On 8/15/16 12:50 PM, Hal Finkel wrote:
> 
> 
> ----- Original Message -----
> 
> 
> From: "Jonathan Roelofs" < jonathan at codesourcery.com >
> To: "Evgenii Stepanov" < eugenis at google.com >
> Cc: "Hal Finkel" < hfinkel at anl.gov >, "Eric Fiselier" < eric at efcs.ca
> >, "clang developer list" < cfe-dev at lists.llvm.org >,
> "Chandler Carruth" < chandlerc at gmail.com >, "Kostya Serebryany" <
> kcc at google.com >
> Sent: Monday, August 15, 2016 1:37:11 PM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> Sanitized Libc++
> 
> 
> 
> On 8/15/16 12:14 PM, Evgenii Stepanov wrote:
> 
> 
> On Mon, Aug 15, 2016 at 7:24 AM, Jonathan Roelofs
> < jonathan at codesourcery.com > wrote:
> 
> 
> 
> 
> On 8/14/16 7:31 PM, Hal Finkel wrote:
> 
> 
> 
> ----- Original Message -----
> 
> 
> 
> From: "Jonathan Roelofs via cfe-dev" < cfe-dev at lists.llvm.org >
> To:
> "Eric Fiselier" < eric at efcs.ca >, "clang developer list"
> < cfe-dev at lists.llvm.org >, "Chandler Carruth"
> < chandlerc at gmail.com >,
> "Kostya Serebryany" < kcc at google.com >, "Evgenii Stepanov"
> < eugenis at google.com > Sent: Sunday, August 14, 2016 7:07:00 PM
> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
> Sanitized Libc++
> 
> 
> 
> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
> 
> 
> 
> Sanitizers such as MSAN require the entire program to be
> instrumented; anything less leads to plenty of false positives.
> Unfortunately this can be difficult to achieve, especially for
> the C and C++ standard libraries. To work around this, the
> sanitizers provide interceptors for common C functions, but the
> same solution doesn't work as well for the C++ STL. Instead, users
> are forced to manually build and link a custom sanitized libc++.
> This is a huge PITA and I would like to improve the situation,
> not just for MSAN but for all sanitizers. I'm working on a proposal
> to change this. The basis of my proposal is:
> 
> Clang should install/provide multiple sanitized versions of
> Libc++ and a mechanism to easily link them, as if they were a
> Compiler-RT runtime.
> 
> The goal of this proposal is:
> 
> (1) Greatly reduce the number of false positives caused by using
>     an un-sanitized STL.
> (2) Allow sanitizers to catch user bugs that occur within the STL
>     library, not just its headers.
> 
> The basic steps I would like to take to achieve this are:
> 
> (1) Teach the compiler-rt CMake how to build and install each
>     sanitized libc++ version alongside its other runtimes.
> (2) Add options to the Clang driver to support linking/using these
>     libraries.
> 
> I think this proposal is likely to be contentious, so I would
> like to focus on its details. Once I have some feedback on
> these details I'll put together a formal proposal, including a
> plan for implementing it. The details I would like input on are:
> 
> (A) What kind and how many sanitized versions of libc++ should we provide?
> --------------------------------------------------------------------------
> 
> I think the minimum set would be Address (which includes Leak),
> Memory (with origin tracking?), Thread, and Undefined. Once we
> get into combinations of sanitizers things get more complicated.
> What other sanitizer combinations should we provide?
> 
> (B) How should we handle UBSAN?
> ---------------------------------------------------
> 
> UBSAN is really just a collection of sanitizers, and providing
> sanitized versions of libc++ for every possible configuration is
> out of the question. Instead we should figure out what subset of
> UBSAN checks we want to enable in sanitized libc++ versions. I
> suspect we want to disable the following checks:
> 
> * -fsanitize=vptr
> * -fsanitize=function
> * -fsanitize=float-divide-by-zero
> 
> Additionally, UBSAN can be combined with every other sanitizer
> group (i.e. Address, Memory, Thread). Do we want to provide a
> combination of UBSAN on/off for every group, or can we simply
> provide an over-sanitized version with UBSAN on?
> 
> (C) How should the Clang driver expose the sanitized libraries to the users?
> ----------------------------------------------------------------------------
> 
> I would like to propose the driver options '-fsanitize-stdlib' and
> '-fsanitize-stdlib=<sanitizer>'. The first version deduces the
> best sanitized version to use; the second allows it to be
> explicitly specified.
> 
> A couple of other options are:
> 
> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
>   deduces which version.
> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>   sanitized STL.
> 
> (D) Should sanitized libc++ versions override libc++.so?
> --------------------------------------------------------
> 
> For example, what happens when a program links to both a sanitized
> and non-sanitized libc++ version? Does the sanitized version
> replace the non-sanitized version, or should both versions be
> loaded into the program?
> 
> Essentially I'm asking if the sanitized versions of libc++
> should have the "soname" libc++ so they can replace the
> non-sanitized version, or if they should have a different
> "soname" so the linker treats them as a separate library.
> 
> I haven't looked into the consequences of either approach in
> depth, but any input is appreciated.
> 
> 
> In a sense, these are /just/ multilibs, so my inclination would be
> to make all the sonames the same, and just stick them in
> appropriately named subfolders relative to their normal location.
> 
> 
> I'm not sure that's true; there's no property of the environment
> that determines which library path you need. As a practical matter,
> I can't set $PLATFORM and/or $LIB in my rpath and have ld.so do the
> right thing in this context. Moreover, it is really a property of
> how you compiled, so I think using an alternate library name is
> natural.
> 
> 
> Multilibs solve exactly the problem of "it's a property of how you
> compiled". The thing that's subtly different here is that the usual
> thing that people do with multilibs is to provide ABI incompatible
> versions of the same library (which are made incompatible via
> compiler flags, -msoft-float, for example), whereas these libraries
> just so happen to be ABI compatible with their non-instrumented
> variants.
> 
> I'm not sure I understand what you're saying about $PLATFORM and
> $LIB, but I /think/ it's a red herring: the compiler takes care of
> adding in the multilib suffixes where appropriate, so shouldn't the
> answer to "which library do I stick in the rpath?" include said
> suffix (when compiled with Eric's proposed flag)?
> 
> What are these suffixes and where are they added?
> 
> To be clear: the suffixes aren't something that exist yet, but rather
> they're something I'm proposing.
> 
> Strawman:
> 
>   flag(s)                      suffix
>   -------                      ------
>   -fsanitize=address           /asan
>   -fsanitize=address,memory    /asan/msan
> 
> 
> Then with `-fsanitize=address`:
> 
> /usr/lib/libc++.so
> 
> becomes:
> 
> /usr/lib/asan/libc++.so
> 
> This kind of scheme sounds great, but is this something we can
> implement on our own, or something that requires changes to the
> dynamic loader (e.g. glibc's ld.so)?
> 
> Isn't it entirely up to what the user sticks in the rpath of the
> binaries that they build?
> 
> It is my understanding that rpath only really helps with executables.
> If I want to build a dynamic library and sanitize it, without
> rebuilding my executable, then an rpath won't help.
> 
> Also, it would be awfully nice if -fsanitize=address were the only
> flag necessary to add to the build to make everything work.
> Requiring users to add another flag isn't particularly kind.
> 
> If we "only" need to provide sanitizers for libc++, and not
> libc++abi, then I would be more of a fan of providing a different
> .so name, along with a lot of version tagging. If version tagging
> isn't used, then having multiple libc++ versions in the same process
> would cause all sorts of interposition problems. I don't know how
> widely available version tagging is in practice though. GNU ld has
> it (http://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_node/ld_25.html).
> 
> Jon
> 
> -Hal
> 
> And with `-fsanitize=memory`, you get:
> 
> /usr/lib/asan/msan/libc++.so
> 
> because an msan'd but not asan'd build of the library was not
> supplied by the vendor (for whatever hypothetical reason). Then the
> validation problem of having an exponential number of combinations
> to test becomes the vendor's problem: they can ship as many or as
> few of the flavors of the libraries as they want.
> 
> Here you'd have some notion of "satisfies the constraints the user
> asked for" (which is usually "is ABI compatible with" as far as
> normal multilib stuff goes) and another to rank the choices and
> break ties when all else is the same.
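
One way to read the strawman above is as a lookup with a fallback. What follows is a minimal sketch, not anything Clang implements: the `SHIPPED` table, `pick_suffix`, and `library_path` are hypothetical names, and the ranking rule ("fewest extra sanitizers wins, ties broken by suffix name") is an assumption made only for illustration.

```python
# Hypothetical table: suffix directories a vendor chose to ship, each
# tagged with the sanitizers that build of libc++ was instrumented with.
SHIPPED = {
    "": frozenset(),                            # plain, uninstrumented libc++
    "asan": frozenset({"address"}),
    "asan/msan": frozenset({"address", "memory"}),
}


def pick_suffix(requested, shipped=SHIPPED):
    """Return the best shipped suffix for the requested sanitizers, or None.

    A variant "satisfies the constraints" if it was built with at least
    the requested sanitizers. Among those, prefer the one with the fewest
    extra sanitizers; the suffix string itself breaks any remaining tie.
    """
    requested = frozenset(requested)
    candidates = [
        (len(built_with - requested), suffix)
        for suffix, built_with in shipped.items()
        if requested <= built_with
    ]
    if not candidates:
        return None
    return min(candidates)[1]


def library_path(requested, base="/usr/lib", name="libc++.so"):
    """Map a set of requested sanitizers to the library path to link."""
    suffix = pick_suffix(requested)
    if suffix is None:
        raise LookupError("no shipped libc++ variant satisfies the request")
    parts = [base] + ([suffix] if suffix else []) + [name]
    return "/".join(parts)
```

With this table, a request for `address` resolves to `/usr/lib/asan/libc++.so`, and a request for `memory` alone falls back to `/usr/lib/asan/msan/libc++.so`, as in the example in the thread. A request for `thread` finds nothing, and what the driver should do in that case is left open here.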
> 
> 
> 
> 
> Note that right now if I build with -stdlib=libc++ (and libc++ is
> part of the llvm checkout), I don't get any RPATH. So the binary is
> linked against the libc++.so in the toolchain build directory, but
> it would not find it at runtime without some extra help. This is
> the price you pay for running out of a temp location, and we should
> probably keep it like this for sanitizer builds, too, i.e. put the
> sanitized libc++ in lib/msan and let the user set their own RPATH.
> 
> Yeah, that's my inclination also. We could of course provide some
> flag to support querying the compiler for what the sanitizer lib
> suffix is (or re-use/hijack the existing one for normal multilibs).
> That'd allow build scripts to append the suffix in a principled way.
> 
> 
> 
> 
> The other part of the problem is how to install sanitized libc++
> system-wide and have apps use it. That's where we need the loader
> support, and I think it should follow the multilib design as
> closely as possible.
> 
> An idea for this: assuming they're all ABI compatible, stick them in
> their suffixed folders as appropriate, but add a symlink from the no
> suffix location to whatever one you want to be used system-wide.
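
The symlink idea can be sketched in a few lines. This is an illustration under stated assumptions, not a proposed install script: `publish_default` and `demo` are hypothetical helpers, the directory layout is the strawman one from this thread, and the code relies on POSIX symlink and rename semantics.

```python
import os
import tempfile


def publish_default(lib_root, suffix, name="libc++.so"):
    """Point lib_root/name at lib_root/suffix/name via a relative symlink."""
    target = os.path.join(suffix, name)   # relative to lib_root
    link = os.path.join(lib_root, name)
    staging = link + ".new"
    if os.path.lexists(staging):
        os.remove(staging)
    os.symlink(target, staging)
    # Renaming over the old link swaps the default in one step on POSIX,
    # so there is no moment at which lib_root/name is missing.
    os.replace(staging, link)
    return link


def demo():
    """Build a throwaway tree and report what the default link points at."""
    with tempfile.TemporaryDirectory() as root:
        for suffix in ("asan", "msan"):
            os.makedirs(os.path.join(root, suffix))
            # Stand-in files; real ones would be the instrumented libraries.
            with open(os.path.join(root, suffix, "libc++.so"), "w") as f:
                f.write(suffix)
        publish_default(root, "asan")
        link = publish_default(root, "msan")  # re-pointing replaces the old link
        with open(link) as f:
            return os.readlink(link), f.read()
```

Because the link is relative, the whole library directory can be relocated without re-creating it. This only makes sense under the assumption stated in the thread, namely that all variants are ABI compatible.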
> 
> 
> Jon
> 
> Jon
> 
> -Hal
> 
> Jon
> 
> Conclusion
> ----------
> 
> I hope my proposal and questions have made sense. Any and all
> input is appreciated. Please let me know if anything needs
> clarification.
> 
> /Eric
> 
> 
> 
> 
> 
> 
> _______________________________________________ cfe-dev mailing
> list cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
> 
> 
> --
> Jon Roelofs
> jonathan at codesourcery.com
> CodeSourcery / Mentor Embedded
> 
> 
> 
> 
> --
> Employee of Qualcomm Innovation Center, Inc.
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, a
> Linux Foundation Collaborative Project
> 
> 

-- 
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory


