[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++

Evgenii Stepanov via cfe-dev cfe-dev at lists.llvm.org
Mon Aug 15 11:21:27 PDT 2016


On Sun, Aug 14, 2016 at 11:42 PM, Hal Finkel <hfinkel at anl.gov> wrote:
> ----- Original Message -----
>> From: "Evgenii Stepanov" <eugenis at google.com>
>> To: "Eric Fiselier" <eric at efcs.ca>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Jonathan Roelofs" <jonathan at codesourcery.com>, "clang developer list"
>> <cfe-dev at lists.llvm.org>, "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>
>> Sent: Monday, August 15, 2016 12:46:39 AM
>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
>>
>> Eric,
>>
>> thanks for bringing this up! This is indeed one of the biggest issues
>> for sanitizer adoption right now.
>>
>> I think that the same-soname approach is correct, mainly because the
>> sanitized library is just a version of the same library.
>
> Given that, for the case of msan at least, some of the symbols have, effectively, additional semantic requirements, it seems appropriate to use a different name somewhere (i.e. the library name, symbol names).

That would mean enforcing that every library in a process is built
with MSan. This is the path DFSan takes by mangling symbol names
during instrumentation. I think MSan does not need to be as strict:
that would make it harder, not easier, to use.
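
To make the trade-off concrete, here is a toy model of symbol
resolution, not the behaviour of any real linker or sanitizer. The
prefix "san$" and the helpers exported() and unresolved() are
hypothetical names invented for this sketch. With mangling on, an
instrumented object can only bind to instrumented libraries; with
mangling off, any library with matching names will do.

```python
PREFIX = "san$"  # hypothetical prefix, chosen only for this sketch


def exported(symbols, instrumented, mangle):
    """Names a library exports under the given scheme."""
    if instrumented and mangle:
        return {PREFIX + s for s in symbols}
    return set(symbols)


def unresolved(needed, provided, mangle):
    """Names an instrumented object references that nothing provides."""
    wanted = {PREFIX + s for s in needed} if mangle else set(needed)
    return sorted(wanted - set(provided))


needed = {"parse", "hash"}
plain_lib = exported({"parse", "hash"}, instrumented=False, mangle=True)

# Strict scheme: the uninstrumented library cannot satisfy the references.
print(unresolved(needed, plain_lib, mangle=True))
# Relaxed scheme: the same library links, instrumented or not.
print(unresolved(needed, plain_lib, mangle=False))
```

In the strict scheme a single uninstrumented dependency shows up as a
link failure; in the relaxed one it links and any problem only shows
up at run time.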

>
>> Loading both
>> versions in one process would usually be an error.
>
> While it would be undesirable to have multiple versions of libc++ in a single process, this generally works regardless. Most of the global state is in libcxxabi (or whatever ABI library is being used), and while having multiple versions of std::cout (etc.) can certainly be observable, it seems rare for this to come up in practice.

Good point. I was thinking about a general solution that can be
applied to a larger set of system libraries. Libc++ is the most
visible source of MSan false positives, but the problem is definitely
not limited to it. In general, it's hard to say if a random library is
ok to be loaded twice, and it feels like it is not ok in the majority
of cases.

>
>> RPATH does not work because it only affects immediate dependencies.
>> The following would refer to two different versions of libc++:
>> Executable (with asan) -> library A (without asan) -> libc++
>>         |
>>          -> libc++
>> I think even in this case, if the two libc++'s have the same soname,
>> only one will be loaded. Linux does breadth-first search, so it
>> should
>> end up with the direct dependency of the main executable, which is
>> good.
>
> Yes, I believe that Linux's loader does the right thing in this case. If you have the executable without asan, and the library with asan, then we should devise a scheme that does not silently break.
>
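
The breadth-first argument above can be checked with a small
simulation. This is a simplified model under stated assumptions, not
the actual loader algorithm: each object lists its needed sonames
already resolved to a path, and a soname that is already loaded is
never loaded a second time. All paths and the load_order() helper are
hypothetical.

```python
from collections import deque


def load_order(deps, root):
    """Simulate a breadth-first loader that deduplicates by soname.

    deps maps an object path to a list of (soname, resolved_path)
    pairs. Returns the load order and a soname -> path table.
    """
    loaded = {}
    order = [root]
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for soname, path in deps.get(current, []):
            if soname in loaded:
                continue  # same soname: reuse the copy already loaded
            loaded[soname] = path
            order.append(path)
            queue.append(path)
    return order, loaded


# The executable resolves libc++ to a sanitized copy; library A, taken
# on its own, would resolve the same soname to the system copy.
DEPS = {
    "exe": [("libA.so", "/usr/lib/libA.so"),
            ("libc++.so.1", "/opt/san/lib/libc++.so.1")],
    "/usr/lib/libA.so": [("libc++.so.1", "/usr/lib/libc++.so.1")],
}

order, loaded = load_order(DEPS, "exe")
print(order)
print(loaded["libc++.so.1"])
```

In this model the executable's direct dependency is reached before
library A's, so only the sanitized copy is loaded. Giving the two
copies different sonames would make the model load both.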
>>
>> Another problem is what happens when the program is installed/copied
>> somewhere, and the toolchain build directory is gone. We would need
>> help from the dynamic loader.
>
> This, IMHO, is a key problem with the rpath approach.
>
>> We have something like this set up on Android for ASan, see
>> https://source.android.com/devices/tech/debug/asan.html#sanitize_target
>> The dynamic loader adds directories to the default library search
>> path
>> when it loads an instrumented executable. The directory with the ASan
>> libraries is added at the start of the list. I think this is similar
>> to how multilib works.
>>
>> On Android we use the linker name itself (PT_INTERP field) to
>> identify
>> ASan executables. It would probably be better to use a .note section
>> or even something else.
>>
>
> Interesting. I don't understand what you're proposing here, however.
>
>  -Hal
>
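
A rough sketch of the search-path idea described above, as a toy
model only. The directory names, the in-memory "filesystem" set, and
the helpers search_path() and resolve() are all hypothetical; how the
loader decides that an executable is instrumented (PT_INTERP, a .note
section, or something else) is outside the sketch and is passed in as
a plain boolean.

```python
def search_path(default_dirs, sanitizer_dirs, is_instrumented):
    """Directories to search, in order, for one executable."""
    if is_instrumented:
        # Sanitized libraries take priority, defaults remain as fallback.
        return list(sanitizer_dirs) + list(default_dirs)
    return list(default_dirs)


def resolve(soname, dirs, filesystem):
    """Return the first dir/soname present in filesystem, else None."""
    for d in dirs:
        candidate = d + "/" + soname
        if candidate in filesystem:
            return candidate
    return None


# Hypothetical install: only libc++ has a sanitized build.
FILES = {
    "/system/lib/libc++.so",
    "/system/lib/libm.so",
    "/asan/lib/libc++.so",
}
DEFAULT = ["/system/lib"]
SANITIZED = ["/asan/lib"]

for instrumented in (True, False):
    dirs = search_path(DEFAULT, SANITIZED, instrumented)
    print(instrumented,
          resolve("libc++.so", dirs, FILES),
          resolve("libm.so", dirs, FILES))
```

The point of the model is that the choice is made per executable by
the loader, so nothing in the binary has to point at the toolchain
build directory, and libraries without a sanitized build still
resolve from the default location.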
>>
>> On Sun, Aug 14, 2016 at 7:14 PM, Eric Fiselier <eric at efcs.ca> wrote:
>> >>  As a practical matter, I can't set $PLATFORM and/or $LIB in my
>> >>  rpath and
>> >> have ld.so do the right thing in this context.
>> >
>> > Can't Clang compile the sanitized executable with a special RPATH
>> > pointing
>> > to the correct libc++ folder?
>> >
>> >> Moreover, it is really a property of how you compiled, so I think
>> >> using an
>> >> alternate library name is natural.
>> >
>> > Using an alternative library name will likely cause problems if
>> > a
>> > non-sanitized libc++ is also present, since both libraries
>> > provide the exact same symbols it's possible that symbols in the
>> > non-sanitized libc++ will replace the sanitized versions.
>> >
>> >
>> >
>> >
>> > On Sun, Aug 14, 2016 at 7:31 PM, Hal Finkel <hfinkel at anl.gov>
>> > wrote:
>> >>
>> >> ----- Original Message -----
>> >> > From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>> >> > To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>> >> > <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>> >> > <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>> >> > "Evgenii
>> >> > Stepanov" <eugenis at google.com>
>> >> > Sent: Sunday, August 14, 2016 7:07:00 PM
>> >> > Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>> >> > Sanitized
>> >> > Libc++
>> >> >
>> >> >
>> >> >
>> >> > On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>> >> > > Sanitizers such as MSAN require the entire program to be
>> >> > > instrumented;
>> >> > > anything less leads to plenty of false positives.
>> >> > > Unfortunately
>> >> > > this can
>> >> > > be difficult to achieve, especially for the C and C++ standard
>> >> > > libraries. To work around this the sanitizers provide
>> >> > > interceptors
>> >> > > for
>> >> > > common C functions, but the same solution doesn't work as well
>> >> > > for
>> >> > > the
>> >> > > C++ STL. Instead users are forced to manually build and link a
>> >> > > custom
>> >> > > sanitized libc++. This is a huge PITA and I would like to
>> >> > > improve
>> >> > > the
>> >> > > situation, not just for MSAN but all sanitizers. I'm working
>> >> > > on a
>> >> > > proposal to change this. The basis of my proposal is:
>> >> > >
>> >> > > Clang should install/provide multiple sanitized versions of
>> >> > > Libc++
>> >> > > and a
>> >> > > mechanism to easily link them, as if they were a Compiler-RT
>> >> > > runtime.
>> >> > >
>> >> > > The goals of this proposal are:
>> >> > >
>> >> > > (1) Greatly reduce the number of false positives caused by
>> >> > > using an
>> >> > > un-sanitized STL.
>> >> > > (2) Allow sanitizers to catch user bugs that occur within the
>> >> > > STL
>> >> > > library, not just its headers.
>> >> > >
>> >> > > The basic steps I would like to take to achieve this are:
>> >> > >
>> >> > > (1) Teach the compiler-rt CMake how to build and install each
>> >> > > sanitized
>> >> > > libc++ version alongside its other runtimes.
>> >> > > (2) Add options to the Clang driver to support linking/using
>> >> > > these
>> >> > > libraries.
>> >> > >
>> >> > > I think this proposal is likely to be contentious, so I would
>> >> > > like
>> >> > > to
>> >> > > focus on the details of it. Once I have some feedback on these
>> >> > > details
>> >> > > I'll
>> >> > > put together a formal proposal, including a plan for
>> >> > > implementing
>> >> > > it.
>> >> > > The details I would like input on are:
>> >> > >
>> >> > > (A) What kind and how many sanitized versions of libc++ should
>> >> > > we
>> >> > > provide?
>> >> > >
>> >> > > ---------------------------------------------------------------------------------------------------------------
>> >> > >
>> >> > > I think the minimum set would be Address (which includes
>> >> > > Leak),
>> >> > > Memory
>> >> > > (With origin tracking?), Thread, and Undefined.
>> >> > > Once we get into combinations of sanitizers things get more
>> >> > > complicated.
>> >> > > What other sanitizer combinations should we provide?
>> >> > >
>> >> > > (B) How should we handle UBSAN?
>> >> > > ---------------------------------------------------
>> >> > >
>> >> > > UBSAN is really just a collection of sanitizers and providing
>> >> > > sanitized
>> >> > > versions of libc++ for every possible configuration is out of
>> >> > > the
>> >> > > question.
>> >> > > Instead we should figure out what subset of UBSAN checks we
>> >> > > want to
>> >> > > enable in sanitized libc++ versions. I suspect we want to
>> >> > > disable
>> >> > > the
>> >> > > following checks.
>> >> > >
>> >> > > * -fsanitize=vptr
>> >> > > * -fsanitize=function
>> >> > > * -fsanitize=float-divide-by-zero
>> >> > >
>> >> > > Additionally UBSAN can be combined with every other sanitizer
>> >> > > group
>> >> > > (i.e.
>> >> > > Address, Memory, Thread).
>> >> > > Do we want to provide a combination of UBSAN on/off for every
>> >> > > group, or
>> >> > > can we simply provide an over-sanitized version with UBSAN on?
>> >> > >
>> >> > > (C) How should the Clang driver expose the sanitized libraries
>> >> > > to
>> >> > > the users?
>> >> > >
>> >> > > -------------------------------------------------------------------------------------------------------------
>> >> > >
>> >> > > I would like to propose the driver option '-fsanitize-stdlib'
>> >> > > and
>> >> > > '-fsanitize-stdlib=<sanitizer>'.
>> >> > > The first version deduces the best sanitized version to use,
>> >> > > the
>> >> > > second
>> >> > > allows it to be explicitly specified.
>> >> > >
>> >> > > A couple of other options are:
>> >> > >
>> >> > > * -fsanitize=foo:  Implicitly turn on a sanitized STL. Clang
>> >> > > deduces
>> >> > > which version.
>> >> > > * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>> >> > > sanitized STL.
>> >> > >
>> >> > > (D) Should sanitized libc++ versions override libc++.so?
>> >> > >
>> >> > > -------------------------------------------------------------------------------------------
>> >> > >
>> >> > > For example, what happens when a program links to both a
>> >> > > sanitized
>> >> > > and
>> >> > > non-sanitized libc++ version?
>> >> > > Does the sanitized version replace the non-sanitized version,
>> >> > > or
>> >> > > should
>> >> > > both versions be loaded into the program?
>> >> > >
>> >> > > Essentially I'm asking if the sanitized versions of libc++
>> >> > > should
>> >> > > have
>> >> > > the "soname" libc++ so they can
>> >> > > replace the non-sanitized version, or if they should have a
>> >> > > different
>> >> > > "soname" so the linker treats them as a separate library.
>> >> > >
>> >> > > I haven't looked into the consequences of either approach in
>> >> > > depth,
>> >> > > but
>> >> > > any input is appreciated.
>> >> >
>> >> > In a sense, these are /just/ multilibs, so my inclination would
>> >> > be to
>> >> > make all the sonames the same, and just stick them in
>> >> > appropriately
>> >> > named subfolders relative to their normal location.
>> >>
>> >> I'm not sure that's true; there's no property of the environment
>> >> that
>> >> determines which library path you need. As a practical matter, I
>> >> can't set
>> >> $PLATFORM and/or $LIB in my rpath and have ld.so do the right
>> >> thing in this
>> >> context. Moreover, it is really a property of how you compiled, so
>> >> I think
>> >> using an alternate library name is natural.
>> >>
>> >>  -Hal
>> >>
>> >> >
>> >> >
>> >> > Jon
>> >> >
>> >> > >
>> >> > > Conclusion
>> >> > > -----------------
>> >> > >
>> >> > > I hope my proposal and questions have made sense. Any and all
>> >> > > input
>> >> > > is
>> >> > > appreciated.
>> >> > > Please let me know if anything needs clarification.
>> >> > >
>> >> > > /Eric
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > >
>> >> > > _______________________________________________
>> >> > > cfe-dev mailing list
>> >> > > cfe-dev at lists.llvm.org
>> >> > > http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>> >> > >
>> >> >
>> >> > --
>> >> > Jon Roelofs
>> >> > jonathan at codesourcery.com
>> >> > CodeSourcery / Mentor Embedded
>> >> >
>> >>
>> >> --
>> >> Hal Finkel
>> >> Assistant Computational Scientist
>> >> Leadership Computing Facility
>> >> Argonne National Laboratory
>> >
>> >
>>
>


