[cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
Jonathan Roelofs via cfe-dev
cfe-dev at lists.llvm.org
Mon Aug 15 12:27:56 PDT 2016
On 8/15/16 12:50 PM, Hal Finkel wrote:
> ----- Original Message -----
>> From: "Jonathan Roelofs" <jonathan at codesourcery.com>
>> To: "Evgenii Stepanov" <eugenis at google.com>
>> Cc: "Hal Finkel" <hfinkel at anl.gov>, "Eric Fiselier" <eric at efcs.ca>, "clang developer list" <cfe-dev at lists.llvm.org>,
>> "Chandler Carruth" <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>
>> Sent: Monday, August 15, 2016 1:37:11 PM
>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a Sanitized Libc++
>>
>>
>>
>> On 8/15/16 12:14 PM, Evgenii Stepanov wrote:
>>> On Mon, Aug 15, 2016 at 7:24 AM, Jonathan Roelofs
>>> <jonathan at codesourcery.com> wrote:
>>>>
>>>>
>>>> On 8/14/16 7:31 PM, Hal Finkel wrote:
>>>>>
>>>>> ----- Original Message -----
>>>>>>
>>>>>> From: "Jonathan Roelofs via cfe-dev" <cfe-dev at lists.llvm.org>
>>>>>> To: "Eric Fiselier" <eric at efcs.ca>, "clang developer list"
>>>>>> <cfe-dev at lists.llvm.org>, "Chandler Carruth"
>>>>>> <chandlerc at gmail.com>, "Kostya Serebryany" <kcc at google.com>,
>>>>>> "Evgenii Stepanov" <eugenis at google.com>
>>>>>> Sent: Sunday, August 14, 2016 7:07:00 PM
>>>>>> Subject: Re: [cfe-dev] Making MSAN Easier to Use: Providing a
>>>>>> Sanitized Libc++
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 8/14/16 4:05 PM, Eric Fiselier via cfe-dev wrote:
>>>>>>>
>>>>>>> Sanitizers such as MSAN require the entire program to be
>>>>>>> instrumented; anything less leads to plenty of false positives.
>>>>>>> Unfortunately, this can be difficult to achieve, especially for
>>>>>>> the C and C++ standard libraries. To work around this, the
>>>>>>> sanitizers provide interceptors for common C functions, but the
>>>>>>> same solution doesn't work as well for the C++ STL. Instead,
>>>>>>> users are forced to manually build and link a custom sanitized
>>>>>>> libc++. This is a huge PITA and I would like to improve the
>>>>>>> situation, not just for MSAN but for all sanitizers. I'm working
>>>>>>> on a proposal to change this. The basis of my proposal is:
>>>>>>>
>>>>>>> Clang should install/provide multiple sanitized versions of
>>>>>>> Libc++ and a mechanism to easily link them, as if they were a
>>>>>>> Compiler-RT runtime.
>>>>>>>
>>>>>>> The goals of this proposal are:
>>>>>>>
>>>>>>> (1) Greatly reduce the number of false positives caused by
>>>>>>>     using an un-sanitized STL.
>>>>>>> (2) Allow sanitizers to catch user bugs that occur within the
>>>>>>>     STL library, not just its headers.
>>>>>>>
>>>>>>> The basic steps I would like to take to achieve this are:
>>>>>>>
>>>>>>> (1) Teach the compiler-rt CMake how to build and install each
>>>>>>>     sanitized libc++ version alongside its other runtimes.
>>>>>>> (2) Add options to the Clang driver to support linking/using
>>>>>>>     these libraries.
>>>>>>>
>>>>>>> I think this proposal is likely to be contentious, so I would
>>>>>>> like to focus on the details of it. Once I have some feedback on
>>>>>>> these details I'll put together a formal proposal, including a
>>>>>>> plan for implementing it. The details I would like input on
>>>>>>> are:
>>>>>>>
>>>>>>> (A) What kind and how many sanitized versions of libc++ should
>>>>>>> we provide?
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> I think the minimum set would be Address (which includes Leak),
>>>>>>> Memory (With origin tracking?), Thread, and Undefined. Once we
>>>>>>> get into combinations of sanitizers things get more
>>>>>>> complicated.
>>>>>>> What other sanitizer combinations should we provide?
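The minimum set from (A) can be written down as a small table of build flavors. This is an illustrative sketch only: the short flavor names are hypothetical, it assumes each flavor is produced by passing LLVM's `LLVM_USE_SANITIZER` CMake option to the libc++ build, and choosing `MemoryWithOrigins` answers Eric's "with origin tracking?" with yes, which is itself an assumption.

```python
# Hypothetical table of the minimum sanitized libc++ flavors from (A).
# Keys are illustrative short names; values are assumed to be strings
# accepted by LLVM's LLVM_USE_SANITIZER CMake option.
MINIMUM_FLAVORS = {
    "asan": "Address",            # ASan (Leak is included, per (A))
    "msan": "MemoryWithOrigins",  # MSan with origin tracking (assumption)
    "tsan": "Thread",
    "ubsan": "Undefined",
}


def cmake_args(flavor):
    """Return the extra CMake arguments for one sanitized libc++ build."""
    return ["-DLLVM_USE_SANITIZER=" + MINIMUM_FLAVORS[flavor]]


if __name__ == "__main__":
    # One configure line per flavor that compiler-rt would build and install.
    for name in MINIMUM_FLAVORS:
        print(name, " ".join(cmake_args(name)))
```

Any combined flavor (the open question above) would just be another row in the table, which is where the build and test matrix starts to grow.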
>>>>>>>
>>>>>>> (B) How should we handle UBSAN?
>>>>>>> ---------------------------------------------------
>>>>>>>
>>>>>>> UBSAN is really just a collection of sanitizers, and providing
>>>>>>> sanitized versions of libc++ for every possible configuration
>>>>>>> is out of the question. Instead, we should figure out what
>>>>>>> subset of UBSAN checks we want to enable in sanitized libc++
>>>>>>> versions. I suspect we want to disable the following checks:
>>>>>>>
>>>>>>> * -fsanitize=vptr
>>>>>>> * -fsanitize=function
>>>>>>> * -fsanitize=float-divide-by-zero
>>>>>>>
>>>>>>> Additionally, UBSAN can be combined with every other sanitizer
>>>>>>> group (i.e. Address, Memory, Thread). Do we want to provide a
>>>>>>> combination of UBSAN on/off for every group, or can we simply
>>>>>>> provide an over-sanitized version with UBSAN on?
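One way to read (B) is as a fixed set of compiler flags for the UBSAN flavor. A minimal sketch, assuming the three checks listed above are the only ones turned off; `ubsan_flags` is a hypothetical helper, while `-fsanitize=` and `-fno-sanitize=` are ordinary Clang options. Passing extra groups models the "over-sanitized" variant with UBSAN always on.

```python
# The checks Eric suspects should be disabled in sanitized libc++ builds.
DISABLED_UBSAN_CHECKS = ["vptr", "function", "float-divide-by-zero"]


def ubsan_flags(groups=("undefined",)):
    """Return the -fsanitize/-fno-sanitize flags for a UBSAN libc++ flavor.

    Extra groups, e.g. ("address", "undefined"), model the
    "over-sanitized" flavor with UBSAN always on.
    """
    return [
        "-fsanitize=" + ",".join(groups),
        "-fno-sanitize=" + ",".join(DISABLED_UBSAN_CHECKS),
    ]


if __name__ == "__main__":
    print(" ".join(ubsan_flags(("address", "undefined"))))
```

Keeping the disabled list in one place means every combined flavor gets the same UBSAN subset, which is the property (B) is after.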
>>>>>>>
>>>>>>> (C) How should the Clang driver expose the sanitized libraries
>>>>>>> to the users?
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> I would like to propose the driver options '-fsanitize-stdlib'
>>>>>>> and '-fsanitize-stdlib=<sanitizer>'. The first version deduces
>>>>>>> the best sanitized version to use; the second allows it to be
>>>>>>> explicitly specified.
>>>>>>>
>>>>>>> A couple of other options are:
>>>>>>>
>>>>>>> * -fsanitize=foo: Implicitly turn on a sanitized STL. Clang
>>>>>>>   deduces which version.
>>>>>>> * -stdlib=libc++-<sanitizer>: Explicitly turn on and choose a
>>>>>>>   sanitized STL.
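To make the two forms of the proposed option concrete, here is a sketch of how a driver might resolve them. Everything in it is hypothetical: `-fsanitize-stdlib` is only a proposal, `resolve_stdlib_flavor` is an invented helper, and the deduction order (MSAN first, since it is the sanitizer most sensitive to uninstrumented code) is an assumption.

```python
# Hypothetical deduction order for the bare -fsanitize-stdlib form.
DEDUCTION_ORDER = ["memory", "thread", "address", "undefined"]


def resolve_stdlib_flavor(args):
    """Return the sanitizer whose libc++ flavor should be linked, or None.

    None means no sanitized libc++ was requested (or none could be deduced).
    """
    enabled = set()
    requested = None  # None: option absent; "": deduce; otherwise explicit
    for arg in args:
        if arg.startswith("-fsanitize="):
            enabled.update(arg[len("-fsanitize="):].split(","))
        elif arg == "-fsanitize-stdlib":
            requested = ""
        elif arg.startswith("-fsanitize-stdlib="):
            requested = arg[len("-fsanitize-stdlib="):]
    if requested is None:
        return None
    if requested:
        return requested  # the explicit form always wins
    for name in DEDUCTION_ORDER:
        if name in enabled:
            return name
    return None


if __name__ == "__main__":
    print(resolve_stdlib_flavor(["-fsanitize=address,undefined", "-fsanitize-stdlib"]))
```

The alternative spellings listed above would only change the parsing step, not the deduction.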
>>>>>>>
>>>>>>> (D) Should sanitized libc++ versions override libc++.so?
>>>>>>>
>>>>>>> ------------------------------------------------------------
>>>>>>>
>>>>>>> For example, what happens when a program links to both a
>>>>>>> sanitized and non-sanitized libc++ version? Does the sanitized
>>>>>>> version replace the non-sanitized version, or should both
>>>>>>> versions be loaded into the program?
>>>>>>>
>>>>>>> Essentially I'm asking if the sanitized versions of libc++
>>>>>>> should have the same "soname" as libc++ so they can replace the
>>>>>>> non-sanitized version, or if they should have a different
>>>>>>> "soname" so the linker treats them as a separate library.
>>>>>>>
>>>>>>> I haven't looked into the consequences of either approach in
>>>>>>> depth, but any input is appreciated.
>>>>>>
>>>>>>
>>>>>> In a sense, these are /just/ multilibs, so my inclination would
>>>>>> be to make all the sonames the same, and just stick them in
>>>>>> appropriately named subfolders relative to their normal
>>>>>> location.
>>>>>
>>>>>
>>>>> I'm not sure that's true; there's no property of the environment
>>>>> that determines which library path you need. As a practical
>>>>> matter, I can't set $PLATFORM and/or $LIB in my rpath and have
>>>>> ld.so do the right thing in this context. Moreover, it is really a
>>>>> property of how you compiled, so I think using an alternate
>>>>> library name is natural.
>>>>
>>>>
>>>> Multilibs solve exactly the problem of "it's a property of how you
>>>> compiled". The thing that's subtly different here is that the usual
>>>> thing that people do with multilibs is to provide ABI-incompatible
>>>> versions of the same library (which are made incompatible via
>>>> compiler flags, -msoft-float, for example), whereas these libraries
>>>> just so happen to be ABI-compatible with their non-instrumented
>>>> variants.
>>>>
>>>> I'm not sure I understand what you're saying about $PLATFORM and
>>>> $LIB, but I /think/ it's a red herring: the compiler takes care of
>>>> adding in the multilib suffixes where appropriate, so shouldn't the
>>>> answer to "which library do I stick in the rpath?" include said
>>>> suffix (when compiled with Eric's proposed flag)?
>>>
>>> What are these suffixes and where are they added?
>>
>> To be clear: the suffixes aren't something that exist yet, but rather
>> they're something I'm proposing.
>>
>> Strawman:
>>
>>   flag(s)                     suffix
>>   -------                     ------
>>   -fsanitize=address          /asan
>>   -fsanitize=address,memory   /asan/msan
>>
>>
>> Then with `-fsanitize=address`:
>>
>> /usr/lib/libc++.so
>>
>> becomes:
>>
>> /usr/lib/asan/libc++.so
>
> This kind of scheme sounds great, but is this something we can implement on our own, or something that requires changes to the dynamic loader (e.g. glibc's ld.so)?
Isn't it entirely up to what the user sticks in the rpath of the
binaries that they build?
Jon
>
> -Hal
>
>>
>> And with `-fsanitize=memory`, you get:
>>
>> /usr/lib/asan/msan/libc++.so
>>
>> because an msan'd but not asan'd build of the library was not
>> supplied by the vendor (for whatever hypothetical reason). Then the
>> validation problem of having an exponential number of combinations
>> to test becomes the vendor's problem: they can ship as many or as
>> few of the flavors of the libraries as they want.
>>
>> Here you'd have some notion of "satisfies the constraints the user
>> asked for" (which is usually "is ABI compatible with" as far as
>> normal multilib stuff goes) and another to rank the choices and
>> break ties when all else is the same.
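Jon's strawman suffixes and the "satisfies, then rank" selection can be sketched together. A minimal sketch, assuming: a fixed canonical order for suffix components, "satisfies" meaning the shipped flavor instruments at least the requested sanitizers, and ranking by fewest extra sanitizers with the suffix string as a deterministic tie-break. The names `SUFFIX_ORDER`, `suffix`, and `pick_flavor` are hypothetical.

```python
# Hypothetical canonical order and short names for suffix components.
SUFFIX_ORDER = [
    ("address", "asan"),
    ("memory", "msan"),
    ("thread", "tsan"),
    ("undefined", "ubsan"),
]


def suffix(sanitizers):
    """Map a set of -fsanitize= values to a strawman suffix like '/asan/msan'."""
    return "".join("/" + short for name, short in SUFFIX_ORDER if name in sanitizers)


def pick_flavor(requested, shipped):
    """Return the suffix of the best shipped flavor, or None if none fits.

    A flavor satisfies the request if it instruments at least the
    requested sanitizers. Among those, prefer the fewest extra
    sanitizers, then the lexicographically smallest suffix.
    """
    requested = frozenset(requested)
    candidates = [f for f in shipped if requested <= f]
    if not candidates:
        return None
    best = min(candidates, key=lambda f: (len(f - requested), suffix(f)))
    return suffix(best)


if __name__ == "__main__":
    # The vendor ships plain, asan, and asan+msan builds, but no msan-only one.
    shipped = [frozenset(), frozenset({"address"}), frozenset({"address", "memory"})]
    for req in ({"address"}, {"memory"}, {"thread"}):
        print(sorted(req), "->", pick_flavor(req, shipped))
```

With those inputs, a `memory` request resolves to `/asan/msan`, matching Jon's example, because no msan-only build was shipped; a request nothing satisfies yields `None` rather than silently falling back to an uninstrumented library.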
>>
>>>
>>> Note that right now if I build with -stdlib=libc++ (and libc++ is
>>> part of the LLVM checkout), I don't get any RPATH. So the binary is
>>> linked against the libc++.so in the toolchain build directory, but
>>> it would not find it at runtime without some extra help. This is
>>> the price you pay for running out of a temporary location, and we
>>> should probably keep it like this for sanitizer builds, too, i.e.
>>> put the sanitized libc++ in lib/msan and let the user set their own
>>> RPATH.
>>
>> Yeah, that's my inclination also. We could of course provide some
>> flag to support querying the compiler for what the sanitizer lib
>> suffix is (or re-use/hijack the existing one for normal multilibs).
>> That'd allow build scripts to append the suffix in a principled way.
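A build script following Evgenii's and Jon's suggestions might look like the sketch below. The query flag Jon mentions does not exist yet, so the suffix is simply passed in; `link_flags` and the `/opt/llvm` root are hypothetical, while `-stdlib=libc++`, `-L`, and `-Wl,-rpath,` are standard Clang and linker options.

```python
import posixpath


def link_flags(toolchain_root, sanitizer_suffix):
    """Build link flags that find a sanitized libc++ at link time and run time.

    sanitizer_suffix stands in for the output of the (hypothetical)
    compiler query Jon suggests, e.g. "/msan" for lib/msan.
    """
    libdir = posixpath.join(toolchain_root, "lib") + sanitizer_suffix
    return [
        "-stdlib=libc++",
        "-L" + libdir,           # where the linker looks for libc++.so
        "-Wl,-rpath," + libdir,  # where ld.so looks at run time
    ]


if __name__ == "__main__":
    print(" ".join(link_flags("/opt/llvm", "/msan")))
```

The RPATH stays entirely under the user's control, which is the behavior Evgenii describes for today's unsanitized libc++.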
>>
>>>
>>> The other part of the problem is how to install sanitized libc++
>>> system-wide and have apps use it. That's where we need the loader
>>> support, and I think it should follow the multilib design as
>>> closely as possible.
>>
>> An idea for this: assuming they're all ABI compatible, stick them in
>> their suffixed folders as appropriate, but add a symlink from the no
>> suffix location to whatever one you want to be used system-wide.
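Jon's system-wide idea amounts to a relative symlink from the unsuffixed path into one suffixed folder. A minimal sketch, assuming the flavors really are ABI compatible and that the unsuffixed `libc++.so` is already a symlink (or absent), so no real library gets overwritten; `set_system_default` is a hypothetical helper, and the demo uses a temporary directory instead of `/usr/lib`.

```python
import os
import tempfile


def set_system_default(libdir, suffix, libname="libc++.so"):
    """Make libdir/libname a relative symlink to libdir<suffix>/libname."""
    if not suffix.strip("/"):
        raise ValueError("suffix must name a subfolder, e.g. '/asan'")
    link = os.path.join(libdir, libname)
    target = os.path.join(suffix.strip("/"), libname)  # e.g. "asan/libc++.so"
    if os.path.islink(link):
        os.remove(link)  # switch flavors by replacing the old link
    elif os.path.exists(link):
        raise FileExistsError(link)  # refuse to clobber a real library
    os.symlink(target, link)
    return link


if __name__ == "__main__":
    libdir = tempfile.mkdtemp()
    os.makedirs(os.path.join(libdir, "asan"))
    open(os.path.join(libdir, "asan", "libc++.so"), "w").close()
    link = set_system_default(libdir, "/asan")
    print(link, "->", os.readlink(link))
```

Using a relative link keeps the layout relocatable: the whole lib directory can move without breaking the system-wide default.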
>>
>>
>> Jon
>>
>>>
>>>>
>>>>
>>>> Jon
>>>>
>>>>
>>>>>
>>>>> -Hal
>>>>>
>>>>>>
>>>>>>
>>>>>> Jon
>>>>>>
>>>>>>>
>>>>>>> Conclusion
>>>>>>> ----------
>>>>>>>
>>>>>>> I hope my proposal and questions have made sense. Any and all
>>>>>>> input is appreciated. Please let me know if anything needs
>>>>>>> clarification.
>>>>>>>
>>>>>>> /Eric
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> cfe-dev mailing list
>>>>>>> cfe-dev at lists.llvm.org
>>>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jon Roelofs
>>>>>> jonathan at codesourcery.com
>>>>>> CodeSourcery / Mentor Embedded
>>>>>>
>>>>>
>>>>
>>
>>
>