[llvm-dev] RFC: Extend UBSan with qsort checks

Yury Gribov via llvm-dev llvm-dev at lists.llvm.org
Tue Jan 12 22:27:51 PST 2016


On 01/13/2016 01:57 AM, Alexey Samsonov wrote:
> (+correct cfe-dev list)
>
> On Tue, Jan 12, 2016 at 2:57 PM, Alexey Samsonov <vonosmas at gmail.com> wrote:
>
>> Hi Yuri,
>>
>> On Mon, Jan 11, 2016 at 9:53 AM, Yury Gribov via llvm-dev <
>> llvm-dev at lists.llvm.org> wrote:
>>
>>> Hi all,
>>>
>>> UndefinedBehaviorSanitizer currently does not check for undefined
>>> behaviors which result from improper usage of standard library functions.
>>>
>>
>> It's not really an undefined behavior per se: you just violate the
>> contract provided by library functions (at least I haven't found a clause
>> in C++ standard
>> that specifies that the behavior of smth. like std::sort() with invalid
>> operator< is invalid).

Well, C11 which says that "That is, for qsort they shall define a total 
ordering on the array, and for bsearch the same object shall always 
compare the same way with the key." ("total ordering" is of course a 
misnomer, AFAIK they meant weak ordering). AFAIK violation of "shall" 
usually means UB.

As for C++11, it has for e.g. srtd::sort:

"Requires: operator< (for the version with no arguments) or comp (for 
the version with a comparison argument) defines a strict weak ordering 
(25.4)."

which also sounds like UB.

> The actual behavior of these functions is
>> determined by library authors, and can be anything from UB to unspecified
>> result, to nicely formatted error report (if you're using a debug version
>> of system library):
>> I think I actually encountered all these cases.
>>
>> That said, the benefit of checking the sorting routines is clear: I'm not
>> surprised you were able to catch numerous bugs with your tool.
>>
>> UBSan currently doesn't have interceptors, and all of its checks [1] are
>> designed to work without runtime library support, so that it can be used as
>> a mitigation
>> tool (with -fsanitize-trap=undefined). It's challenging to add UBSan
>> interceptors at this point: this would limit the tool portability (at least
>> portability of some functionality),
>> and complicate interoperation of UBSan with another tools.

Could you give some hints about portability problems? E.g. I thought 
that ASan (which has a ton of interceptors) has no problems on Windows.

In general, not being able to detect UB in stdlibs sounds like a limitation.

 >> There is an
>> opportunity to, say, add qsort() interceptor to ASan/TSan/MSan, and check
>> arguments there.
>> This functionality can be controlled by a runtime flag.

It's certainly doable but somewhat "ugly" as [ATM]San are not about UB. 
On the other hand TSan already has non-thread-related checks anyway. I 
wonder what other folks think about this.

>> Why do you need compiler instrumentation? Do you want to automatically
>> inject hooks to SortChecker into std::sort() functions, so that you don't
>> need to annotate C++ standard library code?

Exactly, LD_PRELOAD does not work for inline STL functions (actually it 
does not fully work even for Glibc which has inline bsearch).

>> We submitted some annotations to libc++ code (e.g. to report containter
>> overflow bugs in sanitizers).

Annotating all popular stdlibs rather than doing a single wrapper would 
be a challenge. On the other hand, it may have an additional benefit of 
being extendable to other popular libraries which provide qsort-like 
APIs (Glib2, Berkeley DB and many others). E.g. we could have some kind 
of __attribute__((strict_order)) for callbacks which would then be 
handled by compiler.

>> [1] -fsanitize=vptr is an only notable exception
>>
>> One notorious instance of such errors is invalid usage of qsort or bsearch
>>> routines (or std::sort and friends in case of C++):
>>> * using comparison function that violates ordering axioms (reflexivity,
>>> symmetry, transitivity)
>>> * returning unstable results from comparison function
>>> * passing unsorted array to bsearch
>>> * etc.
>>>
>>> Errors like this will usually result in slightly invalid output (not
>>> fully sorted arrays, rare failed bsearches, etc.) but may as well cause
>>> aborts on some systems (https://bugzilla.samba.org/show_bug.cgi?id=3959).
>>>
>>> I've recently developed a simple proof-of-concept tool SortChecker (
>>> https://github.com/yugr/sortcheck) which intercepts calls to qsort and
>>> friends (via LD_PRELOAD) and performs various sanity checks before passing
>>> control to libc e.g.
>>> * sign(cmp(a, b)) == - sign(cmp(b, a)) for all array elements
>>> * etc.
>>>
>>> Results were quite inspiring: I've found several errors in popular
>>> open-source packages (GCC, Harfbuzz, dpkg, libXt, etc.). I'm also pretty
>>> sure most C++ developers have failed to write correct comparison function
>>> at least once.
>>>
>>> Could SortChecker functionality be considered for integration into UBSan?
>>> Extending SortChecker to C++ land would require significant work which has
>>> already been done in sanitizers (compiler instrumentation, portable
>>> parallel code, etc.). On the other hand, ability to detect new classes of
>>> bugs should benefit UBSan and community.
>>
>>
>>
>>>
>>> -Y
>>> _______________________________________________
>>> LLVM Developers mailing list
>>> llvm-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>
>>
>>
>>
>> --
>> Alexey Samsonov
>> vonosmas at gmail.com
>>
>
>
>



More information about the llvm-dev mailing list