[cfe-dev] [RFC] Extending and improving Clang's undefined behavior checking

Ahmed Charles ahmedcharles at gmail.com
Sat Aug 11 05:11:18 PDT 2012


I like the idea. I would recommend a minor addition. I think it'd be
nice to have documentation that lists undefined behaviors, whether
they are supported or not and how to enable checking, but also
undefined behaviors that aren't being checked and ideally, a blurb on
what makes it prohibitive/impossible to check it.

On Sat, Aug 11, 2012 at 3:08 AM, Matthieu Monrocq
<matthieu.monrocq at gmail.com> wrote:
>
>
> On Sat, Aug 11, 2012 at 4:48 AM, Richard Smith <richard at metafoo.co.uk>
> wrote:
>>
>> Hi,
>>
>> There are three different (and mostly orthogonal, design-wise) areas where
>> I would like to make improvements to Clang's undefined behavior checking:
>>
>> 1) Completeness of checks. There are integer undefined behaviors which
>> -ftrapv and -fcatch-undefined-behavior don't catch, and there is almost no
>> checking available for any other undefined behaviors outside of those and
>> the ones caught by {Address,Thread,Memory} Sanitizer.
>>
>> I would like to add support for checking the following additional
>> undefined behaviors, which Clang does not currently appear to be able to
>> check for:
>>
>>  - Using a null pointer or empty glvalue (a dereferenced null pointer) as
>> the object expression in a class member access expression or member function
>> call.
>>  - Using an object of polymorphic class type where either the static and
>> dynamic types are incompatible or the object's lifetime has not started or
>> has ended, in a delete-expression, class member access expression, member
>> function call, derived-to-virtual-base conversion, dynamic_cast, or typeid,
>> or using the value of a reference bound to such a glvalue.
>>  - Base-to-derived cast on wrong polymorphic type
>>  - Binding a reference to an inappropriately-aligned address or to an
>> empty lvalue
>>  - VLA size evaluates to a nonpositive value
>>  - Reaching the end of a function with a non-void return type, in C++ (in
>> C, this is only UB if the caller uses the return value)
>>  - Overflow in conversion from floating point to smaller floating point or
>> to integral type
>>  - Overflow in conversion from integral type to floating point (int128 to
>> float, or when converting to __fp16)
>>  - Converting an out-of-range integer to an enumerated type
>>  - Floating-point arithmetic overflow
>>  - Pointer arithmetic which leaves the bounds which can be determined by
>> llvm.objectsize
>>  - Pointer subtraction between pointers which can be determined to be in
>> different objects
>>  - Signed << where the LHS is negative, or where the result doesn't fit in
>> the result type (C99, C11) or doesn't fit in the corresponding unsigned type
>> (C++11)
>>  - Assignment where the LHS and RHS overlap but are not equal
>>
>> Regehr et al's IOC has code to handle a few of these checks already, which
>> should be straightforward to apply to Clang trunk (assuming the IOC guys are
>> still happy with that?). Low overhead would be an explicit goal of all of
>> these checks.
>>
>>
>> 2) Command-line interface. We currently have the following options to
>> enable various flavors of undefined behavior checks:
>>   -ftrapv
>>   -fcatch-undefined-behavior
>>   -faddress-sanitizer
>>   -fthread-sanitizer
>>
>> I would like for us to have a single argument which enables all undefined
>> behavior checking which can be performed cheaply and with no changes to
>> address space layout or ABI; I propose we extend -fcatch-undefined-behavior
>> to be that argument. IOC adds the -ftrapv checks to
>> -fcatch-undefined-behavior (unless we're in -fwrapv mode), so we can easily
>> pick up that part of this change.
>>
>> IOC also adds:
>>   -fcatch-undefined-ansic-behavior
>>   -fcatch-undefined-c99-behavior
>>   -fcatch-undefined-cxx0x-behavior
>>   -fcatch-undefined-cxx98-behavior
>>   -fcatch-undefined-nonarith-behavior
>>
>> I think we should support this kind of configuration through a mechanism
>> similar to warning flags: -fcatch-undefined-behavior=c++11
>> -fno-catch-undefined-behavior=null-reference, for instance.
>>
>> Also, I think we should consider renaming this switch (and/or possibly the
>> -f*-sanitizer switches) to provide a consistent set of names, but I don't
>> have a concrete proposal for that.
>>
>>
>> 3) Handling and reporting of undefined behavior, once caught. The
>> sanitizers produce decent information (which is both useful and detailed),
>> but it could be prettier. The other checks just emit @llvm.trap(), which is
>> far from ideal.
>>
>> IOC is in some ways quite ambitious here, and provides options to not only
>> diagnose a problem, but also to call a function which can produce a value
>> for the operation and recover. I'm not proposing adding such a feature to
>> Clang at this time (I'll leave that to the IOC folks).
>>
>> I would like to propose adding a runtime library for the
>> -fcatch-undefined-behavior checks, and emit calls to relevant diagnostic
>> functions instead of calls to @llvm.trap. I'm planning on having one library
>> entry point per flavor of check, named __ubsan_report_<check_name>, which
>> would be passed relevant operand values, and a pointer to a structure
>> containing information about the context (file, line and column number,
>> operand types as strings, and possibly a code snippet to display with the
>> diagnostic).
>>
>> As a strawman proposal for runtime error output, we could aim to mirror
>> the formatting style of compile-time diagnostics (since we've already got
>> one way of beautifully presenting diagnostic information, why invent
>> another?). For instance:
>>
>> $ clang++ bad.cc -o bad -fcatch-undefined-behavior
>> $ ./bad
>> test.cc:4:12: runtime error: left shift count 33 >= width of type 'int'
>> (32 bits)
>>   return i << n;
>>            ^  ~
>>               33
>> test.cc:7:30: note: in call to 'f' here
>> Illegal instruction
>> $
>>
>> It would seems sensible to provide a common library for producing such
>> diagnostic output and also use it in the sanitizers. Again, strawman output:
>>
>> /usr/include/c++/4.6.x-google/bits/stl_algobase.h:192:10: runtime error:
>> global buffer overflow reading address 0x2b29a5e899f4
>>       if (__b < __a)
>>           ^
>> <memory>:0x2b29a5e899f4: note: contents of nearby memory:
>>   ... e8 a5 29 2b 00 00  01 00 00 00 gg gg gg gg  gg gg gg gg gg gg gg gg
>> gg gg ...
>>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~
>>   ...ace)::GetBucket(int)::threshold read
>> <memory>:0x2b29a5e899e0: note: read address is 0 bytes after global
>> variable '(anonymous namespace)::GetBucket(int)::threshold'
>>   ... gg gg gg gg  08 00 00 00 01 00 00 00  f4 99 e8 a5 29 2b 00 00  01 00
>> 00 00 ...
>>
>> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>                    (anonymous namespace)::GetBucket(int)::threshold
>> foo/bar.cc:863: note: in call to 'std::min<int>(const int &, const int &)'
>> here
>> foo/bar.cc:894: note: in call to '(anonymous namespace)::GetBucket(int)'
>> here
>> [...]
>> gunit/gunit_main.cc:59: note: in call to 'testing::UnitTest::Run()' here
>> <__libc_start_main>:0x2b29cb224d5d: note: in call to 'main' here
>>
>>
>> This proposal intends to incorporate some, but not all, of IOC into
>> mainline Clang. The parts which I'm not currently intending to address as
>> part of this work are:
>>  * -fuse-intrinsic. This allows the choice of whether to use LLVM's
>> overflow intrinsics. I intend to use such intrinsics unconditionally.
>>  * -fuse-random-value. This allows recovery from detected undefined
>> behavior by providing a value for the operation, which is not part of my
>> proposal.
>>  * -fhandler-null. This selects the usage of an alternate runtime library.
>>  * -fchecks-num. This provides logging output when checks are inserted,
>> and would not be valuable to my target audience.
>>  * -fcatch-non-ubc-type. This is designed to catch unsigned integer
>> overflow, and some other cases which don't have undefined behavior, and so
>> is not directly connected to this work.
>>
>> Does this sound good?
>> Richard
>>
>> _______________________________________________
>> cfe-dev mailing list
>> cfe-dev at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>>
>
> This sounds positively awesome. And the explicit goal of low overhead is a
> definite plus. I could perfectly imagine using those checks as parts of our
> test builds at work if it turns out fast enough.
>
> -- Matthieu
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>



-- 
Ahmed Charles



More information about the cfe-dev mailing list