[cfe-dev] [RFC] Extending and improving Clang's undefined behavior checking

Matthieu Monrocq matthieu.monrocq at gmail.com
Sat Aug 11 03:08:27 PDT 2012


On Sat, Aug 11, 2012 at 4:48 AM, Richard Smith <richard at metafoo.co.uk>wrote:

> Hi,
>
> There are three different (and mostly orthogonal, design-wise) areas where
> I would like to make improvements to Clang's undefined behavior checking:
>
> 1) Completeness of checks. There are integer undefined behaviors which
> -ftrapv and -fcatch-undefined-behavior don't catch, and there is almost no
> checking available for any other undefined behaviors outside of those and
> the ones caught by {Address,Thread,Memory} Sanitizer.
>
> I would like to add support for checking the following additional
> undefined behaviors, which Clang does not currently appear to be able to
> check for:
>
>  - Using a null pointer or empty glvalue (a dereferenced null pointer) as
> the object expression in a class member access expression or member
> function call.
>  - Using an object of polymorphic class type where either the static and
> dynamic types are incompatible or the object's lifetime has not started or
> has ended, in a delete-expression, class member access expression, member
> function call, derived-to-virtual-base conversion, dynamic_cast, or typeid,
> or using the value of a reference bound to such a glvalue.
>  - Base-to-derived cast on wrong polymorphic type
>  - Binding a reference to an inappropriately-aligned address or to an
> empty lvalue
>  - VLA size evaluates to a nonpositive value
>  - Reaching the end of a function with a non-void return type, in C++ (in
> C, this is only UB if the caller uses the return value)
>  - Overflow in conversion from floating point to smaller floating point or
> to integral type
>  - Overflow in conversion from integral type to floating point (int128 to
> float, or when converting to __fp16)
>  - Converting an out-of-range integer to an enumerated type
>  - Floating-point arithmetic overflow
>  - Pointer arithmetic which leaves the bounds which can be determined by
> llvm.objectsize
>  - Pointer subtraction between pointers which can be determined to be in
> different objects
>  - Signed << where the LHS is negative, or where the result doesn't fit in
> the result type (C99, C11) or doesn't fit in the corresponding unsigned
> type (C++11)
>  - Assignment where the LHS and RHS overlap but are not equal
>
> Regehr et al's IOC has code to handle a few of these checks already, which
> should be straightforward to apply to Clang trunk (assuming the IOC guys
> are still happy with that?). Low overhead would be an explicit goal of all
> of these checks.
>
>
> 2) Command-line interface. We currently have the following options to
> enable various flavors of undefined behavior checks:
>   -ftrapv
>   -fcatch-undefined-behavior
>   -faddress-sanitizer
>   -fthread-sanitizer
>
> I would like for us to have a single argument which enables all undefined
> behavior checking which can be performed cheaply and with no changes to
> address space layout or ABI; I propose we extend -fcatch-undefined-behavior
> to be that argument. IOC adds the -ftrapv checks to
> -fcatch-undefined-behavior (unless we're in -fwrapv mode), so we can easily
> pick up that part of this change.
>
> IOC also adds:
>   -fcatch-undefined-ansic-behavior
>   -fcatch-undefined-c99-behavior
>   -fcatch-undefined-cxx0x-behavior
>   -fcatch-undefined-cxx98-behavior
>   -fcatch-undefined-nonarith-behavior
>
> I think we should support this kind of configuration through a mechanism
> similar to warning flags: -fcatch-undefined-behavior=c++11
> -fno-catch-undefined-behavior=null-reference, for instance.
>
> Also, I think we should consider renaming this switch (and/or possibly the
> -f*-sanitizer switches) to provide a consistent set of names, but I don't
> have a concrete proposal for that.
>
>
> 3) Handling and reporting of undefined behavior, once caught. The
> sanitizers produce decent information (which is both useful and detailed),
> but it could be prettier. The other checks just emit @llvm.trap(), which is
> far from ideal.
>
> IOC is in some ways quite ambitious here, and provides options to not only
> diagnose a problem, but also to call a function which can produce a value
> for the operation and recover. I'm not proposing adding such a feature to
> Clang at this time (I'll leave that to the IOC folks).
>
> I would like to propose adding a runtime library for the
> -fcatch-undefined-behavior checks, and emit calls to relevant diagnostic
> functions instead of calls to @llvm.trap. I'm planning on having one
> library entry point per flavor of check, named __ubsan_report_<check_name>,
> which would be passed relevant operand values, and a pointer to a structure
> containing information about the context (file, line and column number,
> operand types as strings, and possibly a code snippet to display with the
> diagnostic).
>
> As a strawman proposal for runtime error output, we could aim to mirror
> the formatting style of compile-time diagnostics (since we've already got
> one way of beautifully presenting diagnostic information, why invent
> another?). For instance:
>
> $ clang++ bad.cc -o bad -fcatch-undefined-behavior
> $ ./bad
> *test.cc:4:12: runtime error: left shift count 33 >= width of type 'int'
> (32 bits)*
>   return i << n;
>            ^  ~
>               33
> *test.cc:7:30: note:* in call to 'f' here
> Illegal instruction
> $
>
> It would seems sensible to provide a common library for producing such
> diagnostic output and also use it in the sanitizers. Again, strawman output:
>
> */usr/include/c++/4.6.x-google/bits/stl_algobase.h:192:10: runtime error:global buffer overflow reading address 0x2b29a5e899f4
> *
>       if (__b < __a)
>           ^
> *<memory>:0x2b29a5e899f4: note:* contents of nearby memory:
>   ... e8 a5 29 2b 00 00  01 00 00 00 gg gg gg gg  gg gg gg gg gg gg gg gg
>  gg gg ...
>   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~~~~~~~~
>   ...ace)::GetBucket(int)::threshold read
> *<memory>:0x2b29a5e899e0: note:* read address is 0 bytes after global
> variable '(anonymous namespace)::GetBucket(int)::threshold'
>   ... gg gg gg gg  08 00 00 00 01 00 00 00  f4 99 e8 a5 29 2b 00 00  01 00
> 00 00 ...
>
>  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>                    (anonymous namespace)::GetBucket(int)::threshold
> *foo/bar.cc:863: note:* in call to 'std::min<int>(const int &, const int
> &)' here
> *foo/bar.cc:894: note:* in call to '(anonymous
> namespace)::GetBucket(int)' here
> [...]
> *gunit/gunit_main.cc:59: note:* in call to 'testing::UnitTest::Run()' here
> *<__libc_start_main>:0x2b29cb224d5d: note:* in call to 'main' here
>
>
> This proposal intends to incorporate some, but not all, of IOC into
> mainline Clang. The parts which I'm not currently intending to address as
> part of this work are:
>  * -fuse-intrinsic. This allows the choice of whether to use LLVM's
> overflow intrinsics. I intend to use such intrinsics unconditionally.
>  * -fuse-random-value. This allows recovery from detected undefined
> behavior by providing a value for the operation, which is not part of my
> proposal.
>  * -fhandler-null. This selects the usage of an alternate runtime library.
>  * -fchecks-num. This provides logging output when checks are inserted,
> and would not be valuable to my target audience.
>  * -fcatch-non-ubc-type. This is designed to catch unsigned integer
> overflow, and some other cases which don't have undefined behavior, and so
> is not directly connected to this work.
>
> Does this sound good?
> Richard
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>
This sounds positively awesome. And the explicit goal of low overhead is a
definite plus. I could perfectly imagine using those checks as parts of our
test builds at work if it turns out fast enough.

-- Matthieu
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120811/a377fcba/attachment.html>


More information about the cfe-dev mailing list