[cfe-dev] [RFC] Extending and improving Clang's undefined behavior checking

Wed Aug 22 10:50:23 PDT 2012

On 8/22/12 12:04 PM, Chris Lattner wrote:
> On Aug 22, 2012, at 9:24 AM, John Criswell <criswell at illinois.edu> wrote:
>> On 8/21/12 6:33 PM, Chris Lattner wrote:
>>>> 1) Completeness of checks. There are integer undefined behaviors which -ftrapv and -fcatch-undefined-behavior don't catch, and there is almost no checking available for any other undefined behaviors outside of those and the ones caught by {Address,Thread,Memory} Sanitizer.
>>> Yes.  Nuno's bounds checking work can also be pulled into this eventually, as could stack canaries and "fortify source".
>> IMHO, I don't think ad-hoc techniques like stack canaries are suitable for this particular application.  Stack canaries are better suited for run-time protection against stack buffer-overflow attacks (if they're suited for anything at all).  I don't think canaries really tell you where in the code the stack is being smashed.
> The current implementation doesn't, but that's because its reporting mechanism is currently hard coded to "abort()".  It could (in principle, not saying this is important) be extended to report better source-level diagnostics.

You're right about the call to abort() and adding better diagnostics, 
but the problem goes deeper than that.  Most run-time checks (such as 
ASan's, SAFECode's, and SoftBound's) can usually tell the user *which* 
instruction is causing memory corruption and, in some cases, which 
variables are involved.  Stack canaries, I think, can only report which 
return instruction detected the corruption; they can't tell you which 
previously executed instruction actually caused the corruption.

In fact, I think most stack canaries will only detect run-time errors 
that smash the return pointer or function arguments, and such an error 
is likely to be visible even when the canaries are disabled, so I don't 
see them helping much as an undefined behavior diagnostic.  If you put 
canaries between stack objects, then that might detect otherwise latent 
errors.

Note that I'm making a distinction between run-time protection from 
attacks and detecting undefined behavior.  The former is a hardening 
technique to automatically make software resistant to attack; the latter 
is a debugging technique to help developers ferret out bugs. Stack 
canaries are a hardening technique, and they have some success in doing 
that.  I don't think they'd be very useful as a diagnostic technique for 
debugging.

>
>> For memory-related undefined behaviors, I think it would make sense to have various "levels" of checks in which each level adds more overhead but checks more things accurately.  ASan would be a good first or second level; it finds invalid loads and stores and can catch some out-of-bounds array accesses and dangling pointers. Another level could be ASan with SAFECode's array checks and points-to set checking.  A final level could be something like SoftBound + CETS which provides real dangling pointer detection in addition to the previously mentioned checks.
> Right.  Some would be on by default with -fcatch-undefined-behavior, some would be opt-inable with -fcatch-undefined-behavior=something-specific

Do you have any opinions on my "undefined behavior levels" idea?  In 
most cases, I don't think users want to fiddle around with a whole bunch 
of options for a whole bunch of checks that they don't understand; I 
think a dial that trades off run-time performance for stricter 
checking (in the same way that -O is a dial that trades off compile time 
for faster code) is easier to use.

That said, having separate options for each type of check would be 
useful for experts.  But having only a "default" set of checks and then 
having to specify a whole bunch of separate options for additional 
checks might be unnecessarily cumbersome.

-- John T.