[cfe-dev] [RFC] Extending and improving Clang's undefined behavior checking

Wed Aug 15 02:56:40 PDT 2012

On Wed, Aug 15, 2012 at 11:47 AM, Kostya Serebryany <kcc at google.com> wrote:

>
>>>> 3) Handling and reporting of undefined behavior, once caught. The
>>>> sanitizers produce decent information (which is both useful and detailed),
>>>> but it could be prettier.
>>>>
>>>
>>> Can you be more specific?
>>>
>>
>> Sure. Here's a few things which I think could be improved:
>>
>> * The point at which the error is trapped doesn't include a column
>> number. (I'd also like a code snippet with a caret, where possible, but
>> there are tradeoffs to be carefully considered there...)
>>
>
> Agree. In some cases this could be very valuable, e.g. when reporting a
> bug on expressions like a->b->c = x->y->z;
> With your help we may be able to implement this as part of our symbolizer
> work.
>

FYI column numbers are already there in DWARF debug info, and we don't have
to do anything specific to fetch them. They are missing
in ASan error reports because addr2line doesn't output them.

>
>
>>
>> * Symbols don't get demangled (I know, this is WIP).
>>
>
> Yep.
>
>
>>
>> * The shadow byte output is currently only really useful if you
>> understand how ASAN works in a reasonable amount of detail. We don't
>> provide any explanation of what they mean in the output, and even the
>> documentation doesn't explain what the various redzone values mean. I think
>> we can do a lot better on this particular point; I gave some sample output
>> in my original message which shows the direction I'm thinking of.
>>
>
> Yea... The main reason we output the shadow bytes is to provide some means
> of debugging the tool itself if something goes wrong, especially if we have
> a report but don't have a way to reproduce it.
> Still, there were a few cases where the shadow bytes allowed us to
> understand a tricky true positive report quicker.
> Additional docs/explanations may help, but my experience that users don't
> read long docs :(
> So, my intention is to have less but better docs.
>
> Same for other bits of output (sp, bp, stats).
>
>
>>
>> * The 'stats' output does not seem relevant in the case where ASAN traps
>> an issue.
>>
>> * There seems to be a tension between producing extremely detailed output
>> (including things like bp and sp), which would be useful for a failure in a
>> long-running process, and producing readable output (omitting such things,
>> which are usually not interesting, and are redundant if the problem can be
>> reproduced in a debugger). I'm not really sure what the right tradeoff is
>> here. I certainly don't want to remove any useful information from the
>> output.
>>
>
> This tradeof is always tricky.
> If you omit some info by default, you may get into a situation where you
> need it, but you can't reproduce the report again.
> We've been bitten by this a lot :(
>
> --kcc
>
>
>>
>> In essence, I'd like the ASAN output to be as accessible as our
>> compile-time diagnostics. And yes, I am volunteering to help resolve the
>> above points, if you agree that we should improve these things.
>>
>>
>>> IOC is in some ways quite ambitious here, and provides options to not
>>> only diagnose a problem, but also to call a function which can produce a
>>> value for the operation and recover. I'm not proposing adding such a
>>> feature to Clang at this time (I'll leave that to the IOC folks).
>>>
>>>>
>>>> I would like to propose adding a runtime library for the
>>>> -fcatch-undefined-behavior checks, and emit calls to relevant diagnostic
>>>> functions instead of calls to @llvm.trap. I'm planning on having one
>>>> library entry point per flavor of check, named __ubsan_report_<check_name>,
>>>> which would be passed relevant operand values, and a pointer to a structure
>>>> containing information about the context (file, line and column number,
>>>> operand types as strings, and possibly a code snippet to display with the
>>>> diagnostic).
>>>>
>>>
>>> Do you plan to have these library functions to be marked as
>>> never-return? (if not, it will cost quite a bit of performance)
>>>
>>
>> Yes, that was certainly my initial plan (although John Regehr has
>> suggested that there may be value in being able to continue after such a
>> check failure). I would especially welcome your input into ways to emit the
>> checks in IR, and how to design the runtime library interface, to minimize
>> the performance burden of the checks, since I know you've done a lot of
>> research into that.
>>
>>
>>> Also note, that asan/tsan use the debug info to get file/line info. Do
>>> you plan to do the same, or you want to pass this info explicitly to the
>>> callback? (this will cost additional binary size and maybe some perf)
>>>
>>
>> I would like to preserve more information than can easily be extracted
>> from the debug info (in particular, line numbers and column ranges for
>> operands of problematic operations), so I'm intending on passing that info
>> explicitly to the callback, but naturally if the measured impact of this is
>> too high, I'll need to reconsider.
>>
>> I think the perf impact can be mitigated by forcing the check call out of
>> the hot path, either through just block placement, or by pushing the extra
>> arg emission into a separate noinline function, which would be called on a
>> check failure with a minimal set of arguments. Something like:
>>
>> int f(int x, int y) {
>>   return x << y;
>> }
>> =>
>> linkonce_odr unnamed_addr const char
>> __ubsan_str_8dfd1c41b32683bab10e108ac2851f2a = "  return x << y;\n";
>> __attribute__((noreturn, noinline)) static void
>> __ubsan_check_failure_1(int a, int b) {
>>   static const struct LocationInfo __ubsan_locs_1[] = {
>>     __FILE__, 2, 11, 12, // '<<'
>>     __FILE__, 2, 9, 9, // LHS
>>     __FILE__, 2, 14, 14 // RHS
>>   };
>>   __ubsan_report_left_shift_out_of_range(a, b, 32, "int", __ubsan_locs_1,
>> str_8dfd1c41b32683bab10e108ac2851f2a);
>> }
>> int f(int x, int y) {
>>   if (__builtin_expect((unsigned)y >= 32 || ((unsigned)x >> (31-y)), 0))
>>     __ubsan_check_failure_1(x, y);
>>   return x << y;
>> }
>>
>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>
>

-- 
Alexey Samsonov, MSK
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20120815/602169ae/attachment.html>