[cfe-dev] [ubsan] Add -fsanitize-warn-once, only emit runtime error once per check

Sun Dec 16 23:09:52 PST 2012

On Mon, Dec 17, 2012 at 11:06 AM, Alexey Samsonov <samsonov at google.com> wrote:
>> > wrote:
>> >>
>> >> Have you measured the code size overhead from this extra flag? Did you
>> >> consider implementing this in the runtime library instead (by
>> >> suppressing duplicates based on return address or SourceLocation)?
>> >
>>
>> Hmm, good point.  I hadn't previously measured this, but I just did
>> now, see attached.  Sizes are reported for text/data/bss/total as
>> reported by 'size', and ubsan calls are counted by occurrences of
>> calls in the actual resulting binary (grep'ing output from objdump
>> -d).
>>
>> Summary: average increase across CINT2006 was 1.85% using
>> -fsanitize=integer with a neatly consistent ~5bytes per check
>> (including text).
>>
>> For the ability to scale despite many sites triggering many times,
>> this does seem like a valid trade-off.
>>
>> Previously IOC used a short linear scan table (~10-20 elements was
>> sweet spot IIRC) with fallback to a larger hashtable to manage
>> duplicates, but that was always a performance issue.  As a useful data
>> point, a quick spot-check of 403.gcc shows 96 static locations
>> triggered a total of 3,476,066 times dynamically when just processing
>> one of the inputs used for the 'ref' input set (166.i).  More on this
>> below.
>>
>> >
>> > +1. ThreadSanitizer has to solve the same problem - we want to report
>> > each data race (pair of stack traces) exactly once. TSan runtime stores
>> > the
>> > stacks of printed reports (as a sequence of PCs) to do this
>> > de-duplication.
>> >
>>
>> Great, didn't realize TSan already solved this problem.  That said,
>> the problem is somewhat different I think:
>>
>> * TSan supports differentiating based on stack trace (apparently), but
>> that seems less interesting for ubsan/integer checks, especially since
>> we don't print that information :).  The byte-per-check approach
>> doesn't work for stack traces, so that's not really an option for tsan
>> as-is.
>> * I would (perhaps erroneously) expect tsan to have many fewer dynamic
>> invocations than ubsan/integer checks, which might suggest difference
>> trade-offs in the size vs performance department.  Checking a byte 1
>> million times vs scanning and managing a vector of ~100 items > 1
>> million times might make the size increase rather preferable, even if
>> that's not the right decision for tsan (I have no idea what's right
>> for tsan :)).
>
>
> TSan inserts a call to runtime library for each function entry/exit and
> for each memory operation (load/store). Still, data races don't happen that
> often, so we have to access hashtable which stores stack traces of
> printed reports on slow path only. We haven't observed any performance
> problems here (Dmitry may correct me if I'm wrong).

I turn it off only in tests.
There were no requests to turn it off. There were requests to make it
even more aggressive.

> I didn't extensively test ubsan on real-world applications so it's hard for
> me
> to estimate the number of error reports it prints. But I think we need to
> count
> not the number of calls to __ubsan_handle (i.e. number of places in code
> where
> an error _might_ happen), but the number of actual unique reports printed by
> ubsan.
> If, say, it's at most 10-20, then storing PCs of all the erroneous
> instructions and doing
> a linear scan before printing another report might be better than bloating
> the binary size by 1%.
>
> That said, I think that the de-duplication functionality should definitely
> be implemented one
> way or another, and it should be "on" by default (and I can't imagine a
> reason why a user
> may decide to turn it off).
>
>
>>
>>
>> Given this, and in light of the attached data, do you buy that this is
>> indeed the appropriate approach for ubsan/integer checking?
>>
>> Until I actually gathered this data I wasn't sure, but the 1-2% seems
>> very much worth it IMHO.
>>
>> Thoughts, and thanks for helping ensure we do what's right!
>>
>> ~Will
>>
>> >>
>> >> On Thu, Dec 13, 2012 at 11:51 AM, Will Dietz <willdtz at gmail.com> wrote:
>> >> > On Thu, Dec 13, 2012 at 12:05 PM, Dmitri Gribenko
>> >> > <gribozavr at gmail.com>
>> >> > wrote:
>> >> >> On Thu, Dec 13, 2012 at 7:33 PM, Will Dietz <willdtz at gmail.com>
>> >> >> wrote:
>> >> >>> This flag causes clang to emit a byte for each check that is used
>> >> >>> by
>> >> >>> the
>> >> >>> runtime to track whether we've already printed an error for that
>> >> >>> check.
>> >> >>>
>> >> >>>  Often failed checks are triggered many times dynamically, but a
>> >> >>> user
>> >> >>>  is only interested in which checks failed (with example dynamic
>> >> >>> values
>> >> >>>  to aid in debugging).  This flag lets the user make such runs much
>> >> >>>  more efficient and generate more manageable output.
>> >> >>
>> >> >> Hi Will,
>> >> >>
>> >> >> +  if (Checked) {
>> >> >> +    if (*Checked) return;
>> >> >> +    *Checked = true;
>> >> >> +  }
>> >> >>
>> >> >> Does it make sense to do the store atomically?  The user's program
>> >> >> is
>> >> >> already buggy, but introducing a possible data race is unfortunate.
>> >> >>
>> >> >> Dmitri
>> >> >>
>> >> >
>> >> > Hi Dmitri,
>> >> >
>> >> > Glad you brought this up.  I wasn't sure which way to go on this and
>> >> > erred on simplicity.  Attached is an updated compiler-rt patch using
>> >> > __sync_val_compare_and_swap, which also simplifies the code a bit.
>> >> > If
>> >> > this builtin is sufficiently portable (architectures and compiler
>> >> > recognition) then I would prefer this for the reasons you mention.
>> >> >
>> >> > Thanks!
>> >> >
>> >> > _______________________________________________
>> >> > cfe-dev mailing list
>> >> > cfe-dev at cs.uiuc.edu
>> >> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >> >
>> >> _______________________________________________
>> >> cfe-dev mailing list
>> >> cfe-dev at cs.uiuc.edu
>> >> http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev
>> >
>> >
>> >
>> >
>> > --
>> > Alexey Samsonov, MSK
>> >
>
>
>
>
> --
> Alexey Samsonov, MSK
>