<div style="font-family: arial, helvetica, sans-serif; font-size: 10pt"><br><br><div class="gmail_quote">On Sat, Dec 15, 2012 at 4:48 AM, Will Dietz <span dir="ltr"><<a href="mailto:willdtz@gmail.com" target="_blank">willdtz@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Fri, Dec 14, 2012 at 1:53 AM, Alexey Samsonov <<a href="mailto:samsonov@google.com">samsonov@google.com</a>> wrote:<br>
><br>
><br>
> On Thu, Dec 13, 2012 at 6:26 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk">richard@metafoo.co.uk</a>><br>
> wrote:<br>
>><br>
>> Have you measured the code size overhead from this extra flag? Did you<br>
>> consider implementing this in the runtime library instead (by<br>
>> suppressing duplicates based on return address or SourceLocation)?<br>
><br>
<br>
</div>Hmm, good point. I hadn't previously measured this, but I just did<br>
now, see attached. Sizes are reported for text/data/bss/total as<br>
reported by 'size', and ubsan calls are counted by occurrences of<br>
calls in the actual resulting binary (grep'ing output from objdump<br>
-d).<br>
<br>
Summary: average increase across CINT2006 was 1.85% using<br>
-fsanitize=integer with a neatly consistent ~5bytes per check<br>
(including text).<br>
<br>
For the ability to scale despite many sites triggering many times,<br>
this does seem like a valid trade-off.<br>
<br>
Previously IOC used a short linear scan table (~10-20 elements was<br>
sweet spot IIRC) with fallback to a larger hashtable to manage<br>
duplicates, but that was always a performance issue. As a useful data<br>
point, a quick spot-check of 403.gcc shows 96 static locations<br>
triggered a total of 3,476,066 times dynamically when just processing<br>
one of the inputs used for the 'ref' input set (166.i). More on this<br>
below.<br>
<div class="im"><br>
><br>
> +1. ThreadSanitizer has to solve the same problem - we want to report<br>
> each data race (pair of stack traces) exactly once. TSan runtime stores the<br>
> stacks of printed reports (as a sequence of PCs) to do this de-duplication.<br>
><br>
<br>
</div>Great, didn't realize TSan already solved this problem. That said,<br>
the problem is somewhat different I think:<br>
<br>
* TSan supports differentiating based on stack trace (apparently), but<br>
that seems less interesting for ubsan/integer checks, especially since<br>
we don't print that information :). The byte-per-check approach<br>
doesn't work for stack traces, so that's not really an option for tsan<br>
as-is.<br>
* I would (perhaps erroneously) expect tsan to have many fewer dynamic<br>
invocations than ubsan/integer checks, which might suggest difference<br>
trade-offs in the size vs performance department. Checking a byte 1<br>
million times vs scanning and managing a vector of ~100 items > 1<br>
million times might make the size increase rather preferable, even if<br>
that's not the right decision for tsan (I have no idea what's right<br>
for tsan :)).<br></blockquote><div><br></div><div>TSan inserts a call to runtime library for each function entry/exit and</div><div>for each memory operation (load/store). Still, data races don't happen that</div><div>
often, so we have to access hashtable which stores stack traces of</div><div>printed reports on slow path only. We haven't observed any performance</div><div>problems here (Dmitry may correct me if I'm wrong).</div>
<div><br></div><div>I didn't extensively test ubsan on real-world applications so it's hard for me</div><div>to estimate the number of error reports it prints. But I think we need to count</div><div>not the number of calls to __ubsan_handle (i.e. number of places in code where</div>
<div>an error _might_ happen), but the number of actual unique reports printed by ubsan.</div><div>If, say, it's at most 10-20, then storing PCs of all the erroneous instructions and doing</div><div>a linear scan before printing another report might be better than bloating the binary size by 1%.</div>
<div><br></div><div>That said, I think that the de-duplication functionality should definitely be implemented one</div><div>way or another, and it should be "on" by default (and I can't imagine a reason why a user</div>
<div>may decide to turn it off).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Given this, and in light of the attached data, do you buy that this is<br>
indeed the appropriate approach for ubsan/integer checking?<br>
<br>
Until I actually gathered this data I wasn't sure, but the 1-2% seems<br>
very much worth it IMHO.<br>
<br>
Thoughts, and thanks for helping ensure we do what's right!<br>
<span class="HOEnZb"><font color="#888888"><br>
~Will<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
>><br>
>> On Thu, Dec 13, 2012 at 11:51 AM, Will Dietz <<a href="mailto:willdtz@gmail.com">willdtz@gmail.com</a>> wrote:<br>
>> > On Thu, Dec 13, 2012 at 12:05 PM, Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com">gribozavr@gmail.com</a>><br>
>> > wrote:<br>
>> >> On Thu, Dec 13, 2012 at 7:33 PM, Will Dietz <<a href="mailto:willdtz@gmail.com">willdtz@gmail.com</a>> wrote:<br>
>> >>> This flag causes clang to emit a byte for each check that is used by<br>
>> >>> the<br>
>> >>> runtime to track whether we've already printed an error for that<br>
>> >>> check.<br>
>> >>><br>
>> >>> Often failed checks are triggered many times dynamically, but a user<br>
>> >>> is only interested in which checks failed (with example dynamic<br>
>> >>> values<br>
>> >>> to aid in debugging). This flag lets the user make such runs much<br>
>> >>> more efficient and generate more manageable output.<br>
>> >><br>
>> >> Hi Will,<br>
>> >><br>
>> >> + if (Checked) {<br>
>> >> + if (*Checked) return;<br>
>> >> + *Checked = true;<br>
>> >> + }<br>
>> >><br>
>> >> Does it make sense to do the store atomically? The user's program is<br>
>> >> already buggy, but introducing a possible data race is unfortunate.<br>
>> >><br>
>> >> Dmitri<br>
>> >><br>
>> ><br>
>> > Hi Dmitri,<br>
>> ><br>
>> > Glad you brought this up. I wasn't sure which way to go on this and<br>
>> > erred on simplicity. Attached is an updated compiler-rt patch using<br>
>> > __sync_val_compare_and_swap, which also simplifies the code a bit. If<br>
>> > this builtin is sufficiently portable (architectures and compiler<br>
>> > recognition) then I would prefer this for the reasons you mention.<br>
>> ><br>
>> > Thanks!<br>
>> ><br>
>> > _______________________________________________<br>
>> > cfe-dev mailing list<br>
>> > <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
>> > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
>> ><br>
>> _______________________________________________<br>
>> cfe-dev mailing list<br>
>> <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>
>> <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>
><br>
><br>
><br>
><br>
> --<br>
> Alexey Samsonov, MSK<br>
><br>
</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Alexey Samsonov, MSK</div><br>
</div>