<div style="font-family: arial, helvetica, sans-serif; font-size: 10pt"><br><br><div class="gmail_quote">On Sat, Dec 15, 2012 at 4:48 AM, Will Dietz <span dir="ltr"><<a href="mailto:willdtz@gmail.com" target="_blank">willdtz@gmail.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Fri, Dec 14, 2012 at 1:53 AM, Alexey Samsonov <<a href="mailto:samsonov@google.com">samsonov@google.com</a>> wrote:<br>


><br>

><br>

> On Thu, Dec 13, 2012 at 6:26 PM, Richard Smith <<a href="mailto:richard@metafoo.co.uk">richard@metafoo.co.uk</a>><br>

> wrote:<br>

>><br>

>> Have you measured the code size overhead from this extra flag? Did you<br>

>> consider implementing this in the runtime library instead (by<br>

>> suppressing duplicates based on return address or SourceLocation)?<br>

><br>

<br>

</div>Hmm, good point.  I hadn't previously measured this, but I just did<br>

now, see attached.  Sizes are reported for text/data/bss/total as<br>

reported by 'size', and ubsan calls are counted by occurrences of<br>

calls in the actual resulting binary (grep'ing output from objdump<br>

-d).<br>

<br>

Summary: average size increase across CINT2006 was 1.85% using<br>

-fsanitize=integer with a neatly consistent ~5 bytes per check<br>

(including text).<br>

<br>

For the ability to scale despite many sites triggering many times,<br>

this does seem like a valid trade-off.<br>

<br>

Previously IOC used a short linear scan table (~10-20 elements was the<br>

sweet spot IIRC) with fallback to a larger hashtable to manage<br>

duplicates, but that was always a performance issue.  As a useful data<br>

point, a quick spot-check of 403.gcc shows 96 static locations<br>

triggered a total of 3,476,066 times dynamically when just processing<br>

one of the inputs used for the 'ref' input set (166.i).  More on this<br>

below.<br>

<div class="im"><br>

><br>

> +1. ThreadSanitizer has to solve the same problem - we want to report<br>

> each data race (pair of stack traces) exactly once. TSan runtime stores the<br>

> stacks of printed reports (as a sequence of PCs) to do this de-duplication.<br>

><br>

<br>

</div>Great, didn't realize TSan already solved this problem.  That said,<br>

the problem is somewhat different I think:<br>

<br>

* TSan supports differentiating based on stack trace (apparently), but<br>

that seems less interesting for ubsan/integer checks, especially since<br>

we don't print that information :).  The byte-per-check approach<br>

doesn't work for stack traces, so that's not really an option for tsan<br>

as-is.<br>

* I would (perhaps erroneously) expect tsan to have many fewer dynamic<br>

invocations than ubsan/integer checks, which might suggest different<br>

trade-offs in the size vs performance department.  Checking a byte 1<br>

million times vs scanning and managing a vector of ~100 items > 1<br>

million times might make the size increase rather preferable, even if<br>

that's not the right decision for tsan (I have no idea what's right<br>

for tsan :)).<br></blockquote><div><br></div><div>TSan inserts a call to the runtime library for each function entry/exit and</div><div>for each memory operation (load/store). Still, data races don't happen that</div><div>

often, so we only have to access the hashtable that stores stack traces of</div><div>printed reports on the slow path. We haven't observed any performance</div><div>problems here (Dmitry may correct me if I'm wrong).</div>

<div><br></div><div>I haven't extensively tested ubsan on real-world applications, so it's hard for me</div><div>to estimate the number of error reports it prints. But I think we need to count</div><div>not the number of calls to __ubsan_handle (i.e. the number of places in the code where</div>

<div>an error _might_ happen), but the number of actual unique reports printed by ubsan.</div><div>If, say, it's at most 10-20, then storing the PCs of all the erroneous instructions and doing</div><div>a linear scan before printing another report might be better than bloating the binary size by 1%.</div>
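<div><br></div><div>To make the linear-scan alternative concrete, here is a rough sketch of what I have in mind. It is only an illustration under stated assumptions, not code from the ubsan runtime: the names ShouldReport, ReportedPCs and kMaxReported are hypothetical, the uptr typedef assumes an LP64 target, and the table is not thread-safe as written.</div>

```cpp
// Hypothetical sketch; none of these names exist in the ubsan runtime.
typedef unsigned long uptr;  // assumption: pointer-sized on LP64 targets

static const unsigned kMaxReported = 32;  // assumed small cap on unique reports
static uptr ReportedPCs[kMaxReported];
static unsigned NumReported = 0;

// Returns true if a report for this PC should be printed, i.e. the PC has
// not been recorded yet. Not thread-safe as written.
static bool ShouldReport(uptr PC) {
  // Linear scan over the PCs that have already produced a report.
  for (unsigned I = 0; I != NumReported; ++I)
    if (ReportedPCs[I] == PC)
      return false;
  // First sighting: remember it if there is room. Once the table is full,
  // new PCs are no longer recorded, so they would be reported every time.
  if (NumReported != kMaxReported)
    ReportedPCs[NumReported++] = PC;
  return true;
}
```

<div>A handler could obtain the PC with something like __builtin_return_address(0) and return early when ShouldReport says no. The scan only runs once a check has already failed, so its cost depends on the number of unique reports, not on the number of instrumented checks.</div>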

<div><br></div><div>That said, I think that the de-duplication functionality should definitely be implemented one</div><div>way or another, and it should be "on" by default (and I can't imagine a reason why a user</div>

<div>may decide to turn it off).</div><div><br></div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Given this, and in light of the attached data, do you buy that this is<br>

indeed the appropriate approach for ubsan/integer checking?<br>

<br>

Until I actually gathered this data I wasn't sure, but the 1-2% seems<br>

very much worth it IMHO.<br>

<br>

Thoughts, and thanks for helping ensure we do what's right!<br>

<span class="HOEnZb"><font color="#888888"><br>

~Will<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

>><br>

>> On Thu, Dec 13, 2012 at 11:51 AM, Will Dietz <<a href="mailto:willdtz@gmail.com">willdtz@gmail.com</a>> wrote:<br>

>> > On Thu, Dec 13, 2012 at 12:05 PM, Dmitri Gribenko <<a href="mailto:gribozavr@gmail.com">gribozavr@gmail.com</a>><br>

>> > wrote:<br>

>> >> On Thu, Dec 13, 2012 at 7:33 PM, Will Dietz <<a href="mailto:willdtz@gmail.com">willdtz@gmail.com</a>> wrote:<br>

>> >>> This flag causes clang to emit a byte for each check that is used by<br>

>> >>> the<br>

>> >>> runtime to track whether we've already printed an error for that<br>

>> >>> check.<br>

>> >>><br>

>> >>>  Often failed checks are triggered many times dynamically, but a user<br>

>> >>>  is only interested in which checks failed (with example dynamic<br>

>> >>> values<br>

>> >>>  to aid in debugging).  This flag lets the user make such runs much<br>

>> >>>  more efficient and generate more manageable output.<br>

>> >><br>

>> >> Hi Will,<br>

>> >><br>

>> >> +  if (Checked) {<br>

>> >> +    if (*Checked) return;<br>

>> >> +    *Checked = true;<br>

>> >> +  }<br>

>> >><br>

>> >> Does it make sense to do the store atomically?  The user's program is<br>

>> >> already buggy, but introducing a possible data race is unfortunate.<br>

>> >><br>

>> >> Dmitri<br>

>> >><br>

>> ><br>

>> > Hi Dmitri,<br>

>> ><br>

>> > Glad you brought this up.  I wasn't sure which way to go on this and<br>

>> > erred on simplicity.  Attached is an updated compiler-rt patch using<br>

>> > __sync_val_compare_and_swap, which also simplifies the code a bit.  If<br>

>> > this builtin is sufficiently portable (architectures and compiler<br>

>> > recognition) then I would prefer this for the reasons you mention.<br>

>> ><br>

>> > Thanks!<br>

>> ><br>

>> > _______________________________________________<br>

>> > cfe-dev mailing list<br>

>> > <a href="mailto:cfe-dev@cs.uiuc.edu">cfe-dev@cs.uiuc.edu</a><br>

>> > <a href="http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev" target="_blank">http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev</a><br>

>> ><br>


><br>

><br>

><br>

><br>

> --<br>

> Alexey Samsonov, MSK<br>

><br>

</div></div></blockquote></div><br><br clear="all"><div><br></div>-- <br><div>Alexey Samsonov, MSK</div><br>

</div>