[PATCH] tsan: do not instrument not captured values

Thu Jan 22 00:14:32 PST 2015

On Wed, Jan 21, 2015 at 11:38 PM, Dmitry Vyukov <dvyukov at google.com> wrote:

> You are right.
> But this optimization is too fruitful to just discard it. So fruitful
> (-40% of instrumentation in a large webrtc test) that I am inclining
> towards ignoring the possibility of passing the object using relaxed
> atomics... On the other hand people do mess memory ordering, so losing
> these races is pity as well...
>

It would make me very sad to lose this feature of TSan. Of all the subtle
racy-queue techniques I have seen or heard of over the years, the one I
cited is actually one of the few that I have seen debugged specifically
through the use of TSan.

I also fear losing it in small part because it guards against a specific
portability risk between x86 and weak memory architectures, and catching
that class of bug is one of the biggest features of TSan for me.
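
To make the pattern concrete, here is a minimal sketch of the kind of handoff
I mean. It is not the code from the bug I cited; Message, slot and
handoff_stack_object are names made up for illustration. The version shown
uses release/acquire and is race-free. The buggy variant replaces both
orderings with std::memory_order_relaxed, which tends to work on x86 and is a
data race on the plain write to msg.payload. As I understand the patch, that
write happens before the capture and so would no longer be instrumented.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical payload type, for illustration only.
struct Message {
    int payload;
};

// Hands a fresh stack object to another thread through an atomic pointer
// and returns the payload value the consumer thread observed.
int handoff_stack_object() {
    std::atomic<Message*> slot{nullptr};
    int seen = 0;

    std::thread consumer([&] {
        Message* m = nullptr;
        // The acquire load pairs with the release store below. With
        // memory_order_relaxed here and on the store, the read of
        // m->payload would race with the plain write in the producer.
        while ((m = slot.load(std::memory_order_acquire)) == nullptr) {
            std::this_thread::yield();
        }
        seen = m->payload;
    });

    Message msg;       // fresh stack allocation, not yet captured
    msg.payload = 42;  // plain write before the address escapes
    slot.store(&msg, std::memory_order_release);  // the capture point

    consumer.join();   // msg stays alive until the consumer is done
    return seen;       // join() orders the consumer's write to seen
}
```

Built with -fsanitize=thread, the relaxed variant is only reportable if the
write to msg.payload is still instrumented.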

But it's wild that this is 40% of the instrumentation in a large webrtc
test. That seems to clearly indicate that there is *something* to be done
here, but I don't know yet what that is... so:

> Maybe we can figure out a way to get both at least in most cases.
>

That would be my hope as well. =]

> Few
> observations:
> 1. Leaking of stack objects to other threads is very infrequent (I
> would say 1%).
>

Infrequent relative to *captured* stack objects? Yes, but I'm not sure how
infrequent really. Mostly this is because I expect most stack objects to
never be captured.

> 2. Whole stack is generally touched by the current thread, so there
> are chances that we will still detect the race.
>

I'm not really sure why? The cases where I have seen this (and I went looking and
found a few others) all look like fresh stack allocations that are written
to and then passed off to another thread. I wouldn't expect (in common
cases) for anything else to write to the stack without there being some
"synchronization" that hides any race.

> 3. I suspect that just deduplicating memory accesses before capturing
> will remove very few accesses, because the accesses before capturing
> are the initial initialization of the object (write each member var
> only once).
>

I wasn't sure what the cost function of the tsan instrumentation was, and
whether it is significantly cheaper to do a single instrumentation call for
N*M bytes or N calls for M bytes each.

> 4. Perhaps we can tsan-write whole stack frame in function prologue
> using __tsan_write8, and then ignore individual accesses before
> capturing. This is somewhat similar to 3, but may be more efficient
> (uses 8-byte writes and write once instead of on each iteration of a
> loop). Is it possible to do in llvm? Do you think it is sufficient?
>

Interesting. This definitely seems possible. I think this is very similar
to my idea for #3 -- essentially to coalesce all instrumentation of all
static (non-dynamic in the LLVM IR lingo)

But I think this is a more general idea. There is a *lot* that we could do
to coalesce instrumentation and reduce it, moving it outside of loops, etc.
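
As a rough model of what coalescing could buy, here is a sketch that only
counts hook calls. fake_tsan_write8 is a hypothetical stand-in for the real
runtime entry point and does nothing but bump a counter; Frame,
per_access_calls and prologue_calls are likewise invented for the example.
It says nothing about the real cost of each call, only about how many of them
there are.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the TSan write hook: it only counts calls.
static std::size_t g_hook_calls = 0;
static void fake_tsan_write8(void* /*addr*/) { ++g_hook_calls; }

// Hypothetical stack frame contents: four 8-byte slots (32 bytes).
struct Frame {
    std::uint64_t a, b, c, d;
};

// Per-access scheme: one hook call before every store, repeated on each
// loop iteration.
std::size_t per_access_calls(int iters) {
    g_hook_calls = 0;
    Frame f{};
    for (int i = 0; i < iters; ++i) {
        const std::uint64_t v = static_cast<std::uint64_t>(i);
        fake_tsan_write8(&f.a); f.a = v;
        fake_tsan_write8(&f.b); f.b = v;
        fake_tsan_write8(&f.c); f.c = v;
        fake_tsan_write8(&f.d); f.d = v;
    }
    return g_hook_calls;
}

// Prologue scheme: one hook call per 8-byte word of the frame up front,
// then the stores inside the loop carry no hook calls at all.
std::size_t prologue_calls(int iters) {
    g_hook_calls = 0;
    Frame f{};
    unsigned char* base = reinterpret_cast<unsigned char*>(&f);
    for (std::size_t off = 0; off < sizeof(Frame); off += 8) {
        fake_tsan_write8(base + off);
    }
    for (int i = 0; i < iters; ++i) {
        const std::uint64_t v = static_cast<std::uint64_t>(i);
        f.a = v; f.b = v; f.c = v; f.d = v;
    }
    return g_hook_calls;
}
```

For 10 iterations this model makes 40 hook calls in the per-access scheme and
4 in the prologue scheme, and the second number does not grow with the trip
count.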

I think it would be useful to actually look at the specific instrumentation
patterns that are coming up most frequently in real world code patterns...