[PATCH] tsan: do not instrument not captured values

Thu Jan 22 00:25:53 PST 2015

On Thu, Jan 22, 2015 at 11:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>
> On Wed, Jan 21, 2015 at 11:38 PM, Dmitry Vyukov <dvyukov at google.com> wrote:
>>
>> You are right.
>> But this optimization is too fruitful to just discard. So fruitful
>> (-40% of instrumentation in a large webrtc test) that I am inclined
>> to ignore the possibility of passing the object using relaxed
>> atomics... On the other hand, people do mess up memory ordering, so
>> losing these races is a pity as well...
>
>
> It would make me very sad to lose this feature of TSan. Of all the subtle
> racy-queue techniques I have seen or heard of over the years, the one I
> cited is actually one of the few that I have seen debugged specifically
> through the use of TSan.
>
> I also fear losing it in small part because it is a specific portability
> risk between x86 and weak memory architectures, one of the biggest features
> of TSan for me.
>
> But it's wild that this is 40% of the instrumentation in a large webrtc
> test. That seems to clearly indicate that there is *something* to be done
> here, but I don't know yet what that is... so:
>
>>
>> Maybe we can figure out a way to get both at least in most cases.
>
>
> That would be my hope as well. =]
>
>>
>> A few observations:
>> 1. Leaking of stack objects to other threads is very infrequent (I
>> would say 1%).
>
>
> Infrequent relative to *captured* stack objects? Yes, but I'm not sure how
> infrequent really. Mostly this is because I expect most stack objects to
> never be captured.

If I do:

std::string s(...);
s.find(...);

and find is not inlined, but the ctor is inlined, isn't it the case that
s is captured, but the stores in the ctor can still be elided by this
optimization?
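To make the question concrete, here is a minimal sketch of that situation. The helper find_char and the function name find_after_inlined_ctor are hypothetical, and __attribute__((noinline)) is a GCC/Clang extension used only to stand in for "find is not inlined". The comments describe the reasoning in this thread, not a guarantee about what any particular compiler or TSan version does.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical out-of-line callee. Passing 's' here is the first point at
// which the address of the local object is visible outside the function.
__attribute__((noinline)) std::size_t find_char(const std::string &s, char c) {
  return s.find(c);
}

std::size_t find_after_inlined_ctor() {
  // If the constructor is inlined, its stores into 's' all happen before
  // the object is captured by the call below. Those are the accesses the
  // optimization under discussion could leave uninstrumented.
  std::string s("hello");

  // 's' is captured from here on; accesses after this point would still
  // need instrumentation.
  return find_char(s, 'l');
}
```

So "captured" is a property of the object over the whole function, while the accesses that can be skipped are only those that precede the first capture point.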

>> 2. Whole stack is generally touched by the current thread, so there
>> are chances that we will still detect the race.
>
>
> I'm not really sure why? The cases where I have seen this (and I went looking and
> found a few others) all look like fresh stack allocations that are written
> to and then passed off to another thread. I wouldn't expect (in common
> cases) for anything else to write to the stack without there being some
> "synchronization" that hides any race.

Touched during _previous_ execution of the thread, potentially by
already exited functions.  I mean that the shadow for the stack most
likely already contains stores from the current thread.

>> 3. I suspect that just deduplicating memory accesses before capturing
>> will remove very few accesses, because the accesses before capturing
>> are the initial initialization of the object (write each member var
>> only once).
>
>
> I wasn't sure what the cost function of the tsan instrumentation was, and
> whether it is significantly cheaper to do a single instrumentation call for
> N*M bytes or N calls for M bytes each.

Yes, it is cheaper. The cost is mostly C*number_of_executed_memory_accesses.
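A rough illustration of that cost model, as a sketch only: HookCounter is a hypothetical stand-in that merely counts hook invocations and is not part of the TSan runtime. It ignores the fact that a real range hook may still do work proportional to the bytes it covers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an instrumentation hook: it only counts how
// many times it is invoked, which is the quantity the cost model uses.
struct HookCounter {
  std::size_t calls = 0;
  void on_write(const void * /*addr*/, std::size_t /*size*/) { ++calls; }
};

// N hook calls of M bytes each: one call per executed store.
std::size_t per_access_calls(std::size_t n) {
  HookCounter hook;
  std::vector<long> buf(n);
  for (std::size_t i = 0; i < n; ++i) {
    hook.on_write(&buf[i], sizeof(long));
    buf[i] = static_cast<long>(i);
  }
  return hook.calls;
}

// One hook call covering N*M bytes, followed by uninstrumented stores.
std::size_t coalesced_calls(std::size_t n) {
  HookCounter hook;
  std::vector<long> buf(n);
  hook.on_write(buf.data(), n * sizeof(long));
  for (std::size_t i = 0; i < n; ++i) {
    buf[i] = static_cast<long>(i);
  }
  return hook.calls;
}
```

Under the model cost = C * number_of_executed_memory_accesses, the first variant costs C*n and the second costs roughly C, independent of n.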

>> 4. Perhaps we can tsan-write whole stack frame in function prologue
>> using __tsan_write8, and then ignore individual accesses before
>> capturing. This is somewhat similar to 3, but may be more efficient
>> (uses 8-byte writes and write once instead of on each iteration of a
>> loop). Is it possible to do in llvm? Do you think it is sufficient?
>
>
> Interesting. This definitely seems possible. I think this is very similar to
> my idea for #3 -- essentially to coalesce all instrumentation of all static
> (non-dynamic in the LLVM IR lingo)
>
> But I think this is a more general idea. There is a *lot* that we could do
> to coalesce instrumentation and reduce it, moving it outside of loops, etc.
>
> I think it would be useful to actually look at the specific instrumentation
> patterns that are coming up most frequently in real world code patterns...

Yes, unfortunately... at this point it turns into a significantly more
time-consuming task with a more complex solution...
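For reference, observation 4 above (mark the whole frame once in the prologue, then skip the individual accesses before capture) could look roughly like the hand-written simulation below. This is a sketch under assumptions: fake_write8 is a hypothetical counter standing in for __tsan_write8, Frame is a made-up stand-in for a function's stack frame, and real instrumentation would be emitted by the compiler pass rather than written by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static std::size_t g_hook_calls = 0;

// Hypothetical stand-in for __tsan_write8: counts invocations only.
static void fake_write8(const void * /*addr*/) { ++g_hook_calls; }

// Made-up "stack frame": two 8-byte locals.
struct Frame {
  std::uint64_t sum;
  std::uint64_t last;
};
static_assert(sizeof(Frame) == 16, "sketch assumes a 16-byte frame");

// Variant A: one 8-byte write per frame slot in the prologue, then the
// loop stores to the not-yet-captured locals without any hook calls.
std::size_t run_with_prologue_marking(std::size_t iterations) {
  g_hook_calls = 0;
  alignas(8) Frame f{};
  const unsigned char *base = reinterpret_cast<const unsigned char *>(&f);
  for (std::size_t off = 0; off < sizeof(Frame); off += 8) {
    fake_write8(base + off);
  }
  for (std::size_t i = 0; i < iterations; ++i) {
    f.sum += i;
    f.last = i;
  }
  return g_hook_calls;
}

// Variant B: every store in the loop is instrumented individually.
std::size_t run_with_per_access_marking(std::size_t iterations) {
  g_hook_calls = 0;
  alignas(8) Frame f{};
  for (std::size_t i = 0; i < iterations; ++i) {
    fake_write8(&f.sum);
    f.sum += i;
    fake_write8(&f.last);
    f.last = i;
  }
  return g_hook_calls;
}
```

In this toy model the prologue variant makes sizeof(Frame)/8 hook calls regardless of the loop trip count, while the per-access variant makes two per iteration.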