[all-commits] [llvm/llvm-project] 4408ee: tsan: fix false positives in AcquireGlobal

Dmitry Vyukov via All-commits all-commits at lists.llvm.org
Wed May 27 07:30:17 PDT 2020


  Branch: refs/heads/master
  Home:   https://github.com/llvm/llvm-project
  Commit: 4408eeed0ff191304121c11168aa1db861cccb97
      https://github.com/llvm/llvm-project/commit/4408eeed0ff191304121c11168aa1db861cccb97
  Author: Dmitry Vyukov <dvyukov at google.com>
  Date:   2020-05-27 (Wed, 27 May 2020)

  Changed paths:
    M compiler-rt/lib/tsan/rtl/tsan_clock.cpp
    M compiler-rt/lib/tsan/rtl/tsan_clock.h
    M compiler-rt/lib/tsan/rtl/tsan_rtl_mutex.cpp
    A compiler-rt/test/tsan/java_finalizer2.cpp

  Log Message:
  -----------
  tsan: fix false positives in AcquireGlobal

Add ThreadClock:: global_acquire_ which is the last time another thread
has done a global acquire of this thread's clock.

It helps to avoid problem described in:
https://github.com/golang/go/issues/39186
See test/tsan/java_finalizer2.cpp for a regression test.
Note the failuire is _extremely_ hard to hit, so if you are trying
to reproduce it, you may want to run something like:
$ go get golang.org/x/tools/cmd/stress
$ stress -p=64 ./a.out

The crux of the problem is roughly as follows.
A number of O(1) optimizations in the clocks algorithm assume proper
transitive cumulative propagation of clock values. The AcquireGlobal
operation may produce an inconsistent non-linearazable view of
thread clocks. Namely, it may acquire a later value from a thread
with a higher ID, but fail to acquire an earlier value from a thread
with a lower ID. If a thread that executed AcquireGlobal then releases
to a sync clock, it will spoil the sync clock with the inconsistent
values. If another thread later releases to the sync clock, the optimized
algorithm may break.

The exact sequence of events that leads to the failure.
- thread 1 executes AcquireGlobal
- thread 1 acquires value 1 for thread 2
- thread 2 increments clock to 2
- thread 2 releases to sync object 1
- thread 3 at time 1
- thread 3 acquires from sync object 1
- thread 1 acquires value 1 for thread 3
- thread 1 releases to sync object 2
- sync object 2 clock has 1 for thread 2 and 1 for thread 3
- thread 3 releases to sync object 2
- thread 3 sees value 1 in the clock for itself
  and decides that it has already released to the clock
  and did not acquire anything from other threads after that
  (the last_acquire_ check in release operation)
- thread 3 does not update the value for thread 2 in the clock from 1 to 2
- thread 4 acquires from sync object 2
- thread 4 detects a false race with thread 2
  as it should have been synchronized with thread 2 up to time 2,
  but because of the broken clock it is now synchronized only up to time 1

The global_acquire_ value helps to prevent this scenario.
Namely, thread 3 will not trust any own clock values up to global_acquire_
for the purposes of the last_acquire_ optimization.

Reviewed-in: https://reviews.llvm.org/D80474
Reported-by: nvanbenschoten (Nathan VanBenschoten)




More information about the All-commits mailing list