[PATCH] D47107: Vector clock algorithm improvement in case of blocking release sequence

Dmitry Vyukov via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Jun 7 07:50:33 PDT 2018


dvyukov added a comment.

Hi Yorov,

Sorry for delays, I was on a vacation and now have a backlog of work.

Thanks for working on this.

This needs some additional work before we can merge this. High-level notes:

1. Fences and release sequences need to be separated into individual patches. I would suggest to start with fences as it looks simpler.

2. Tsan is used on a very large programs with thousands of threads and tens of millions of sync objects. Performance and especially memory consumption of synchronization handling is crucial. Since tsan is already widely deployed any non-constant overheads need to be controlled by flags. For example, any fence handling need to be completely skipped if the flag is not set. Default value for these flags it's a hard question, but we can decide later.

3. Vector's should not be used inside of sync objects. There are 2 problems with Vector's:
  - they can use memory suboptimally, and this is unacceptable for sync objects since they can consume tens of gigs
  - our memory allocator does reuse memory between different size classes, this leads to O(N^2) memory consumption as these Vector's slowly grow during program startup and eat memory in all consecutive size classes; that's the reason why current clocks are built as chunked data structures with constant-size blocks

4. For fences check out what we did in KTSAN to reduce their overheads (acquire_active and release_active):

https://github.com/google/ktsan/blob/tsan/mm/ktsan/sync_atomic.c#L6
I think we need something similar here. The actual threshold for discarding fence effects should be controllable with a flag.

5. For release fences it's possible to get their overhead to almost zero in common case. The idea is as follows. In common case there are no acquire operations in between release fence and the corresponding relaxed store, in such case we don't need to memorize the whole thread vector clock. Instead we can memorize only the current thread time. Then when we need to turn a relaxed store into release operation, we use the current thread vector clock (it did not change because we did not do any acquire operations) except for the current thread time, for the current thread time we use the previously saved value. This makes release fence cost O(1) instead of O(N).

But if there is an acquire operation in between, then we need to save the whole vector clock at that point. However, if we also do (4), then usually there won't be an acquire fence while a release fence is active.



================
Comment at: lib/tsan/rtl/tsan_clock.cc:185
+
+  // Check if it's empty -> no need to do anything.
+  if (src->size_ == 0) {
----------------
This comment and "Check if we need to resize dst" does not seem to add much value. I would rather see more extensive comments for more complex parts of the change.


================
Comment at: lib/tsan/rtl/tsan_clock.cc:186
+  // Check if it's empty -> no need to do anything.
+  if (src->size_ == 0) {
+    return;
----------------
This code base does not use {} around if/for blocks if they contain a single statement. Please fix here and below.


================
Comment at: lib/tsan/rtl/tsan_clock.cc:274
+
+  // TODO(sobir.yorov94 at gmail.com):
+  // optimize in case when we didn't change fence_release_clock from last time
----------------
The TODOs need to be resolved before commit. Either by fixing code, or by removing TODO if we are not going to fix code.


================
Comment at: lib/tsan/rtl/tsan_clock.cc:349
   // This is fine since clk_ beyond size is all zeros.
+
   uptr i = 0;
----------------
revert


================
Comment at: lib/tsan/rtl/tsan_clock.cc:383
+  DCHECK_LE(dst->release_sequence_blocked.Size(), kMaxTid);
+  // TODO(yorov.sobir): add stat here
+
----------------
Same here: either resolve it, or remove.


================
Comment at: lib/tsan/rtl/tsan_clock.cc:396
+  block_release_sequences(dst);
+  // unshare before changing dst
+  dst->Unshare(c);
----------------
We use 2 styles for comments:
1. comment that take the full line are full English sentences: start with a capital letter and end with dot
2. short comments on the same line with code can be either full English sentences or start with lower letter and end with no dot.

So if you do this as a full line this needs to be:
```
// Unshare before changing dst.
```


================
Comment at: lib/tsan/rtl/tsan_clock.cc:443
+  }
+}
 // Updates only single element related to the current thread in dst->clk_.
----------------
Empty line between functions please.


================
Comment at: lib/tsan/rtl/tsan_clock.h:180
   u64 clk_[kMaxTidInClock];  // Fixed size vector clock.
+  u64 fence_release_clock[kMaxTidInClock];
+  u64 fence_acquire_clock[kMaxTidInClock];
----------------
I think we can use kMaxTid here.
kMaxTidInClock is kMaxTid*2, that's required to detect accesses to freed memory at low cost. But we should not access the second half of the clock here.


================
Comment at: lib/tsan/rtl/tsan_interface_atomic.cc:227
   // Assume the access is atomic.
-  if (!IsAcquireOrder(mo)) {
-    MemoryReadAtomic(thr, pc, (uptr)a, SizeLog<T>());
-    return NoTsanAtomicLoad(a, mo);
-  }
+//  if (!IsAcquireOrder(mo)) {
+//    MemoryReadAtomic(thr, pc, (uptr)a, SizeLog<T>());
----------------
No dead code please.


================
Comment at: test/CMakeLists.txt:18
 if(COMPILER_RT_STANDALONE_BUILD)
-  add_executable(FileCheck IMPORTED GLOBAL)
+  add_executable(FileCheck IMPORTED GLOBAL tsan/atomic_release_sequence_blocking.cpp tsan/fence_norace.cpp)
   set_property(TARGET FileCheck PROPERTY IMPORTED_LOCATION ${LLVM_TOOLS_BINARY_DIR}/FileCheck)
----------------
I don't think this needs to be here as no other tests are listed here.
If you want to build a single test, just invoke the compiler manually.


Repository:
  rCRT Compiler Runtime

https://reviews.llvm.org/D47107





More information about the llvm-commits mailing list